Integrated Analysis for Electronic Health Records with Structured and Sporadic Missingness
By: Jianbin Tan, Yan Zhang, Chuan Hong and more
Potential Business Impact:
Fixes missing info in patient records for better health studies.
Objectives: We propose a novel imputation method tailored for Electronic Health Records (EHRs) with structured and sporadic missingness. Such missingness frequently arises in the integration of heterogeneous EHR datasets for downstream clinical applications. By addressing these gaps, our method provides a practical solution for integrated analysis, enhancing data utility and advancing the understanding of population health.
Materials and Methods: We begin by demonstrating structured and sporadic missing mechanisms in the integrated analysis of EHR data. Following this, we introduce a novel imputation framework, Macomss, specifically designed to handle structurally and heterogeneously occurring missing data. We establish theoretical guarantees for Macomss, ensuring its robustness in preserving the integrity and reliability of integrated analyses. To assess its empirical performance, we conduct extensive simulation studies that replicate the complex missingness patterns observed in real-world EHR systems, complemented by validation using EHR datasets from the Duke University Health System (DUHS).
Results: Simulation studies show that our approach consistently outperforms existing imputation methods. Using datasets from three hospitals within DUHS, Macomss achieves the lowest imputation errors for missing data in most cases and provides superior or comparable downstream prediction performance compared to benchmark methods.
Conclusions: We provide a theoretically guaranteed and practically meaningful method for imputing structured and sporadic missing data, enabling accurate and reliable integrated analysis across multiple EHR datasets. The proposed approach holds significant potential for advancing research in population health.
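To make the two kinds of missingness concrete, here is a minimal sketch of how an observation mask for stacked multi-site data might be simulated. It is not Macomss and performs no imputation. The abstract does not define the two mechanisms, so the sketch assumes that "structured" means whole variables are unrecorded at a given site and that "sporadic" means individual entries are scattered at random among otherwise recorded variables. The function name, its parameters, and the site sizes are all hypothetical.

```python
import numpy as np

def simulate_integrated_mask(n_per_site, n_features, unrecorded, sporadic_rate, seed=0):
    """Return a boolean mask (True = observed) for row-stacked site data.

    Assumed definitions (not taken from the paper):
      structured missingness = whole columns unrecorded at a site
      sporadic missingness   = individual entries dropped at random
    """
    rng = np.random.default_rng(seed)
    blocks = []
    for site, n in enumerate(n_per_site):
        observed = np.ones((n, n_features), dtype=bool)
        # Structured: this site never records the listed variables.
        cols = np.asarray(unrecorded.get(site, []), dtype=int)
        observed[:, cols] = False
        # Sporadic: each remaining entry is dropped with probability sporadic_rate.
        observed &= rng.random((n, n_features)) >= sporadic_rate
        blocks.append(observed)
    return np.vstack(blocks)

# Three hypothetical sites with 4, 3, and 5 patients and 6 variables.
mask = simulate_integrated_mask(
    n_per_site=[4, 3, 5],
    n_features=6,
    unrecorded={0: [4, 5], 2: [0]},  # site 0 lacks variables 4-5, site 2 lacks variable 0
    sporadic_rate=0.2,
)
print(mask.astype(int))
```

In the printed mask, the all-zero column blocks correspond to the assumed structured pattern, while isolated zeros elsewhere correspond to the assumed sporadic pattern. An imputation method for integrated analysis would need to fill both.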
Similar Papers
Comparing methods for handling missing data in electronic health records for dynamic risk prediction of central-line associated bloodstream infection
Applications
Finds sickness risks even with missing patient info.
Sensitivity analysis for nonignorable missing values in blended analysis framework: a study on the effect of bariatric surgery via electronic health records
Methodology
Fixes doctor records with missing info.
Missing data in non-stationary multivariate time series from digital studies in Psychiatry
Methodology
Fixes missing health data from phones.