Comparing methods for handling missing data in electronic health records for dynamic risk prediction of central-line associated bloodstream infection
By: Shan Gao, Elena Albu, Pieter Stijnen and more
Potential Business Impact:
Finds sickness risks even with missing patient info.
Electronic health records (EHR) often contain varying levels of missing data. This study compared different imputation strategies to identify the most suitable approach for predicting central line-associated bloodstream infection (CLABSI) in the presence of competing risks using EHR data. We analyzed 30,862 catheter episodes at University Hospitals Leuven (2012-2013) to predict 7-day CLABSI risk using a landmark cause-specific supermodel, accounting for competing risks of hospital discharge and death. Imputation methods included simple methods (median/mode, last observation carried forward), multiple imputation, regression-based and mixed-effects models leveraging longitudinal structure, and random forest imputation to capture interactions and non-linearities. Missing indicators were also assessed alone and in combination with other imputation methods. Model performance was evaluated dynamically at daily landmarks up to 14 days post-catheter placement. The missing indicator approach showed the highest discriminative ability, achieving a mean AUROC of up to 0.782 and superior overall performance based on the scaled Brier score. Combining missing indicators with other methods slightly improved performance, with the mixed model approach combined with missing indicators achieving the highest AUROC (0.783) at day 4, and the missForestPredict approach combined with missing indicators yielding the best scaled Brier scores at earlier landmarks. This suggests that in EHR data, the presence or absence of information may hold valuable insights for patient risk prediction. However, the use of missing indicators requires caution, as shifts in EHR data over time can alter missing data patterns, potentially impacting model transportability.
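To make the simpler strategies named above concrete, here is a minimal Python sketch of median imputation, last observation carried forward (LOCF), and missing indicators. It is an illustration under stated assumptions, not the study's code: the helper names (`fit_median`, `impute_with`, `locf`, `missing_indicator`) and the toy lab values are hypothetical, and missing values are represented as `None`.

```python
from statistics import median

def fit_median(train_values):
    """Learn the fill value from observed (non-missing) training values."""
    return median(v for v in train_values if v is not None)

def impute_with(values, fill):
    """Replace each missing value (None) with a fixed fill value."""
    return [fill if v is None else v for v in values]

def locf(values):
    """Last observation carried forward over one time-ordered series.

    Entries before the first observed value stay None and would still
    need a fallback (for example the training median).
    """
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def missing_indicator(values):
    """1 where the value was missing, 0 where it was observed."""
    return [1 if v is None else 0 for v in values]

# Hypothetical daily lab values for one catheter episode (None = not measured).
series = [None, 12.0, None, None, 30.0]

fill = fit_median(series)               # here the toy series doubles as "training" data
median_filled = impute_with(series, fill)
locf_filled = locf(series)
was_missing = missing_indicator(series)

# Missing indicators "in combination": keep the imputed value and the flag side by side.
features = list(zip(median_filled, was_missing))
```

In a real pipeline the fill value would presumably be estimated on training data only and then applied to new patients; the indicator column is what lets a model use the fact that a measurement was absent, which is the signal the abstract reports as informative.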
Similar Papers
Compatibility of Missing Data Handling Methods across the Stages of Producing Clinical Prediction Models
Methodology
Helps doctors predict sickness even with missing info.
Integrated Analysis for Electronic Health Records with Structured and Sporadic Missingness
Applications
Fixes missing info in patient records for better health studies.
Robust Causal Inference for EHR-based Studies of Point Exposures with Missingness in Eligibility Criteria
Methodology
Finds more patients for medical studies.