Stable Prediction of Adverse Events in Medical Time-Series Data
By: Mayank Keoliya, Seewon Choi, Rajeev Alur and more
Potential Business Impact:
Helps doctors predict patient health changes sooner.
Early event prediction (EEP) systems continuously estimate a patient's imminent risk to support clinical decision-making. For bedside trust, risk trajectories must be accurate and temporally stable, shifting only with new, relevant evidence. However, current benchmarks (a) ignore the stability of risk scores and (b) evaluate mainly on tabular inputs, leaving trajectory behavior untested. To address this gap, we introduce CAREBench, an EEP benchmark that evaluates deployability using multi-modal inputs (tabular EHR, ECG waveforms, and clinical text) and assesses temporal stability alongside predictive accuracy. We propose a stability metric that quantifies short-term variability in per-patient risk and penalizes abrupt oscillations based on local Lipschitz constants. CAREBench spans six prediction tasks, such as sepsis onset, and compares classical learners, deep sequence models, and zero-shot LLMs. Across tasks, existing methods, especially LLMs, struggle to jointly optimize accuracy and stability, with notably poor recall at high-precision operating points. These results highlight the need for models that produce evidence-aligned, stable trajectories to earn clinician trust in continuous monitoring settings. (Code: https://github.com/SeewonChoi/CAREBench.)
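To make the idea of a Lipschitz-based stability score concrete, here is a minimal Python sketch. It is not the metric defined in the paper or in the CAREBench repository: the abstract does not give the formula, so the function names (`local_lipschitz`, `instability`), the time window, and the choice to average the per-step constants are all assumptions made for illustration. The sketch estimates, at each time step, the steepest change in risk per unit time relative to neighbouring steps, then averages those estimates over one patient's trajectory.

```python
from typing import List, Sequence


def local_lipschitz(
    times: Sequence[float], risks: Sequence[float], window: float
) -> List[float]:
    """Empirical local Lipschitz estimate at each time step.

    Hypothetical definition: for step i, take the largest
    |risk_i - risk_j| / |t_i - t_j| over all other steps j whose
    timestamp lies within `window` of t_i.
    """
    if len(times) != len(risks):
        raise ValueError("times and risks must have the same length")
    constants = []
    for i, (t_i, r_i) in enumerate(zip(times, risks)):
        steepest = 0.0
        for j, (t_j, r_j) in enumerate(zip(times, risks)):
            dt = abs(t_i - t_j)
            # Skip the point itself, duplicate timestamps, and far-away points.
            if i != j and 0 < dt <= window:
                steepest = max(steepest, abs(r_i - r_j) / dt)
        constants.append(steepest)
    return constants


def instability(
    times: Sequence[float], risks: Sequence[float], window: float = 2.0
) -> float:
    """Mean local Lipschitz estimate for one patient (assumed aggregation).

    Higher values mean the risk score oscillates more sharply.
    """
    constants = local_lipschitz(times, risks, window)
    return sum(constants) / len(constants) if constants else 0.0


# Made-up hourly risk scores for two hypothetical patients.
hours = [0, 1, 2, 3, 4, 5]
smooth = [0.10, 0.12, 0.15, 0.18, 0.22, 0.25]  # gradual drift
jumpy = [0.10, 0.60, 0.05, 0.70, 0.10, 0.65]   # abrupt oscillation

print("smooth:", instability(hours, smooth))
print("jumpy: ", instability(hours, jumpy))
```

Under these assumptions the oscillating trajectory scores as far less stable than the gradually drifting one, even though both start and end at similar risk levels. A real evaluation would also need to separate jumps driven by new clinical evidence from spurious ones, which this sketch does not attempt.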