Agent-Based Feature Generation from Clinical Notes for Outcome Prediction
By: Jiayi Wang, Jacqueline Jil Vallon, Neil Panjwani and more
Potential Business Impact:
Helps doctors predict prostate cancer recurrence from clinical notes.
Electronic health records (EHRs) contain rich unstructured clinical notes that could enhance predictive modeling, yet extracting meaningful features from these notes remains challenging. Current approaches range from labor-intensive manual clinician feature generation (CFG) to fully automated representational feature generation (RFG), which lacks interpretability and clinical relevance. Here we introduce SNOW (Scalable Note-to-Outcome Workflow), a modular multi-agent system powered by large language models (LLMs) that autonomously generates structured clinical features from unstructured notes without human intervention. We evaluated SNOW against manual CFG, clinician-guided LLM approaches, and RFG methods for predicting 5-year prostate cancer recurrence in 147 patients from Stanford Healthcare. Manual CFG achieved the highest performance (AUC-ROC: 0.771), and SNOW came within 0.010 of it (0.761) without requiring any clinical expertise, significantly outperforming both baseline features alone (0.691) and all RFG approaches. The clinician-guided LLM method also performed well (0.732) but still required expert input. SNOW's specialized agents handle feature discovery, extraction, validation, post-processing, and aggregation, creating interpretable features that capture complex clinical information typically accessible only through manual review. Our findings demonstrate that autonomous LLM systems can approach expert-level feature engineering at scale, potentially transforming how clinical ML models leverage unstructured EHR data while maintaining the interpretability essential for clinical deployment.
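The five stages named in the abstract (feature discovery, extraction, validation, post-processing, aggregation) can be pictured as a chain of small functions. The sketch below is illustrative only and is not the authors' implementation: the names `FeatureSpec`, `discover_features`, `extract`, `validate`, `post_process`, `aggregate`, and `run_pipeline` are hypothetical, regular expressions stand in for the LLM agents, and the two example features, their plausible ranges, and the "take the maximum across notes" rule are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FeatureSpec:
    """Hypothetical feature definition; SNOW's discovery agent would propose these."""
    name: str
    pattern: str   # regex standing in for an LLM extraction prompt (assumption)
    low: float     # assumed plausible range, used by the validation stage
    high: float


def discover_features() -> List[FeatureSpec]:
    # Stage 1, discovery: hard-coded here; an LLM agent would generate this list.
    return [
        FeatureSpec("psa_ng_ml", r"PSA\s*(?:of|:)?\s*(\d+(?:\.\d+)?)", 0.0, 1000.0),
        FeatureSpec("gleason_total", r"Gleason\s*(?:score)?\s*(\d+)\s*\+\s*(\d+)", 2.0, 10.0),
    ]


def extract(spec: FeatureSpec, note: str) -> Optional[float]:
    # Stage 2, extraction: pull a raw value from one note, or None if absent.
    match = re.search(spec.pattern, note, flags=re.IGNORECASE)
    if match is None:
        return None
    # Sum all captured groups so "3+4" becomes a total of 7.
    return sum(float(group) for group in match.groups())


def validate(spec: FeatureSpec, value: Optional[float]) -> Optional[float]:
    # Stage 3, validation: discard values outside the assumed plausible range.
    if value is None or not (spec.low <= value <= spec.high):
        return None
    return value


def post_process(value: Optional[float]) -> Optional[float]:
    # Stage 4, post-processing: normalize the representation (round to one decimal).
    return None if value is None else round(value, 1)


def aggregate(values: List[Optional[float]]) -> Optional[float]:
    # Stage 5, aggregation: one value per patient; max across notes is an assumption.
    present = [v for v in values if v is not None]
    return max(present) if present else None


def run_pipeline(notes: List[str]) -> Dict[str, Optional[float]]:
    """Turn one patient's notes into a flat, interpretable feature dictionary."""
    features: Dict[str, Optional[float]] = {}
    for spec in discover_features():
        per_note = [post_process(validate(spec, extract(spec, n))) for n in notes]
        features[spec.name] = aggregate(per_note)
    return features


if __name__ == "__main__":
    example_notes = [
        "PSA 7.2 ng/mL at diagnosis. Gleason 3+4 on biopsy.",
        "Follow-up: PSA 9.5. Patient well.",
    ]
    print(run_pipeline(example_notes))
```

Each stage takes and returns plain values, so any one of them could be swapped for an LLM-backed agent without touching the others. The resulting dictionary has one named column per feature, which is the kind of structured, interpretable input a downstream outcome model could consume.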