CEHR-XGPT: A Scalable Multi-Task Foundation Model for Electronic Health Records

Published: September 3, 2025 | arXiv ID: 2509.03643v2

By: Chao Pang, Jiheum Park, Xinzhuo Jiang, and more

Potential Business Impact:

Helps doctors predict patient health using past records.

Business Areas:
Electronic Health Record (EHR), Health Care

Electronic Health Records (EHRs) provide a rich, longitudinal view of patient health and hold significant potential for advancing clinical decision support, risk prediction, and data-driven healthcare research. However, most artificial intelligence (AI) models for EHRs are designed for narrow, single-purpose tasks, which limits their generalizability and utility in real-world settings. Here, we present CEHR-XGPT, a general-purpose foundation model for EHR data that unifies three essential capabilities (feature representation, zero-shot prediction, and synthetic data generation) within a single architecture. To support temporal reasoning over clinical sequences, CEHR-XGPT incorporates a novel time-token-based learning framework that explicitly encodes each patient's dynamic timeline into the model structure. CEHR-XGPT performs strongly on all three tasks and generalizes effectively to external datasets through vocabulary expansion and fine-tuning. This versatility enables rapid model development, cohort discovery, and patient outcome forecasting without task-specific retraining.
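The abstract does not spell out how the time tokens are defined. As a rough illustration only, the Python sketch below shows one common way to expose the gaps between visits to a sequence model: flatten the visits into a single token sequence and insert a discrete time token between consecutive visits. The bucket boundaries, the token names (D, W, M, LT), and the [VS]/[VE] visit markers are hypothetical choices made for this sketch and are not taken from the paper.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical visit representation: (visit date, list of clinical concept codes).
Visit = Tuple[date, List[str]]


def time_token(gap_days: int) -> str:
    """Map an inter-visit gap in days to a discrete time token.

    Bucket boundaries and token names are illustrative assumptions,
    not the values used by CEHR-XGPT.
    """
    if gap_days < 7:
        return f"D{gap_days}"          # day resolution: D0..D6
    if gap_days < 30:
        return f"W{gap_days // 7}"     # week resolution: W1..W4
    if gap_days < 365:
        return f"M{gap_days // 30}"    # month resolution: M1..M12
    return "LT"                        # long-term gap of a year or more


def to_sequence(visits: List[Visit]) -> List[str]:
    """Flatten visits into one token sequence with time tokens between visits."""
    seq: List[str] = []
    prev: Optional[date] = None
    for visit_date, codes in sorted(visits, key=lambda v: v[0]):
        if prev is not None:
            # Encode the elapsed time since the previous visit as a token.
            seq.append(time_token((visit_date - prev).days))
        seq.append("[VS]")   # hypothetical visit-start marker
        seq.extend(codes)
        seq.append("[VE]")   # hypothetical visit-end marker
        prev = visit_date
    return seq


if __name__ == "__main__":
    visits = [
        (date(2020, 1, 1), ["E11.9"]),
        (date(2020, 1, 4), ["I10"]),
        (date(2020, 3, 15), ["E11.9", "N18.3"]),
        (date(2022, 1, 1), ["I10"]),
    ]
    print(to_sequence(visits))
```

For these example visits the gaps are 3, 71, and 657 days, so the sequence contains the time tokens D3, M2, and LT between the four visit blocks. Making elapsed time an explicit token lets a standard autoregressive model both read and generate timeline information, which is one plausible reason a time-token design suits synthetic data generation as well as prediction.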

Page Count
23 pages

Category
Computer Science: Machine Learning (CS)