
Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records

Published: August 8, 2025 | arXiv ID: 2508.06627v3

By: Mosbah Aouad, Anirudh Choudhary, Awais Farooq and more

Potential Business Impact:

Flags pancreatic cancer risk from routine health records up to a year before clinical diagnosis.

Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers, and early detection remains a major clinical challenge due to the absence of specific symptoms and reliable biomarkers. In this work, we propose a new multimodal approach that integrates longitudinal diagnosis code histories and routinely collected laboratory measurements from electronic health records to detect PDAC up to one year prior to clinical diagnosis. Our method combines neural controlled differential equations to model irregular lab time series, pretrained language models and recurrent networks to learn diagnosis code trajectory representations, and cross-attention mechanisms to capture interactions between the two modalities. We develop and evaluate our approach on a real-world dataset of nearly 4,700 patients and achieve significant improvements in AUC ranging from 6.5% to 15.5% over state-of-the-art methods. Furthermore, our model identifies diagnosis codes and laboratory panels associated with elevated PDAC risk, including both established and new biomarkers. Our code is available at https://github.com/MosbahAouad/EarlyPDAC-MML.
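
The abstract names cross-attention as the mechanism that lets the lab and diagnosis-code modalities interact. As a rough illustration only, the sketch below shows single-head scaled dot-product cross-attention in NumPy, with lab time-step embeddings as queries and diagnosis-code embeddings as keys and values. It is not the authors' implementation: the function name `cross_attention`, the dimensions, the random inputs, and the choice of which modality supplies the queries are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative helper).

    queries:     (n_q, d_model)  embeddings of one modality
    keys_values: (n_kv, d_model) embeddings of the other modality
    Returns the attended output (n_q, d_head) and the weights (n_q, n_kv).
    """
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Hypothetical inputs: random stand-ins for learned representations.
rng = np.random.default_rng(0)
d_model, d_head = 16, 8
lab_states = rng.normal(size=(5, d_model))    # assumed: 5 lab time-step embeddings
code_states = rng.normal(size=(7, d_model))   # assumed: 7 diagnosis-code embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) * 0.1 for _ in range(3))

fused, attn = cross_attention(lab_states, code_states, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Each row of `attn` sums to 1 and indicates how strongly one lab time step attends to each diagnosis code. Weights of this kind are one plausible route to inspecting which codes a model relies on, although the abstract does not say how the reported risk-associated codes and lab panels were identified.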

Country of Origin
🇺🇸 United States

Repos / Data Links
https://github.com/MosbahAouad/EarlyPDAC-MML

Page Count
22 pages

Category
Computer Science:
Machine Learning (CS)