DANIEL: A Distributed and Scalable Approach for Global Representation Learning with EHR Applications

Published: November 4, 2025 | arXiv ID: 2511.02754v1

By: Zebin Wang, Ziming Gan, Weijing Tang and more

Potential Business Impact:

Lets hospitals learn shared representations from patient records without pooling patient-level data.

Business Areas:

Electronic Health Record (EHR), Health Care

Classical probabilistic graphical models face fundamental challenges in modern data environments, which are characterized by high dimensionality, source heterogeneity, and stringent data-sharing constraints. In this work, we revisit the Ising model, a well-established member of the Markov Random Field (MRF) family, and develop a distributed framework that enables scalable and privacy-preserving representation learning from large-scale binary data with inherent low-rank structure. Our approach optimizes a non-convex surrogate loss function via bi-factored gradient descent, offering substantial computational and communication advantages over conventional convex approaches. We evaluate our algorithm on multi-institutional electronic health record (EHR) datasets from 58,248 patients across the University of Pittsburgh Medical Center (UPMC) and Mass General Brigham (MGB), demonstrating superior performance in global representation learning and downstream clinical tasks, including relationship detection, patient phenotyping, and patient clustering. These results highlight a broader potential for statistical inference in federated, high-dimensional settings while addressing the practical challenges of data complexity and multi-institutional integration.
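
To make the idea of bi-factored gradient descent on a low-rank Ising model more concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes a standard pseudo-likelihood (logistic) loss as the non-convex surrogate, writes the interaction matrix as U Vᵀ, and lets each site send only its factor gradients to a coordinator. The function names `local_factor_gradients` and `federated_step`, the synthetic data, the rank, and the step size are all hypothetical choices made for illustration.

```python
import numpy as np

def local_factor_gradients(X, U, V):
    """One site's loss and gradients for the factors of theta = U @ V.T.

    X: (n, p) array with entries in {-1, +1}; U, V: (p, r) factor matrices.
    Assumed surrogate: Ising pseudo-likelihood, i.e. one logistic loss per variable.
    """
    n = X.shape[0]
    theta = U @ V.T
    np.fill_diagonal(theta, 0.0)            # assumption: no self-interaction term
    margins = X @ theta.T                   # margins[i, j] = sum_k theta[j, k] * X[i, k]
    z = 2.0 * X * margins
    loss = np.mean(np.sum(np.logaddexp(0.0, -z), axis=1))
    # Derivative of log(1 + exp(-z)) w.r.t. margins: -2 * x * sigmoid(-z),
    # written with tanh for numerical stability.
    W = -X * (1.0 - np.tanh(z / 2.0))
    G = W.T @ X / n                         # gradient w.r.t. theta
    np.fill_diagonal(G, 0.0)                # masked diagonal receives no gradient
    return G @ V, G.T @ U, loss, n          # chain rule through theta = U @ V.T

def federated_step(site_data, U, V, lr=0.02):
    """Coordinator step: sample-size-weighted average of the sites' factor gradients."""
    results = [local_factor_gradients(X, U, V) for X in site_data]
    n_total = sum(n for _, _, _, n in results)
    grad_U = sum(n * gU for gU, _, _, n in results) / n_total
    grad_V = sum(n * gV for _, gV, _, n in results) / n_total
    loss = sum(n * l for _, _, l, n in results) / n_total
    return U - lr * grad_U, V - lr * grad_V, loss

# Synthetic binary data: one latent sign per row, each entry flipped with probability 0.2.
rng = np.random.default_rng(0)
n, p, r = 600, 6, 2
latent = rng.choice([-1.0, 1.0], size=(n, 1))
flips = rng.random((n, p)) < 0.2
X = latent * np.where(flips, -1.0, 1.0)
site_data = [X[:250], X[250:]]              # two hypothetical sites; rows are never pooled

U = 0.1 * rng.standard_normal((p, r))
V = 0.1 * rng.standard_normal((p, r))
losses = []
for _ in range(100):
    U, V, loss = federated_step(site_data, U, V, lr=0.02)
    losses.append(loss)
print(f"pseudo-likelihood loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In this sketch each site returns two p × r matrices, a scalar loss, and its sample size, so the communication cost scales with the rank r rather than with p². Factored methods of this kind often add a balancing penalty on UᵀU − VᵀV; it is omitted here for brevity, and whether the paper uses one is not stated in the abstract.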

Country of Origin

🇺🇸 United States

Page Count

63 pages
