Building Patient Journeys in Hebrew: A Language Model for Clinical Timeline Extraction

Published: December 12, 2025 | arXiv ID: 2512.11502v1

By: Kai Golan Hashiloni, Brenda Kasabe Nokai, Michal Shevach and more

Potential Business Impact:

Helps doctors understand patient health history faster.

Business Areas:
Electronic Health Record (EHR), Health Care

We present a new Hebrew medical language model designed to extract structured clinical timelines from electronic health records, enabling the construction of patient journeys. Our model is based on DictaBERT 2.0 and continually pre-trained on over five million de-identified hospital records. To evaluate its effectiveness, we introduce two new datasets, one from internal medicine and emergency departments and another from oncology, both annotated for temporal relations between events. The model achieves strong performance on both datasets. We also find that vocabulary adaptation improves token efficiency and that de-identification does not compromise downstream performance, supporting privacy-conscious model development. The model is made available for research use under ethical restrictions.
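
To make the idea of turning annotated temporal relations into a patient journey concrete, here is a minimal sketch that orders events with a topological sort. It is not the authors' method: the function name `build_timeline`, the `BEFORE`/`AFTER` label set, and the example events are all assumptions made for illustration, since the abstract does not specify the annotation scheme or how timelines are assembled.

```python
from graphlib import TopologicalSorter


def build_timeline(events, relations):
    """Order events so that every pairwise temporal relation is respected.

    `relations` holds (event_a, label, event_b) triples. The label set
    BEFORE/AFTER is a hypothetical scheme, not one taken from the paper.
    """
    # Register every event, including ones that appear in no relation.
    sorter = TopologicalSorter({event: set() for event in events})
    for event_a, label, event_b in relations:
        if label == "BEFORE":
            sorter.add(event_b, event_a)  # event_a must come first
        elif label == "AFTER":
            sorter.add(event_a, event_b)  # event_b must come first
        else:
            raise ValueError(f"unsupported relation label: {label}")
    # static_order() raises graphlib.CycleError on contradictory relations.
    return list(sorter.static_order())


# Hypothetical events and relations, as a relation classifier might emit them.
events = ["discharge", "surgery", "admission", "ct_scan"]
relations = [
    ("admission", "BEFORE", "ct_scan"),
    ("surgery", "AFTER", "ct_scan"),
    ("surgery", "BEFORE", "discharge"),
]

timeline = build_timeline(events, relations)
print(timeline)  # ['admission', 'ct_scan', 'surgery', 'discharge']
```

When the relations leave some pairs of events unordered, several valid timelines exist and this sketch returns just one of them. A real system would also need a way to handle simultaneous events and contradictory predictions.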

Country of Origin
🇮🇱 Israel

Page Count
10 pages

Category
Computer Science: Computation and Language