Building Patient Journeys in Hebrew: A Language Model for Clinical Timeline Extraction
By: Kai Golan Hashiloni, Brenda Kasabe Nokai, Michal Shevach and more
Potential Business Impact:
Helps doctors understand patient health history faster.
We present a new Hebrew medical language model designed to extract structured clinical timelines from electronic health records, enabling the construction of patient journeys. Our model is based on DictaBERT 2.0 and continually pre-trained on over five million de-identified hospital records. To evaluate its effectiveness, we introduce two new datasets -- one from internal medicine and emergency departments, and another from oncology -- annotated for event temporal relations. Our results show that our model achieves strong performance on both datasets. We also find that vocabulary adaptation improves token efficiency and that de-identification does not compromise downstream performance, supporting privacy-conscious model development. The model is made available for research use under ethical restrictions.
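To illustrate how pairwise event temporal relations could be turned into a patient journey, here is a minimal sketch. It is not the authors' pipeline: the relation labels (`BEFORE`, `AFTER`), the triple format, the event names, and the function `build_timeline` are all hypothetical, and the abstract does not specify the annotation scheme. The sketch simply orders events with a topological sort from Python's standard library.

```python
from graphlib import TopologicalSorter, CycleError

def build_timeline(relations):
    """Order events from (event_a, label, event_b) triples.

    Assumed (hypothetical) labels:
      "BEFORE": event_a happens before event_b
      "AFTER":  event_a happens after event_b
    """
    sorter = TopologicalSorter()
    for event_a, label, event_b in relations:
        if label == "BEFORE":
            # event_a is a predecessor of event_b
            sorter.add(event_b, event_a)
        elif label == "AFTER":
            # event_b is a predecessor of event_a
            sorter.add(event_a, event_b)
        else:
            raise ValueError(f"unsupported relation label: {label}")
    # static_order() raises CycleError if the relations contradict each other
    return list(sorter.static_order())

# Illustrative, made-up relations as a relation classifier might predict them
relations = [
    ("admission", "BEFORE", "ct_scan"),
    ("discharge", "AFTER", "ct_scan"),
    ("admission", "BEFORE", "discharge"),
]
timeline = build_timeline(relations)
print(timeline)
```

Contradictory predictions (for example, A before B and B before A) surface as a `CycleError`, so a real system would need some strategy for resolving conflicts before ordering events.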
Similar Papers
Language Models for Longitudinal Clinical Prediction
Computation and Language
Helps doctors predict diseases early from patient notes.
Temporal Entailment Pretraining for Clinical Language Models over EHR Data
Computation and Language
Helps doctors predict patient health changes over time.
Arabic Large Language Models for Medical Text Generation
Computation and Language
Helps doctors give better advice in Arabic.