DR.EHR: Dense Retrieval for Electronic Health Record with Knowledge Injection and Synthetic Data
By: Zhengyun Zhao, Huaiyuan Ying, Yue Zhong, and more
Potential Business Impact:
Helps doctors find patient info faster.
Electronic Health Records (EHRs) are pivotal in clinical practice, yet retrieving information from them remains challenging, mainly because of the semantic gap between queries and record text. Recent advances in dense retrieval offer promising solutions, but existing models, both general-domain and biomedical-domain, fall short because they lack sufficient medical knowledge or were trained on mismatched corpora. This paper introduces DR.EHR, a series of dense retrieval models tailored specifically for EHR retrieval. We propose a two-stage training pipeline built on MIMIC-IV discharge summaries to meet the need for extensive medical knowledge and large-scale training data. The first stage extracts medical entities and injects knowledge from a biomedical knowledge graph, while the second stage uses large language models to generate diverse training data. We train two variants of DR.EHR, with 110M and 7B parameters respectively. Evaluated on the CliniQ benchmark, our models significantly outperform all existing dense retrievers, achieving state-of-the-art results. Detailed analyses confirm our models' superiority across various match and query types, particularly in challenging semantic matches such as implication and abbreviation. Ablation studies validate the effectiveness of each pipeline component, and supplementary experiments on EHR QA datasets demonstrate the models' generalizability to natural language questions, including complex ones with multiple entities. This work significantly advances EHR retrieval, offering a robust solution for clinical applications.
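To make the idea of dense retrieval concrete, here is a minimal sketch of how a dense retriever ranks notes and how such a model is commonly trained on (query, passage) pairs like the synthetic ones described above. It rests on several assumptions not stated in the abstract: the 3-dimensional vectors are hand-made stand-ins for encoder output (the real DR.EHR models have 110M and 7B parameters), similarity is taken to be cosine, and the training objective shown is a generic InfoNCE loss with in-batch negatives, a common choice for dense retrievers rather than a confirmed detail of DR.EHR. All names (`rank_notes`, `info_nce_loss`, `note_htn`, and so on) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_notes(query_vec: Vector, note_vecs: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Score every note against the query and return the top-k (note_id, score) pairs."""
    scored = [(note_id, cosine(query_vec, vec)) for note_id, vec in note_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def info_nce_loss(query_vecs: List[Vector], passage_vecs: List[Vector], temperature: float = 0.05) -> float:
    """Contrastive loss with in-batch negatives (assumed objective, for illustration).

    Query i's positive is passage i; every other passage in the batch acts as a negative.
    """
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [cosine(q, p) / temperature for p in passage_vecs]
        # Numerically stable log-sum-exp over all passages in the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        # Cross-entropy with the matching passage as the target class.
        total += log_sum_exp - logits[i]
    return total / len(query_vecs)


# Hypothetical toy embeddings: dimension 0 ~ "blood pressure", 1 ~ "glucose", 2 ~ "fracture".
notes = {
    "note_htn": [0.9, 0.1, 0.0],  # note mentioning "HTN" (hypertension)
    "note_dm": [0.1, 0.9, 0.1],   # note mentioning diabetes
    "note_fx": [0.0, 0.1, 0.9],   # note mentioning a fracture
}
query = [0.8, 0.2, 0.1]  # stand-in embedding for the query "high blood pressure"

if __name__ == "__main__":
    print([note_id for note_id, _ in rank_notes(query, notes, k=2)])  # → ['note_htn', 'note_dm']
    aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)  # → True
```

The example shows why a semantic match such as the abbreviation "HTN" versus "high blood pressure" can be retrieved even without shared words: ranking depends only on vector similarity, so the quality of the learned embeddings, which the knowledge-injection and synthetic-data stages aim to improve, determines what is found.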
Similar Papers
Generating Querying Code from Text for Multi-Modal Electronic Health Record
Information Retrieval
Lets doctors find patient info easily.
Beyond Long Context: When Semantics Matter More than Tokens
Computation and Language
Helps doctors find patient info faster.
EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis
Computation and Language
Helps doctors understand patient health records better.