Harnessing Large Language Models for Precision Querying and Retrieval-Augmented Knowledge Extraction in Clinical Data Science

Published: January 28, 2026 | arXiv ID: 2601.20674v1

By: Juan Jose Rubio Jan, Jack Wu, Julia Ive

Potential Business Impact:

Lets computers find health info in records.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

This study applies Large Language Models (LLMs) to two foundational Electronic Health Record (EHR) data science tasks: structured data querying (using a programming language, namely Python/Pandas) and information extraction from unstructured clinical text via a Retrieval-Augmented Generation (RAG) pipeline. We test the ability of LLMs to interact accurately with large structured datasets for analytics, and the reliability of LLMs in extracting semantically correct information from free-text health records when supported by RAG. To this end, we present a flexible evaluation framework that automatically generates synthetic question-answer pairs tailored to the characteristics of each dataset or task. Experiments were conducted on a curated subset of MIMIC-III (four structured tables and one clinical note type), using a mix of locally hosted and API-based LLMs. Evaluation combined exact-match metrics, semantic similarity, and human judgment. Our findings demonstrate the potential of LLMs to support precise querying and accurate information extraction in clinical workflows.
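
The abstract does not spell out how the synthetic question-answer pairs are generated or scored, so the following is only a minimal sketch of the general idea, not the authors' framework. It assumes a toy table with hypothetical column names (not the MIMIC-III schema), uses plain Python lists instead of Pandas to stay dependency-free, and swaps in a token-overlap F1 as a crude lexical stand-in for the semantic similarity metric mentioned above.

```python
import re
from collections import Counter

# Hypothetical toy table; the column names are illustrative only.
ADMISSIONS = [
    {"subject_id": 1, "admission_type": "EMERGENCY", "los_days": 3},
    {"subject_id": 2, "admission_type": "ELECTIVE", "los_days": 5},
    {"subject_id": 3, "admission_type": "EMERGENCY", "los_days": 7},
]


def make_count_question(rows, column, value):
    """Build one synthetic question and compute its gold answer from the table."""
    gold = sum(1 for row in rows if row[column] == value)
    question = f"How many rows have {column} equal to {value}?"
    return question, str(gold)


def normalize(text):
    """Lowercase, drop punctuation, and split into tokens."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).split()


def exact_match(prediction, gold):
    """True when prediction and gold agree after normalization."""
    return normalize(prediction) == normalize(gold)


def token_f1(prediction, gold):
    """Token-overlap F1 (assumed stand-in for a semantic similarity score)."""
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


question, gold = make_count_question(ADMISSIONS, "admission_type", "EMERGENCY")
```

In a setup like this, the question would be sent to the model, and the answer it produces (for structured querying, the result of running its generated code) would be compared with `gold` using `exact_match`, while free-text answers would be scored with a similarity measure such as `token_f1` before any human review.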

Large Language Models with Temporal Reasoning for Longitudinal Clinical Summarization and Prediction

Computation and Language

Helps doctors quickly understand patient history.

30 Jan 2025 1

92%

Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case

Artificial Intelligence

Helps doctors find patient info from notes.

4 Dec 2025 0

92%

Large Language Models are Powerful Electronic Health Record Encoders

Machine Learning (CS)

Helps doctors predict health problems using plain text.

24 Feb 2025 1

Page Count

11 pages
