From Fuzzy Speech to Medical Insight: Benchmarking LLMs on Noisy Patient Narratives
By: Eden Mama, Liel Sheri, Yehudit Aperstein, and more
Potential Business Impact:
Helps doctors interpret informal, noisy patient descriptions of their symptoms.
The widespread adoption of large language models (LLMs) in healthcare raises critical questions about their ability to interpret patient-generated narratives, which are often informal, ambiguous, and noisy. Existing benchmarks typically rely on clean, structured clinical text, offering limited insight into model performance under realistic conditions. In this work, we present a novel synthetic dataset designed to simulate patient self-descriptions characterized by varying levels of linguistic noise, fuzzy language, and layperson terminology. Our dataset comprises clinically consistent scenarios annotated with ground-truth diagnoses, spanning a spectrum of communication clarity to reflect diverse real-world reporting styles. Using this benchmark, we fine-tune and evaluate several state-of-the-art models, including BERT-based encoders and encoder-decoder T5 models. To support reproducibility and future research, we release the Noisy Diagnostic Benchmark (NDB), a structured dataset of noisy, synthetic patient descriptions designed to stress-test and compare the diagnostic capabilities of LLMs under realistic linguistic conditions. The benchmark is available to the community at: https://github.com/lielsheri/PatientSignal
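The abstract describes fine-tuning encoder-only (BERT-based) and encoder-decoder (T5) models on the benchmark. As a rough, non-authoritative sketch of what that evaluation setup could look like, the snippet below fine-tunes a BERT classifier on description-to-diagnosis pairs with Hugging Face Transformers. The file name ndb.csv and the column names "description" and "diagnosis" are assumptions for illustration; the actual dataset layout is defined in the linked repository.

# Minimal sketch: fine-tune a BERT classifier on noisy patient descriptions.
# Assumptions (not from the paper): the NDB ships as a CSV with columns
# "description" (noisy patient text) and "diagnosis" (ground-truth label).
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

df = pd.read_csv("ndb.csv")  # hypothetical file name
labels = sorted(df["diagnosis"].unique())
label2id = {label: i for i, label in enumerate(labels)}
df["label"] = df["diagnosis"].map(label2id)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate long narratives; dynamic padding is handled by the Trainer.
    return tokenizer(batch["description"], truncation=True, max_length=256)

ds = Dataset.from_pandas(df[["description", "label"]])
ds = ds.map(tokenize, batched=True)
ds = ds.train_test_split(test_size=0.2, seed=42)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(labels),
    id2label={i: label for label, i in label2id.items()},
    label2id=label2id,
)

args = TrainingArguments(
    output_dir="ndb-bert",
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    tokenizer=tokenizer,
)
trainer.train()
print(trainer.evaluate())

A T5 variant would instead frame diagnosis as text-to-text generation (AutoModelForSeq2SeqLM with a sequence-to-sequence data collator), which is presumably how the encoder-decoder models in the paper are evaluated.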
Similar Papers
Towards Robust and Fair Next Visit Diagnosis Prediction under Noisy Clinical Notes with Large Language Models
Computation and Language
Makes LLM-based diagnosis prediction more robust and fair when clinical notes are noisy.
Building a Silver-Standard Dataset from NICE Guidelines for Clinical LLMs
Computation and Language
Helps clinical LLMs follow NICE healthcare guidelines.
Benchmarking the Medical Understanding and Reasoning of Large Language Models in Arabic Healthcare Tasks
Computation and Language
Benchmarks how well LLMs understand and reason about Arabic medical questions.