Towards Robust and Fair Next Visit Diagnosis Prediction under Noisy Clinical Notes with Large Language Models
By: Heejoon Koo
Potential Business Impact:
Makes AI-assisted diagnosis more trustworthy when clinical notes are messy or error-ridden.
A decade of rapid advances in artificial intelligence (AI) has opened new opportunities for clinical decision support systems (CDSS), with large language models (LLMs) demonstrating strong reasoning abilities on timely medical tasks. However, clinical texts are often degraded by human errors or failures in automated pipelines, raising concerns about the reliability and fairness of AI-assisted decision-making. Yet the impact of such degradations remains under-investigated, particularly regarding how noise-induced shifts can heighten predictive uncertainty and unevenly affect demographic subgroups. We present a systematic study of state-of-the-art LLMs under diverse text corruption scenarios, focusing on robustness and equity in next-visit diagnosis prediction. To address the challenge posed by the large diagnostic label space, we introduce a clinically grounded label-reduction scheme and a hierarchical chain-of-thought (CoT) strategy that emulates clinicians' reasoning. Our approach improves robustness and reduces subgroup instability under degraded inputs, advancing the reliable use of LLMs in CDSS. We release code at https://github.com/heejkoo9/NECHOv3.
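The abstract names three moving parts: simulated text corruption, a clinically grounded reduction of the diagnostic label space, and a hierarchical chain-of-thought prompt that reasons coarse-to-fine like a clinician. Below is a minimal Python sketch of how such a pipeline might be wired together; `corrupt_text`, `build_prompt`, the prompt wording, and the label set are all illustrative assumptions, not the paper's actual implementation (the released NECHOv3 code is the authoritative source).

```python
import random
import string

def corrupt_text(text: str, noise_rate: float = 0.1, seed: int = 0) -> str:
    """Inject character-level noise (drop, substitute, insert) to simulate
    degraded clinical notes, e.g. transcription or OCR errors.
    Illustrative only: the paper's corruption scenarios may differ."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        r = rng.random()
        if r < noise_rate / 3:
            continue                                        # drop the character
        elif r < 2 * noise_rate / 3:
            out.append(rng.choice(string.ascii_lowercase))  # substitute it
        elif r < noise_rate:
            out.append(ch)
            out.append(rng.choice(string.ascii_lowercase))  # insert after it
        else:
            out.append(ch)                                  # keep it unchanged
    return "".join(out)

# Hypothetical hierarchical CoT template: summarise, group by organ system,
# then pick specific next-visit diagnoses from a reduced label set.
HIERARCHICAL_COT_PROMPT = """\
You are a clinician reviewing a patient's visit history.
Step 1: Summarise the salient findings in the note below.
Step 2: Identify the affected organ systems (coarse diagnosis groups).
Step 3: Within each group, select the most likely specific diagnoses
        for the NEXT visit, choosing only from this reduced label set:
{labels}

Note:
{note}
"""

def build_prompt(note: str, reduced_labels: list[str]) -> str:
    """Fill the template with a (possibly noisy) note and the reduced labels."""
    return HIERARCHICAL_COT_PROMPT.format(
        labels=", ".join(reduced_labels), note=note
    )

if __name__ == "__main__":
    note = "Patient presents with chest pain and shortness of breath."
    noisy = corrupt_text(note, noise_rate=0.15)
    print(build_prompt(
        noisy,
        ["I50 Heart failure", "I25 Ischaemic heart disease", "J44 COPD"],
    ))
```

A study like this would feed both the clean and corrupted prompts to each LLM and compare prediction accuracy and per-subgroup variance between the two conditions.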
Similar Papers
Leveraging Evidence-Guided LLMs to Enhance Trustworthy Depression Diagnosis
Artificial Intelligence
Guides AI with clinical evidence for more trustworthy depression diagnosis.
From Fuzzy Speech to Medical Insight: Benchmarking LLMs on Noisy Patient Narratives
Computation and Language
Tests how well AI extracts medical insight from noisy patient narratives.
Robust or Suggestible? Exploring Non-Clinical Induction in LLM Drug-Safety Decisions
Computation and Language
Tests whether non-clinical cues sway AI drug-safety decisions unfairly.