Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models

Published: May 12, 2025 | arXiv ID: 2505.07968v3

By: Weiyi Wu, Xinwen Xu, Chongyang Gao, and more

Potential Business Impact:

Makes AI doctors give up-to-date advice.

Business Areas:

Clinical Trials, Health Care

Large Language Models (LLMs) have great potential in health care, yet they struggle to adapt to rapidly evolving medical knowledge, which can lead to outdated or contradictory treatment suggestions. This study investigated how LLMs respond to evolving clinical guidelines, focusing on concept drift and internal inconsistencies. We developed the DriftMedQA benchmark to simulate guideline evolution and assessed the temporal reliability of various LLMs. Our evaluation of seven state-of-the-art models across 4,290 scenarios showed that the models had difficulty rejecting outdated recommendations and frequently endorsed conflicting guidance. Additionally, we explored two mitigation strategies: Retrieval-Augmented Generation and preference fine-tuning via Direct Preference Optimization. While each method improved model performance, their combination led to the most consistent and reliable results. These findings underscore the need to improve LLM robustness to temporal shifts to ensure more dependable applications in clinical practice. The dataset is available at https://huggingface.co/datasets/RDBH/DriftMed.
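
For readers unfamiliar with Direct Preference Optimization, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not taken from the paper: the abstract does not describe the training setup, so the function name `dpo_loss`, the `beta` value, and the example log-probabilities are all illustrative assumptions. In this setting, the "chosen" answer would plausibly be the up-to-date recommendation and the "rejected" answer the outdated one.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    answers under the policy being trained and under a frozen reference
    model. beta=0.1 is an assumed value, not one reported by the paper.
    """
    # How much more the policy favors each answer than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), evaluated in a stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Hypothetical log-probabilities, for illustration only.
neutral = dpo_loss(-6.0, -6.0, -6.0, -6.0)    # policy equals reference
improved = dpo_loss(-5.0, -9.0, -6.0, -6.0)   # policy favors chosen answer
```

The loss equals log 2 when the policy and the reference agree, and it falls as the policy assigns relatively more probability to the chosen answer than the reference does.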

Related Papers:

Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models
Computation and Language | 4 Sep 2025
AI doctors remember new medical facts.

Medical large language models are easily distracted
Computation and Language | 1 Apr 2025
Helps doctors' AI understand patient talk better.

Dr. GPT Will See You Now, but Should It? Exploring the Benefits and Harms of Large Language Models in Medical Diagnosis using Crowdsourced Clinical Cases
Computers and Society | 13 Jun 2025
AI helps answer everyday health questions accurately.

Country of Origin

🇺🇸 United States

Repos / Data Links

github.com

Page Count

24 pages
