Brittleness and Promise: Knowledge Graph Based Reward Modeling for Diagnostic Reasoning

Published: September 22, 2025 | arXiv ID: 2509.18316v1

By: Saksham Khatwani, He Cheng, Majid Afshar and more

Potential Business Impact:

Helps doctors find sicknesses by checking reasoning.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large language models (LLMs) show promise for diagnostic reasoning but often lack reliable, knowledge-grounded inference. Knowledge graphs (KGs), such as the Unified Medical Language System (UMLS), offer structured biomedical knowledge that can support trustworthy reasoning. Prior approaches typically integrate KGs via retrieval-augmented generation or fine-tuning, inserting KG content into prompts rather than enabling structured reasoning. We explore an alternative paradigm: treating the LLM as a reward model of KG reasoning paths, where the model learns to judge whether a candidate path leads to the correct diagnosis for a given patient input. This approach is inspired by recent work that leverages reward training to enhance model reasoning abilities, and is grounded in computational theory, which suggests that verifying a solution is often easier than generating one from scratch. It also parallels physicians' diagnostic assessment, in which they judge which sequences of findings and intermediate conditions most plausibly support a diagnosis. First, we systematically evaluate five task formulations for knowledge path judging and eight training paradigms. Second, we test whether path-judging abilities generalize to downstream diagnostic tasks, including diagnosis summarization and medical question answering. Experiments with three open-source instruction-tuned LLMs reveal both promise and brittleness: while specific reward optimization and distillation lead to strong path-judging performance, transferability to downstream tasks remains weak. Our findings provide the first systematic assessment of "reward model style" reasoning over clinical KGs, offering insights into how structured, reward-based supervision influences diagnostic reasoning in GenAI systems for healthcare.
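To make the path-judging setup concrete, here is a minimal sketch of what a binary judging task could look like. Everything in it is an assumption for illustration: the abstract does not specify the five task formulations, so the yes/no format, the prompt wording, the names `PathExample`, `render_path`, `build_judging_prompt`, `parse_verdict`, and `judging_accuracy`, and the toy triples (which are not taken from UMLS) are all hypothetical. The `judge` argument stands in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical representation: a KG path is a list of (head, relation, tail) triples.
Triple = Tuple[str, str, str]


@dataclass
class PathExample:
    patient_note: str
    path: List[Triple]
    leads_to_correct_diagnosis: bool  # gold label for the judging task


def render_path(path: List[Triple]) -> str:
    """Flatten a path into one readable line of text."""
    if not path:
        return ""
    parts = [path[0][0]]
    for _head, relation, tail in path:
        parts.append(f"--[{relation}]--> {tail}")
    return " ".join(parts)


def build_judging_prompt(example: PathExample) -> str:
    """Assumed prompt format: ask the judge for a yes/no verdict on the path."""
    return (
        "Patient input:\n"
        f"{example.patient_note}\n\n"
        "Candidate knowledge graph path:\n"
        f"{render_path(example.path)}\n\n"
        "Does this path lead to the correct diagnosis? Answer yes or no."
    )


def parse_verdict(model_output: str) -> float:
    """Map a free-text verdict to a scalar reward (1.0 for yes, else 0.0)."""
    return 1.0 if model_output.strip().lower().startswith("yes") else 0.0


def judging_accuracy(
    examples: List[PathExample], judge: Callable[[str], str]
) -> float:
    """Fraction of examples where the judge's verdict matches the gold label."""
    if not examples:
        return 0.0
    correct = sum(
        parse_verdict(judge(build_judging_prompt(ex)))
        == float(ex.leads_to_correct_diagnosis)
        for ex in examples
    )
    return correct / len(examples)
```

A judge that always answers "yes" would score 0.5 on a balanced set of correct and incorrect paths, which is the baseline any trained path judge needs to beat.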

Reliable Reasoning Path: Distilling Effective Guidance for LLM Reasoning with Knowledge Graphs

Computation and Language

Helps computers answer hard questions better.

12 Jun 2025

90%

Enhancing Large Language Models with Reliable Knowledge Graphs

Computation and Language

Makes AI smarter and more truthful.

16 Jun 2025

90%

MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs

Computation and Language

Helps AI doctors think through patient problems.

1 Apr 2025

Country of Origin

🇺🇸 United States

Page Count

13 pages
