CARE-RAG - Clinical Assessment and Reasoning in RAG
By: Deepthi Potluri, Aby Mammen Mathew, Jeffrey B. DeWitt, and more
Potential Business Impact:
Helps AI systems follow clinical guidelines to give safe, accurate advice.
Access to the right evidence does not guarantee that large language models (LLMs) will reason with it correctly. This gap between retrieval and reasoning is especially concerning in clinical settings, where outputs must align with structured protocols. We study this gap using Written Exposure Therapy (WET) guidelines as a testbed. In evaluating model responses to curated clinician-vetted questions, we find that errors persist even when authoritative passages are provided. To address this, we propose an evaluation framework that measures accuracy, consistency, and fidelity of reasoning. Our results highlight both the potential and the risks: retrieval-augmented generation (RAG) can constrain outputs, but safe deployment requires assessing reasoning as rigorously as retrieval.
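To make the evaluation framework concrete, here is a minimal sketch of how accuracy, consistency, and fidelity could be scored for a RAG system answering clinician-vetted questions. This is not the authors' released code: the item format, the lexical-overlap scoring, and the `ask_model` callable are assumptions made for illustration.

```python
# Illustrative sketch: score RAG answers along the three axes the paper names
# (accuracy, consistency, fidelity to provided guideline passages).
# Scoring via token overlap is a simplifying assumption, not the paper's method.
from collections import Counter
import re

def _tokens(text: str) -> set[str]:
    """Lowercased word tokens, used for simple overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def accuracy(answer: str, reference: str, threshold: float = 0.6) -> bool:
    """Answer counts as correct if it covers most of the reference's tokens."""
    ref = _tokens(reference)
    return bool(ref) and len(_tokens(answer) & ref) / len(ref) >= threshold

def consistency(answers: list[str]) -> float:
    """Fraction of repeated generations that agree with the majority answer."""
    if not answers:
        return 0.0
    normalized = [" ".join(sorted(_tokens(a))) for a in answers]
    _, count = Counter(normalized).most_common(1)[0]
    return count / len(answers)

def fidelity(answer: str, passages: list[str]) -> float:
    """Share of answer tokens grounded in the retrieved guideline passages."""
    support = set().union(*(_tokens(p) for p in passages)) if passages else set()
    ans = _tokens(answer)
    return len(ans & support) / len(ans) if ans else 0.0

def evaluate(items, ask_model, n_samples: int = 3):
    """items: dicts with 'question', 'reference', 'passages'.
    ask_model(question, passages) -> answer string (the system under test)."""
    results = []
    for item in items:
        samples = [ask_model(item["question"], item["passages"]) for _ in range(n_samples)]
        results.append({
            "accuracy": accuracy(samples[0], item["reference"]),
            "consistency": consistency(samples),
            "fidelity": fidelity(samples[0], item["passages"]),
        })
    return results
```

In this sketch, low fidelity flags answers that stray from the supplied passages even when accuracy looks acceptable, which mirrors the paper's point that retrieval alone does not guarantee faithful reasoning.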
Similar Papers
Grounding Large Language Models in Clinical Evidence: A Retrieval-Augmented Generation System for Querying UK NICE Clinical Guidelines
Computation and Language
Helps doctors find medical advice fast.
MedCoT-RAG: Causal Chain-of-Thought RAG for Medical Question Answering
Computation and Language
Helps doctors answer tough medical questions better.
Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights
Computation and Language
Makes AI doctors more truthful and helpful.