From Facts to Conclusions: Integrating Deductive Reasoning in Retrieval-Augmented LLMs
By: Shubham Mishra, Samyek Jain, Gorang Mehrishi, and more
Potential Business Impact:
Makes AI answers more truthful and explainable.
Retrieval-Augmented Generation (RAG) grounds large language models (LLMs) in external evidence, but fails when retrieved sources conflict or contain outdated or subjective information. Prior work addresses these issues independently but lacks unified reasoning supervision. We propose a reasoning-trace-augmented RAG framework that adds structured, interpretable reasoning across three stages: (1) document-level adjudication, (2) conflict analysis, and (3) grounded synthesis, producing citation-linked answers or justified refusals. We introduce a Conflict-Aware Trust-Score (CATS) pipeline that evaluates groundedness, factual correctness, refusal accuracy, and conflict-behavior alignment using an LLM-as-a-Judge. Our 539-query reasoning dataset and evaluation pipeline establish a foundation for conflict-aware, interpretable RAG systems. Experimental results demonstrate substantial gains over baselines, most notably with Qwen, where Supervised Fine-Tuning improved End-to-End answer correctness from 0.069 to 0.883 and behavioral adherence from 0.074 to 0.722.
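To make the three-stage flow concrete, the sketch below shows how such a pipeline could be orchestrated. It is a minimal illustration, not the authors' implementation: the helper functions (`adjudicate_document`, `analyze_conflicts`, `synthesize_answer`) are hypothetical stand-ins for LLM calls, and the trace structure, prompts, and CATS judging from the paper are not reproduced.

```python
from dataclasses import dataclass, field

# Minimal sketch of one reasoning-trace-augmented RAG pass.
# All helper functions below are hypothetical placeholders for LLM calls.

@dataclass
class Document:
    doc_id: str
    text: str

@dataclass
class Trace:
    adjudications: list = field(default_factory=list)   # stage 1: per-document verdicts
    conflict_notes: list = field(default_factory=list)  # stage 2: cross-document conflicts
    answer: str | None = None                           # stage 3: grounded answer or refusal
    citations: list = field(default_factory=list)

def adjudicate_document(query: str, doc: Document) -> dict:
    """Stage 1 (hypothetical): judge whether a single retrieved document is
    relevant, current, and objective enough to support the query."""
    return {"doc_id": doc.doc_id, "usable": bool(doc.text), "note": "placeholder verdict"}

def analyze_conflicts(query: str, verdicts: list) -> list:
    """Stage 2 (hypothetical): compare usable documents and record factual conflicts."""
    return []  # no conflicts detected in this toy setting

def synthesize_answer(query: str, verdicts: list, conflicts: list) -> tuple[str | None, list]:
    """Stage 3 (hypothetical): emit a citation-linked answer, or refuse with justification."""
    usable = [v for v in verdicts if v["usable"]]
    if not usable or conflicts:
        return None, []  # justified refusal path
    return f"Answer to '{query}' grounded in the retrieved evidence.", [v["doc_id"] for v in usable]

def run_pipeline(query: str, docs: list) -> Trace:
    trace = Trace()
    trace.adjudications = [adjudicate_document(query, d) for d in docs]
    trace.conflict_notes = analyze_conflicts(query, trace.adjudications)
    trace.answer, trace.citations = synthesize_answer(query, trace.adjudications, trace.conflict_notes)
    return trace

if __name__ == "__main__":
    docs = [Document("d1", "Evidence snippet A."), Document("d2", "Evidence snippet B.")]
    print(run_pipeline("example query", docs))
```

In this toy version the refusal branch fires whenever no document survives adjudication or any conflict is recorded; the paper's framework instead supervises each stage with structured reasoning traces and scores the resulting behavior with CATS.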
Similar Papers
TruthfulRAG: Resolving Factual-level Conflicts in Retrieval-Augmented Generation with Knowledge Graphs
Computation and Language
Fixes AI answers when the model's knowledge is wrong.
Retrieval-augmented reasoning with lean language models
Computation and Language
Lets small computers answer hard questions accurately.
Probing Latent Knowledge Conflict for Faithful Retrieval-Augmented Generation
Computation and Language
Makes AI answers more truthful and less wrong.