Beyond Correctness: Rewarding Faithful Reasoning in Retrieval-Augmented Generation
By: Zhichao Xu, Zongyu Wu, Yun Zhou, and more
Potential Business Impact:
Makes AI's thinking steps more honest.
Inspired by the success of reinforcement learning (RL) in Large Language Model (LLM) training for domains like math and code, recent works have begun exploring how to train LLMs to use search engines more effectively as tools for retrieval-augmented generation. Although these methods achieve performance improvements across QA benchmarks, many prioritize final answer correctness while overlooking the quality of intermediate reasoning steps, which can lead to unfaithful chains of thought. In this paper, we first introduce a comprehensive framework for evaluating RL-based search agents, covering three distinct faithfulness metrics: information-think faithfulness, think-answer faithfulness, and think-search faithfulness. Our evaluations reveal that a prototypical RL-based search agent, Search-R1, has significant room for improvement on these measures. To foster faithful reasoning, we introduce VERITAS (Verifying Entailed Reasoning through Intermediate Traceability in Agentic Search), a novel framework that integrates fine-grained faithfulness rewards into the reinforcement learning process. Our experiments show that models trained with VERITAS not only significantly improve reasoning faithfulness but also maintain comparable task performance across seven QA benchmarks.
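The abstract names three faithfulness metrics and a reward that folds them into RL training. The sketch below illustrates one way such a composite reward could be shaped: the Trajectory fields, the lexical-overlap proxy for entailment, and the weights are assumptions made for illustration only, not the paper's VERITAS implementation.

```python
# A minimal sketch of a shaped reward combining answer correctness with
# fine-grained faithfulness terms. All names, weights, and the overlap
# heuristic are illustrative assumptions, not the paper's actual method.

from dataclasses import dataclass


@dataclass
class Trajectory:
    """One rollout of an RL-based search agent."""
    retrieved_docs: list[str]   # documents returned by the search tool
    thoughts: list[str]         # intermediate "think" steps
    search_queries: list[str]   # queries issued to the search engine
    answer: str                 # final answer
    gold_answer: str            # reference answer for the outcome reward


def _overlap(text: str, sources: list[str]) -> float:
    """Toy lexical-overlap proxy for entailment; a real judge would likely be
    an NLI model or an LLM grader (an assumption for this sketch)."""
    words = set(text.lower().split())
    pool = set(" ".join(sources).lower().split())
    return len(words & pool) / len(words) if words else 0.0


def information_think_faithfulness(t: Trajectory) -> float:
    """Are the thoughts grounded in the retrieved documents?"""
    if not t.thoughts:
        return 0.0
    return sum(_overlap(th, t.retrieved_docs) for th in t.thoughts) / len(t.thoughts)


def think_answer_faithfulness(t: Trajectory) -> float:
    """Does the final answer follow from the reasoning trace?"""
    return _overlap(t.answer, t.thoughts)


def think_search_faithfulness(t: Trajectory) -> float:
    """Are the issued queries motivated by the preceding thoughts?"""
    if not t.search_queries:
        return 0.0
    return sum(_overlap(q, t.thoughts) for q in t.search_queries) / len(t.search_queries)


def composite_reward(t: Trajectory, w_ans: float = 1.0, w_faith: float = 0.5) -> float:
    """Outcome (exact-match) reward plus an averaged faithfulness bonus."""
    r_answer = float(t.answer.strip().lower() == t.gold_answer.strip().lower())
    r_faith = (
        information_think_faithfulness(t)
        + think_answer_faithfulness(t)
        + think_search_faithfulness(t)
    ) / 3.0
    return w_ans * r_answer + w_faith * r_faith
```

In this kind of setup the faithfulness bonus penalizes trajectories whose reasoning steps are unsupported by retrieved evidence even when the final answer happens to be correct, which is the failure mode the abstract attributes to correctness-only rewards.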
Similar Papers
Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis
Artificial Intelligence
AI explains medical guesses using proof.
Lessons from Training Grounded LLMs with Verifiable Rewards
Computation and Language
Makes AI answers more truthful and proven.
Learning to Reason for Factuality
Computation and Language
Makes AI write true stories, not made-up ones.