FaithLens: Detecting and Explaining Faithfulness Hallucination
By: Shuzheng Si, Qingyi Wang, Haozhe Zhao, and more
Potential Business Impact:
Finds fake facts in AI writing.
Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.
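The abstract describes a two-stage recipe: supervised fine-tuning on filtered synthetic data as a cold start, followed by rule-based reinforcement learning whose reward combines prediction correctness with explanation quality. As a rough illustration only, here is a minimal Python sketch of what such a rule-based reward could look like; the function names, weights, and the explanation_quality scorer are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch of a rule-based reward for faithfulness-hallucination detection,
# following the abstract's general recipe (prediction-correctness + explanation-quality
# rewards). All names, weights, and scoring heuristics here are illustrative guesses.

from dataclasses import dataclass


@dataclass
class Sample:
    label: bool        # gold: does the output contain a faithfulness hallucination?
    prediction: bool   # model's binary prediction
    explanation: str   # model's free-text explanation


def explanation_quality(explanation: str) -> float:
    """Placeholder quality score in [0, 1]; the paper presumably uses a
    stronger judge (e.g., an LLM or rubric-based checks)."""
    if not explanation.strip():
        return 0.0
    # Toy heuristic: reward non-trivial, reasonably sized explanations.
    n_words = len(explanation.split())
    return min(n_words / 50.0, 1.0)


def rule_based_reward(sample: Sample, w_pred: float = 1.0, w_expl: float = 0.5) -> float:
    """Combine prediction correctness with explanation quality.

    A wrong prediction gets no explanation credit, so the model cannot
    earn reward with fluent but incorrect rationales.
    """
    if sample.prediction != sample.label:
        return 0.0
    return w_pred + w_expl * explanation_quality(sample.explanation)


if __name__ == "__main__":
    s = Sample(
        label=True,
        prediction=True,
        explanation="The summary states a date that never appears in the source document.",
    )
    print(rule_based_reward(s))  # correctness reward plus a weighted explanation bonus
```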
Similar Papers
Thinking, Faithful and Stable: Mitigating Hallucinations in LLMs
Artificial Intelligence
Makes AI think more carefully and be more truthful.
Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models?
Computation and Language
Makes AI give honest reasons for its answers.
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
Computation and Language
Checks whether AI's explanations are honest.