Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning
By: Qi He, Cheng Qian, Xiusi Chen and more
Potential Business Impact:
Helps computers check if online stories are true.
Claim verification with large language models (LLMs) has recently attracted considerable attention, owing to their superior reasoning capabilities and transparent verification pathways compared to traditional answer-only judgments. Online claim verification requires iterative evidence retrieval and reasoning, yet existing approaches mainly rely on prompt engineering or predesigned reasoning workflows and offer no unified training paradigm to improve the necessary skills. To address this, we introduce Veri-R1, an online reinforcement learning (RL) framework that enables an LLM to interact with a search engine and to receive reward signals that explicitly shape its planning, retrieval, and reasoning behaviors. The dynamic interaction between models and retrieval systems more accurately reflects real-world verification scenarios and fosters comprehensive verification skills. Empirical results show that Veri-R1 improves joint accuracy by up to 30% and doubles the evidence score, often surpassing larger-scale counterparts. Ablation studies further reveal the impact of reward components and the link between output logits and label accuracy. Our results highlight the effectiveness of online RL for precise and faithful claim verification and provide a foundation for future research. We release our code to support community progress in LLM-empowered claim verification.
Similar Papers
From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs
Machine Learning (CS)
Helps AI check its own thinking better.
Incentivizing LLMs to Self-Verify Their Answers
Machine Learning (CS)
Helps computers check their own math answers.
Reinforcement Learning Meets Large Language Models: A Survey of Advancements and Applications Across the LLM Lifecycle
Computation and Language
Teaches computers to think and follow instructions better.