CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic
By: Yaocheng Zhang, Haohuan Huang, Zijun Song, and more
Potential Business Impact:
Helps AI learn to answer questions better.
Tool-Integrated Reasoning (TIR) with search engines enables large language models to iteratively retrieve up-to-date external knowledge, enhancing adaptability and generalization in complex question-answering tasks. However, existing search agent pipelines typically depend on reinforcement-learning-based optimization, which often suffers from sparse outcome rewards, leading to inefficient exploration and unstable training. We introduce CriticSearch, a fine-grained credit-assignment framework that supplies dense, turn-level feedback via a retrospective critic mechanism. During training, a frozen, asymmetric critic LLM retrospectively evaluates each turn using privileged information from the full trajectory and the gold answer, converting these assessments into stable, dense rewards that guide policy improvement. Experimental results across diverse multi-hop reasoning benchmarks demonstrate that CriticSearch consistently outperforms existing baselines, achieving faster convergence, improved training stability, and higher performance.
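To make the credit-assignment idea concrete, here is a minimal Python sketch of how a retrospective critic might turn sparse outcome rewards into dense, turn-level ones. All names (`Turn`, `score_turn`, `assign_rewards`) and the blending scheme are illustrative assumptions, not the paper's actual implementation; in particular, `score_turn` stands in for the frozen critique LLM with a toy heuristic.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    query: str     # search query issued at this turn
    evidence: str  # snippet retrieved for that query

def score_turn(turn: Turn, trajectory: list, gold_answer: str) -> float:
    # Stand-in for the frozen critic LLM: it sees privileged information
    # (the full trajectory and the gold answer) when judging a single turn.
    # Toy heuristic (an assumption): reward turns whose retrieved evidence
    # mentions the gold answer.
    return 1.0 if gold_answer.lower() in turn.evidence.lower() else 0.0

def assign_rewards(trajectory: list, gold_answer: str,
                   outcome_reward: float, alpha: float = 0.5) -> list:
    """Blend dense per-turn critic scores with the sparse outcome reward.

    alpha weights the critic signal; (1 - alpha) spreads the final
    outcome reward across every turn. The blend itself is a hypothetical
    choice for illustration.
    """
    dense = [score_turn(t, trajectory, gold_answer) for t in trajectory]
    return [alpha * d + (1 - alpha) * outcome_reward for d in dense]

trajectory = [
    Turn("capital of France?", "Paris is the capital of France."),
    Turn("population of Paris?", "About 2.1 million in the city proper."),
]
rewards = assign_rewards(trajectory, gold_answer="Paris", outcome_reward=1.0)
# The first turn surfaces the gold answer and gets full credit; the second
# still shares in the trajectory-level outcome reward.
```

The point of the sketch is the interface, not the scoring rule: each turn receives its own reward signal instead of a single trajectory-level scalar, which is what lets the policy gradient attribute success or failure to individual search steps.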
Similar Papers
SmartSearch: Process Reward-Guided Query Refinement for Search Agents
Artificial Intelligence
Improves computer searches for better answers.
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
Computation and Language
Helps AI find answers with proof.
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
Machine Learning (CS)
Teaches AI to solve complex problems step-by-step.