RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthesis
By: Zhengwei Tao, Bo Li, Jialong Wu and more
Agentic Retrieval-Augmented Generation (RAG) empowers large language models to autonomously plan and retrieve information for complex problem-solving. However, the development of robust agents is hindered by the scarcity of high-quality training data that reflects the noise and complexity of real-world retrieval environments. Conventional manual annotation is unscalable and often fails to capture the dynamic reasoning strategies required to handle retrieval failures. To bridge this gap, we introduce RAGShaper, a novel data synthesis framework designed to automate the construction of RAG tasks and robust agent trajectories. RAGShaper incorporates an InfoCurator to build dense information trees enriched with adversarial distractors spanning Perception and Cognition levels. Furthermore, we propose a constrained navigation strategy that forces a teacher agent to confront these distractors, thereby eliciting trajectories that explicitly demonstrate error correction and noise rejection. Comprehensive experiments confirm that models trained on our synthesized corpus significantly outperform existing baselines, exhibiting superior robustness in noise-intensive and complex retrieval tasks.
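As a purely illustrative sketch, and not the authors' implementation, the idea of forcing a teacher agent to confront distractors can be pictured with a toy information tree. Everything here is an assumption made for illustration: the `Node` class, the `is_distractor` flag, the `constrained_trajectory` function, and the step labels `read`, `retrieve`, and `reject` do not come from the paper, and the Perception and Cognition distractor levels are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One passage in a toy information tree (hypothetical structure)."""
    text: str
    is_distractor: bool = False
    children: List["Node"] = field(default_factory=list)


def constrained_trajectory(root: Node) -> List[Tuple[str, str]]:
    """Walk the tree from the root, one level at a time.

    Constraint (assumed for this sketch): every distractor child must be
    retrieved and explicitly rejected before the walk moves on. The walk
    then follows the first non-distractor child, if any.
    """
    steps: List[Tuple[str, str]] = []
    node: Optional[Node] = root
    while node is not None:
        steps.append(("read", node.text))
        next_node: Optional[Node] = None
        for child in node.children:
            if child.is_distractor:
                # The agent is forced to look at the noise and discard it.
                steps.append(("retrieve", child.text))
                steps.append(("reject", child.text))
            elif next_node is None:
                # Only the first supporting child is followed.
                next_node = child
        node = next_node
    return steps


# Toy data, invented for the example.
root = Node(
    "Q: Which lab built system X?",
    children=[
        Node("Lab A built system Y (similar name).", is_distractor=True),
        Node(
            "System X was built by Lab B.",
            children=[Node("Lab B is located in city C.")],
        ),
    ],
)

trajectory = constrained_trajectory(root)
for action, text in trajectory:
    print(f"{action:9s}{text}")
```

The resulting trajectory contains an explicit rejection step for the distractor before the supporting passages are read, which is the kind of noise-rejection behavior the abstract says the synthesized trajectories are meant to demonstrate.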