AGRO-SQL: Agentic Group-Relative Optimization with High-Fidelity Data Synthesis
By: Cehua Yang, Dongyu Xiao, Junming Lin and more
Potential Business Impact:
Teaches computers to answer questions from data.
The advancement of Text-to-SQL systems is currently hindered by the scarcity of high-quality training data and the limited reasoning capabilities of models in complex scenarios. In this paper, we propose a holistic framework that addresses both issues through a dual-centric approach. From a Data-Centric perspective, we construct an iterative data factory that synthesizes RL-ready data with high correctness and precise semantic-logic alignment, both enforced by strict verification. From a Model-Centric perspective, we introduce a novel Agentic Reinforcement Learning framework. This framework employs a Diversity-Aware Cold Start stage to initialize a robust policy, followed by Group Relative Policy Optimization (GRPO) to refine the agent's reasoning via environmental feedback. Extensive experiments on the BIRD and Spider benchmarks demonstrate that our synergistic approach achieves state-of-the-art performance among single-model methods.
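To make the GRPO step more concrete, the following is a minimal sketch of the group-relative advantage that gives the method its name: several candidate SQL queries are sampled for one question, each is scored, and each score is normalized against the mean and standard deviation of its own group. The abstract does not specify the reward, so the binary execution-match reward here is an assumption, as are the helper names `execution_reward` and `group_relative_advantages`, the toy `singer` table, and the candidate queries. It is an illustration of the general technique, not the authors' implementation.

```python
import sqlite3
from statistics import mean, pstdev


def execution_reward(db, predicted_sql, gold_sql):
    """Hypothetical reward: 1.0 if both queries return the same rows, else 0.0."""
    try:
        predicted = sorted(db.execute(predicted_sql).fetchall(), key=repr)
    except sqlite3.Error:
        # Queries that fail to execute receive no reward (assumed convention).
        return 0.0
    gold = sorted(db.execute(gold_sql).fetchall(), key=repr)
    return 1.0 if predicted == gold else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy environment: an in-memory database standing in for a benchmark schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
db.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ana", 31), ("Bo", 45), ("Cy", 28)],
)

gold = "SELECT name FROM singer WHERE age > 30"

# A group of sampled candidates for the same question (illustrative only).
candidates = [
    "SELECT name FROM singer WHERE age > 30",   # matches gold
    "SELECT name FROM singer WHERE age >= 31",  # different SQL, same result
    "SELECT name FROM singer",                  # executes, wrong result
    "SELECT nme FROM singer",                   # fails to execute
]

rewards = [execution_reward(db, sql, gold) for sql in candidates]
advantages = group_relative_advantages(rewards)

for sql, r, a in zip(candidates, rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}  {sql}")
```

Because the baseline is the group's own mean, no separate value network is needed: candidates that beat their siblings get a positive advantage and the rest get a negative one. If every candidate in a group earns the same reward, all advantages are zero and that group contributes no learning signal.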