Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning
By: Josefa Lia Stoisser, Marc Boubnovski Martell, Julien Fauqueur
Potential Business Impact:
Teaches computers to understand and use data tables.
This work reframes the Text-to-SQL task as a pathway for teaching large language models (LLMs) to reason over and manipulate tabular data, moving beyond the traditional focus on query generation. We propose a two-stage framework that leverages SQL supervision to develop transferable table reasoning capabilities. First, we synthesize detailed chain-of-thought (CoT) traces from real-world SQL queries, providing step-by-step, clause-level supervision that teaches the model how to traverse, filter, and aggregate table fields. Second, we introduce a Group Relative Policy Optimization (GRPO) reinforcement learning objective that connects SQL execution accuracy to generalizable reasoning by encouraging steps that extend beyond task-specific syntax and transfer across datasets. Empirically, our approach improves performance on standard Text-to-SQL benchmarks and achieves substantial gains on reasoning-intensive datasets such as BIRD and CRT-QA, demonstrating enhanced generalization and interpretability. Specifically, the distilled-quantized LLaMA model achieved a relative 33.9% increase in accuracy when trained on Text-to-SQL tasks, while Qwen achieved a relative 14.5% increase. These results suggest that SQL can serve not only as a target formalism but also as an effective scaffold for learning robust, transferable reasoning over structured data.
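To make the second stage concrete, here is a minimal sketch of how a GRPO-style objective can tie an execution-accuracy reward to group-relative advantages. The function names (`execution_reward`, `group_relative_advantages`) and the binary result-set reward are illustrative assumptions, not the paper's exact implementation; GRPO's defining step is normalizing each sampled completion's reward against the statistics of its own sample group.

```python
# Hypothetical sketch: GRPO-style advantages from a Text-to-SQL execution reward.
# Assumed setup: for each question, the policy samples a group of candidate SQL
# queries; each is executed and compared to the gold query's result set.
import statistics


def execution_reward(predicted_rows, gold_rows):
    """Binary execution-accuracy reward: 1.0 if the predicted query's
    result set matches the gold result set, else 0.0 (an assumption;
    real setups often add partial or format-based rewards)."""
    return 1.0 if predicted_rows == gold_rows else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO advantage for completion i in a sampled group:
    A_i = (r_i - mean(r)) / (std(r) + eps).
    Completions better than their group's average get positive advantage."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy example: four sampled SQL completions for one question;
# two of them execute to the correct result set [("Alice",)].
gold = [("Alice",)]
sampled_results = [[("Alice",)], [("Bob",)], [("Alice",)], []]
rewards = [execution_reward(rows, gold) for rows in sampled_results]
advantages = group_relative_advantages(rewards)
```

The advantages then weight a clipped policy-gradient update, so the model is pushed toward reasoning traces whose SQL actually executes correctly relative to its other attempts on the same question.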
Similar Papers
Reinforcing Code Generation: Improving Text-to-SQL with Execution-Based Learning
Computation and Language
Teaches computers to write correct database answers.
Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL
Machine Learning (CS)
Makes small AI understand complex database questions.
ConstrainedSQL: Training LLMs for Text2SQL via Constrained Reinforcement Learning
Machine Learning (CS)
Teaches computers to answer questions from data better.