
Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards

Published: May 7, 2025 | arXiv ID: 2505.04671v2

By: Yuxin Zhang, Meihao Fan, Ju Fan and more

BigTech Affiliations: Alibaba

Potential Business Impact:

Makes computers better at answering questions from data.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Recent advances in large language models (LLMs) have significantly improved performance on the Text-to-SQL task by leveraging their powerful reasoning capabilities. To enhance accuracy during the reasoning process, external Process Reward Models (PRMs) can be introduced during training and inference to provide fine-grained supervision. However, if misused, PRMs may distort the reasoning trajectory and lead to suboptimal or incorrect SQL generation. To address this challenge, we propose Reward-SQL, a framework that systematically explores how to incorporate PRMs into the Text-to-SQL reasoning process effectively. Our approach follows a "cold start, then PRM supervision" paradigm. Specifically, we first train the model to decompose SQL queries into structured stepwise reasoning chains using common table expressions (Chain-of-CTEs), establishing a strong and interpretable reasoning baseline. Then, we investigate four strategies for integrating PRMs, and find that combining PRM as an online training signal (e.g., GRPO) with PRM-guided inference (e.g., best-of-N sampling) yields the best results. Empirically, on the BIRD benchmark, Reward-SQL enables models supervised by PRM (7B) to achieve a 13.1% performance gain across various guidance strategies. Notably, our GRPO-aligned policy model based on Qwen2.5-Coder-7B-Instruct achieves 68.9% accuracy on the BIRD development set, outperforming all baseline methods under the same model size. These results demonstrate the effectiveness of Reward-SQL in leveraging reward-based supervision for Text-to-SQL reasoning.
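To make the two ideas in the abstract concrete, here is a minimal Python sketch of a Chain-of-CTEs candidate and PRM-guided best-of-N selection. It is an illustration under stated assumptions, not the paper's implementation: the `CteChain` class, the `best_of_n` helper, the toy `employees` table, and the `toy_step_scorer` stand-in are all hypothetical, and taking the minimum step score as the chain score is an assumed aggregation rule that the abstract does not specify. A real system would replace the toy scorer with a trained process reward model.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CteChain:
    """One candidate answer: ordered CTE steps plus a final SELECT (hypothetical type)."""
    steps: List[Tuple[str, str]]  # (cte_name, select_body), in reasoning order
    final_select: str

    def to_sql(self) -> str:
        # Render the steps as a single WITH query, one CTE per reasoning step.
        ctes = ",\n".join(f"{name} AS ({body})" for name, body in self.steps)
        return f"WITH {ctes}\n{self.final_select}"


# A step scorer maps (cte_name, select_body) to a score; higher is better.
StepScorer = Callable[[str, str], float]


def best_of_n(candidates: List[CteChain], score_step: StepScorer) -> CteChain:
    """Return the candidate whose weakest step scores highest.

    Assumption: chain score = minimum over step scores.
    """
    def chain_score(chain: CteChain) -> float:
        return min(score_step(name, body) for name, body in chain.steps)

    return max(candidates, key=chain_score)


def toy_step_scorer(name: str, body: str) -> float:
    # Stand-in for a learned PRM: penalize a step that uses a column
    # ("department") that does not exist in the toy schema below.
    return 0.1 if "department" in body else 0.9


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [("Ada", "eng", 120), ("Bob", "eng", 100), ("Cy", "ops", 150)],
    )

    # Question: "Who is the highest-paid engineer?"
    top_step = ("top_paid", "SELECT name, salary FROM eng ORDER BY salary DESC LIMIT 1")
    good = CteChain(
        steps=[("eng", "SELECT * FROM employees WHERE dept = 'eng'"), top_step],
        final_select="SELECT name FROM top_paid",
    )
    bad = CteChain(
        steps=[("eng", "SELECT * FROM employees WHERE department = 'eng'"), top_step],
        final_select="SELECT name FROM top_paid",
    )

    # Select among the sampled candidates, then execute only the winner.
    best = best_of_n([bad, good], toy_step_scorer)
    rows = conn.execute(best.to_sql()).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

Keeping each reasoning step as a named CTE is what lets a step-level scorer judge intermediate results separately, rather than only the final query.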

Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward

Machine Learning (CS)

Makes computers write better database questions.

18 May 2025

Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning

Computation and Language

Teaches computers to understand and use data tables.

23 Apr 2025

Exploring Generative Process Reward Modeling for Semi-Structured Data: A Case Study of Table Question Answering

Computation and Language

Helps computers answer questions from tables better.

23 Oct 2025


Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

24 pages
