Reinforcing Code Generation: Improving Text-to-SQL with Execution-Based Learning
By: Atharv Kulkarni, Vivek Srikumar
Potential Business Impact:
Teaches computers to write database queries that return correct answers.
In this work, we study the problem of code generation with a large language model (LLM), with a focus on generating SQL queries from natural language questions. We ask: Instead of using supervised fine-tuning with text-code pairs, can we tune a model by having it interact with a database engine? We frame this as a reinforcement learning problem in which the model receives execution-based feedback from the environment in the form of scalar rewards. These rewards penalize execution failures and assign positive values when a query returns a correct answer. We use these rewards within the Group Relative Policy Optimization (GRPO) framework. We evaluate our approach on a tabular reasoning benchmark. We find that with only weak supervision in the form of question-answer pairs, RL-tuning improves the accuracy of model-generated SQL code from 31.49% to 49.83% while reducing the error percentage from 25.43% to 14.71%. This improvement allows the model to nearly match the performance of the much larger SQLCoder-70B model. Our work demonstrates the potential of using execution-based feedback to improve the symbolic reasoning capabilities of LLMs.
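To make the reward design concrete, here is a minimal sketch of an execution-based reward of the kind the abstract describes, paired with GRPO-style group-relative advantage normalization. It assumes a SQLite database and gold answers given as sets of row tuples; the function names and the exact reward values (-1/0/+1) are illustrative assumptions, not the authors' implementation.

import sqlite3


def execution_reward(sql: str, db_path: str, gold_answer: set) -> float:
    """Scalar reward from executing a generated query:
    -1.0 if the query fails to execute, +1.0 if its result set
    matches the gold answer, 0.0 otherwise (values assumed here)."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return -1.0  # penalize execution failures
    finally:
        conn.close()
    # Weak supervision: compare only the returned answer, never the query text.
    return 1.0 if {tuple(r) for r in rows} == gold_answer else 0.0


def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO normalizes each sampled query's reward against the group
    of samples drawn for the same question: A_i = (r_i - mean) / std."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]  # epsilon avoids div-by-zero

Because the advantage is computed relative to the group, a query only earns a positive update when it executes and answers correctly while some of its sampled siblings do not, which is what drives the model away from failing or wrong-answer generations.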
Similar Papers
Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning
Computation and Language
Teaches computers to understand and use data tables.
ConstrainedSQL: Training LLMs for Text2SQL via Constrained Reinforcement Learning
Machine Learning (CS)
Teaches computers to answer questions from data better.
Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward
Machine Learning (CS)
Makes computers write better database queries.