Reinforcing Code Generation: Improving Text-to-SQL with Execution-Based Learning

Published: June 6, 2025 | arXiv ID: 2506.06093v1

By: Atharv Kulkarni, Vivek Srikumar

Potential Business Impact:

Teaches computers to write correct database queries.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

In this work, we study the problem of code generation with a large language model (LLM), with a focus on generating SQL queries from natural language questions. We ask: instead of using supervised fine-tuning with text-code pairs, can we tune a model by having it interact with a database engine? We frame this as a reinforcement learning problem in which the model receives execution-based feedback from the environment in the form of scalar rewards. These rewards penalize execution failures and assign positive values when a query returns a correct answer. We use the rewards within the Group Relative Policy Optimization (GRPO) framework. We test and evaluate our approach on a tabular reasoning benchmark. We find that with only weak supervision in the form of question-answer pairs, RL-tuning improves the accuracy of model-generated SQL code from 31.49% to 49.83% while reducing the error percentage from 25.43% to 14.71%. This improvement allows the model to nearly match the performance of the larger SQLCoder-70B model. Our work demonstrates the potential of using execution-based feedback to improve the symbolic reasoning capabilities of LLMs.
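The abstract describes two ingredients: a scalar reward derived from executing the generated SQL, and GRPO, which normalizes rewards within a group of sampled completions. The sketch below illustrates both pieces under stated assumptions; it is not the authors' implementation. The SQLite backend, the `db_path` and `gold_rows` arguments, and the specific reward values (-1.0, 0.0, 1.0) are illustrative choices, not details from the paper.

```python
import sqlite3

def execution_reward(sql: str, db_path: str, gold_rows: set) -> float:
    """Execute a generated SQL query and return a scalar reward.

    Assumptions: a SQLite database at db_path and the gold answer given
    as a set of result rows. Reward magnitudes are illustrative only.
    """
    try:
        with sqlite3.connect(db_path) as conn:
            rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        # Execution failure (syntax error, missing table, etc.) is penalized.
        return -1.0
    # Positive reward only when the execution result matches the gold answer.
    return 1.0 if set(rows) == gold_rows else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize rewards within one sampled group,
    so each completion is scored relative to its siblings rather than
    against a learned value baseline."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-6) for r in rewards]
```

In use, one would sample several SQL completions per question, score each with `execution_reward`, and feed the resulting `group_relative_advantages` into the policy-gradient update; the weak supervision is just the question-answer pair, since no gold SQL is needed to compute the reward.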

Country of Origin
🇺🇸 United States

Page Count
12 pages

Category
Computer Science:
Computation and Language