RingSQL: Generating Synthetic Data with Schema-Independent Templates for Text-to-SQL Reasoning Models

Published: January 9, 2026 | arXiv ID: 2601.05451v1

By: Marko Sterbentz, Kevin Cushing, Cameron Barrie and more

Potential Business Impact:

Makes computers understand questions to find data.

Business Areas:

Text Analytics, Data and Analytics, Software

Recent advances in text-to-SQL systems have been driven by larger models and improved datasets, yet progress is still limited by the scarcity of high-quality training data. Manual data creation is expensive, and existing synthetic methods trade off reliability and scalability. Template-based approaches ensure correct SQL but require schema-specific templates, while LLM-based generation scales easily but lacks quality and correctness guarantees. We introduce RingSQL, a hybrid data generation framework that combines schema-independent query templates with LLM-based paraphrasing of natural language questions. This approach preserves SQL correctness across diverse schemas while providing broad linguistic variety. In our experiments, we find that models trained using data produced by RingSQL achieve an average gain in accuracy of +2.3% across six text-to-SQL benchmarks when compared to models trained on other synthetic data. We make our code available at https://github.com/nu-c3lab/RingSQL.
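To make the hybrid idea in the abstract concrete, the following is a minimal sketch of how a schema-independent template might be filled from an arbitrary schema and then checked for executability. It is not taken from the RingSQL code base: the template strings, the `instantiate` and `executes` helpers, the toy schema, and the identity `paraphrase` stand-in (where an LLM call would go) are all assumptions made for illustration.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical template: the placeholders name roles, not concrete tables or columns,
# so the same template can be filled from any schema.
SQL_TEMPLATE = "SELECT {column}, COUNT(*) FROM {table} GROUP BY {column}"
QUESTION_TEMPLATE = "How many rows of {table} are there for each {column}?"


def instantiate(
    schema: Dict[str, List[str]],
    paraphrase: Callable[[str], str] = lambda q: q,  # stand-in for an LLM paraphraser
) -> List[Tuple[str, str]]:
    """Fill the template once per (table, column) pair and return (question, sql) pairs."""
    pairs = []
    for table, columns in schema.items():
        for column in columns:
            sql = SQL_TEMPLATE.format(table=table, column=column)
            question = paraphrase(QUESTION_TEMPLATE.format(table=table, column=column))
            pairs.append((question, sql))
    return pairs


def executes(sql: str, schema: Dict[str, List[str]]) -> bool:
    """Return True if the query runs against an empty in-memory copy of the schema."""
    conn = sqlite3.connect(":memory:")
    try:
        for table, columns in schema.items():
            conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


# Toy schema (assumed for illustration): table name -> column names.
schema = {"singer": ["country", "age"], "concert": ["year"]}
pairs = instantiate(schema)
for question, sql in pairs:
    print(question, "->", sql, "| runs:", executes(sql, schema))
```

In this sketch only the question text passes through `paraphrase`; the SQL comes straight from the template. That separation is one way to vary the wording of questions without touching the query, under the assumptions stated above.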

SING-SQL: A Synthetic Data Generation Framework for In-Domain Text-to-SQL Translation

Artificial Intelligence

Lets computers understand any database questions.

30 Sep 2025 2

90%

Text-to-SQL Domain Adaptation via Human-LLM Collaborative Data Annotation

Human-Computer Interaction

Helps computers understand database questions faster.

21 Feb 2025 1

89%

SQLForge: Synthesizing Reliable and Diverse Data to Enhance Text-to-SQL Reasoning in LLMs

Computation and Language

Makes computer programs better at understanding data.

19 May 2025 0

Country of Origin

🇺🇸 United States

Repos / Data Links

github.com

Page Count

19 pages
