Rethinking Agentic Workflows: Evaluating Inference-Based Test-Time Scaling Strategies in Text2SQL Tasks
By: Jiajing Guo, Kenil Patel, Jorge Piazentin Ono, and more
Potential Business Impact:
Lets non-experts query databases by asking questions in plain language.
Large language models (LLMs) are increasingly powering Text-to-SQL (Text2SQL) systems, enabling non-expert users to query industrial databases using natural language. While test-time scaling strategies have shown promise in LLM-based solutions, their effectiveness in real-world applications, especially with the latest reasoning models, remains uncertain. In this work, we benchmark six lightweight, industry-oriented test-time scaling strategies and four LLMs, including two reasoning models, evaluating their performance on the BIRD Mini-Dev benchmark. Beyond standard accuracy metrics, we also report inference latency and token consumption, providing insights relevant for practical system deployment. Our findings reveal that Divide-and-Conquer prompting and few-shot demonstrations consistently enhance performance for both general-purpose and reasoning-focused LLMs. However, introducing additional workflow steps yields mixed results, and base model selection plays a critical role. This work sheds light on the practical trade-offs between accuracy, efficiency, and complexity when deploying Text2SQL systems.
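To make the Divide-and-Conquer strategy concrete, below is a minimal sketch of how such a prompting pipeline might look. The call_llm placeholder, the prompt wording, and the three-step decompose/draft/assemble split are illustrative assumptions, not the authors' exact implementation.

# A minimal sketch of Divide-and-Conquer prompting for Text2SQL,
# one of the test-time scaling strategies benchmarked in the paper.
# call_llm and all prompt text are assumptions for illustration.

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API call (assumption)."""
    raise NotImplementedError

def divide_and_conquer_text2sql(question: str, schema: str) -> str:
    # Step 1 (divide): break the question into simpler sub-questions.
    decompose_prompt = (
        f"Database schema:\n{schema}\n\n"
        f"Question: {question}\n"
        "Break this question into simpler sub-questions, one per line."
    )
    sub_questions = call_llm(decompose_prompt).strip().splitlines()

    # Step 2 (conquer): draft a SQL fragment for each sub-question.
    fragments = []
    for sq in sub_questions:
        fragment_prompt = (
            f"Database schema:\n{schema}\n\n"
            f"Write the SQL needed to answer: {sq}"
        )
        fragments.append(call_llm(fragment_prompt))

    # Step 3 (assemble): merge the fragments into one final query.
    assemble_prompt = (
        f"Database schema:\n{schema}\n\n"
        f"Original question: {question}\n"
        "Partial SQL fragments:\n" + "\n".join(fragments) + "\n"
        "Combine these into a single correct SQL query. Return only SQL."
    )
    return call_llm(assemble_prompt)

Each extra step adds latency and token cost, which is why the paper reports both alongside accuracy.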
Similar Papers
The Art of Scaling Test-Time Compute for Large Language Models
Computation and Language
Makes AI think better by giving it more compute at answer time.
End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation
Machine Learning (CS)
Finds the right database for your questions.