LLM-Driven Data Generation and a Novel Soft Metric for Evaluating Text-to-SQL in Aviation MRO

Published: June 11, 2025 | arXiv ID: 2506.13785v1

By: Patrick Sutanto, Jonathan Kenrick, Max Lorenz and more

Potential Business Impact:

Helps computers understand data questions better.

Business Areas:

Text Analytics Data and Analytics, Software

The application of Large Language Models (LLMs) to text-to-SQL tasks promises to democratize data access, particularly in critical industries like aviation Maintenance, Repair, and Operation (MRO). However, progress is hindered by two key challenges: the rigidity of conventional evaluation metrics such as execution accuracy, which offer coarse, binary feedback, and the scarcity of domain-specific evaluation datasets. This paper addresses these gaps. To enable more nuanced assessment, we introduce a novel F1-score-based 'soft' metric that quantifies the informational overlap between generated and ground-truth SQL results. To address data scarcity, we propose an LLM-driven pipeline that synthesizes realistic question-SQL pairs from database schemas. We demonstrate our contributions through an empirical evaluation on an authentic MRO database. Our experiments show that the proposed soft metric provides more insightful performance analysis than strict accuracy, and our data generation technique is effective in creating a domain-specific benchmark. Together, these contributions offer a robust framework for evaluating and advancing text-to-SQL systems in specialized environments.
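The abstract does not spell out how the soft metric is computed, so the following is only a minimal sketch of one plausible reading: an F1 score over the rows returned by the generated query and by the ground-truth query. The function name `soft_f1`, the row-level matching, and the multiset treatment of duplicate rows are all assumptions made for illustration, not the authors' definition.

```python
from collections import Counter


def soft_f1(predicted_rows, gold_rows):
    """Row-level F1 between two SQL result sets (illustrative assumption).

    Rows are compared as whole tuples, and duplicates are counted
    as in a multiset. The paper's actual metric may differ.
    """
    pred = Counter(tuple(row) for row in predicted_rows)
    gold = Counter(tuple(row) for row in gold_rows)

    # Two empty results agree completely.
    if not pred and not gold:
        return 1.0

    # Multiset intersection: rows that appear in both results.
    overlap = sum((pred & gold).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical result sets: two of three rows match.
gold = [(1, "A"), (2, "B"), (3, "C")]
predicted = [(1, "A"), (2, "B"), (4, "D")]
score = soft_f1(predicted, gold)
```

Under execution accuracy the hypothetical prediction above would simply count as wrong, whereas a row-level F1 gives partial credit for the two matching rows. This is the kind of graded feedback the abstract contrasts with binary scoring.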

LLM Evaluation Based on Aerospace Manufacturing Expertise: Automated Generation and Multi-Model Question Answering

Computation and Language

Tests if AI can safely design airplane parts.

25 Jan 2025 0

89%

End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation

Machine Learning (CS)

Finds the right database for your questions.

8 Aug 2025 1

89%

Page Count

21 pages
