Testing Database Systems with Large Language Model Synthesized Fragments
By: Suyang Zhong, Manuel Rigger
Potential Business Impact:
Finds hidden bugs in computer databases.
Various automated testing approaches have been proposed for Database Management Systems (DBMSs). Many of these approaches generate pairs of equivalent queries to identify bugs that cause DBMSs to compute incorrect results, and they have found hundreds of bugs in mature, widely used DBMSs. Most of these approaches rely on manually written SQL generators, so their bug-finding capabilities remain constrained by the limited set of SQL features the generators support. In this work, we propose ShQveL, an approach that augments existing SQL test-case generators by leveraging Large Language Models (LLMs) to synthesize SQL fragments. Our key idea is to use automated interactions with LLMs to systematically incorporate additional SQL features into the SQL generators, increasing feature coverage while still generating test cases efficiently. Specifically, ShQveL uses SQL sketches, SQL statements with incomplete code segments that LLMs fill, to integrate LLM-generated content into the generator. We evaluated ShQveL on 5 DBMSs and discovered 55 unique and previously unknown bugs, 50 of which were promptly fixed after our reports.
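To make the workflow the abstract describes concrete, below is a minimal Python sketch of the two ingredients: a SQL sketch with a hole that an LLM-supplied fragment fills, and a pair of equivalent queries whose result sets are compared as a test oracle. The hole marker `<EXPR>`, the `fill_sketch` helper, and the hard-coded fragments are illustrative assumptions, not ShQveL's actual interface; SQLite stands in for the DBMS under test.

```python
import sqlite3

# A hypothetical SQL sketch: a statement with a hole ("<EXPR>") to be
# filled with an LLM-synthesized SQL fragment. Marker and helper names
# are assumptions for illustration, not ShQveL's real API.
SKETCH = "SELECT * FROM t WHERE <EXPR>;"

def fill_sketch(sketch: str, fragment: str) -> str:
    """Substitute a fragment into the sketch's hole."""
    return sketch.replace("<EXPR>", fragment)

# Stand-ins for LLM output: a predicate and a rewritten predicate that
# is equivalent under SQL's three-valued logic. If the two filled
# queries return different results, the DBMS has a logic bug.
predicate = "c0 > 5"
equivalent_predicate = "NOT (c0 <= 5) AND c0 IS NOT NULL"

q1 = fill_sketch(SKETCH, predicate)
q2 = fill_sketch(SKETCH, equivalent_predicate)

# Run both queries against a small test database and compare results.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c0 INTEGER);")
conn.executemany("INSERT INTO t VALUES (?);", [(1,), (7,), (None,)])

r1 = sorted(conn.execute(q1).fetchall())
r2 = sorted(conn.execute(q2).fetchall())
assert r1 == r2, f"Potential logic bug: {r1} != {r2}"
print("Results agree:", r1)
```

In the paper's setting, the fragments would come from automated LLM interactions and be plugged into an existing test-case generator rather than hard-coded; a mismatch between the two result sets flags a candidate logic bug for reporting.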
Similar Papers
MageSQL: Enhancing In-context Learning for Text-to-SQL Applications with Large Language Models
Databases
Helps computers understand questions to find data.
SING-SQL: A Synthetic Data Generation Framework for In-Domain Text-to-SQL Translation
Artificial Intelligence
Lets computers understand questions about any database.
Automated Discovery of Test Oracles for Database Management Systems Using LLMs
Databases
Finds hidden database mistakes automatically.