Testing Database Systems with Large Language Model Synthesized Fragments
By: Suyang Zhong, Manuel Rigger
Potential Business Impact:
Finds hidden bugs in computer databases.
Various automated testing approaches have been proposed for Database Management Systems (DBMSs). Many of these approaches generate pairs of equivalent queries to identify bugs that cause DBMSs to compute incorrect results, and they have found hundreds of bugs in mature, widely used DBMSs. Most of these approaches rely on manually written SQL generators, so their bug-finding capabilities remain constrained by the limited set of SQL features the generators support. In this work, we propose ShQveL, an approach that augments existing SQL test-case generators by leveraging Large Language Models (LLMs) to synthesize SQL fragments. Our key idea is to use automated interactions with LLMs to systematically incorporate additional SQL features into the SQL generators, increasing feature coverage while still generating test cases efficiently. Specifically, ShQveL uses SQL sketches, SQL statements with incomplete code segments that LLMs fill, to integrate LLM-generated content into the generator. We evaluated ShQveL on 5 DBMSs and discovered 55 unique and previously unknown bugs, 50 of which were promptly fixed after our reports.
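To make the workflow the abstract describes concrete, below is a minimal Python sketch of the two ingredients: a SQL sketch with a hole that an LLM-supplied fragment fills, and a pair of equivalent queries whose result sets are compared as a test oracle. The hole marker `<EXPR>`, the `fill_sketch` helper, and the hard-coded fragments are illustrative assumptions, not ShQveL's actual interface; SQLite stands in for the DBMS under test.

```python
import sqlite3

# A hypothetical SQL sketch: a statement with a hole ("<EXPR>") to be
# filled with an LLM-synthesized SQL fragment. Marker and helper names
# are assumptions for illustration, not ShQveL's real API.
SKETCH = "SELECT * FROM t WHERE <EXPR>;"

def fill_sketch(sketch: str, fragment: str) -> str:
    """Substitute a fragment into the sketch's hole."""
    return sketch.replace("<EXPR>", fragment)

# Stand-ins for LLM output: a predicate and a rewritten predicate that
# is equivalent under SQL's three-valued logic. If the two filled
# queries return different results, the DBMS has a logic bug.
predicate = "c0 > 5"
equivalent_predicate = "NOT (c0 <= 5) AND c0 IS NOT NULL"

q1 = fill_sketch(SKETCH, predicate)
q2 = fill_sketch(SKETCH, equivalent_predicate)

# Run both queries against a small test database and compare results.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c0 INTEGER);")
conn.executemany("INSERT INTO t VALUES (?);", [(1,), (7,), (None,)])

r1 = sorted(conn.execute(q1).fetchall())
r2 = sorted(conn.execute(q2).fetchall())
assert r1 == r2, f"Potential logic bug: {r1} != {r2}"
print("Results agree:", r1)
```

In the paper's setting, the fragments would come from automated LLM interactions and be plugged into an existing test-case generator rather than hard-coded; a mismatch between the two result sets flags a candidate logic bug for reporting.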
Similar Papers
MageSQL: Enhancing In-context Learning for Text-to-SQL Applications with Large Language Models
Databases
Helps computers understand questions to find data.
SING-SQL: A Synthetic Data Generation Framework for In-Domain Text-to-SQL Translation
Artificial Intelligence
Lets computers understand questions about any database.
Automated Discovery of Test Oracles for Database Management Systems Using LLMs
Databases
Finds hidden database mistakes automatically.