Understanding LLM-Driven Test Oracle Generation
By: Adam Bodicoat, Gunel Jahangirova, Valerio Terragni
Potential Business Impact:
AI writes test checks that automatically expose bugs in software.
Automated unit test generation aims to improve software quality while reducing the time and effort required to create tests manually. However, existing techniques primarily generate regression oracles, which are predicated on the implemented behavior of the class under test. As a result, they do not address the oracle problem: the challenge of distinguishing correct from incorrect program behavior. With the rise of Foundation Models (FMs), particularly Large Language Models (LLMs), there is a new opportunity to generate test oracles that reflect intended behavior. This positions LLMs as enablers of Promptware, where software creation and testing are driven by natural-language prompts. This paper presents an empirical study on the effectiveness of LLMs in generating test oracles that expose software failures. We investigate how different prompting strategies and levels of contextual input affect the quality of LLM-generated oracles. Our findings offer insights into the strengths and limitations of LLM-based oracle generation in the FM era, improving our understanding of their capabilities and fostering future research in this area.
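To make the distinction the abstract draws more concrete, the sketch below contrasts a regression oracle, which asserts the implemented (possibly faulty) behavior of the class under test, with an intended-behavior oracle of the kind an LLM might derive from a natural-language description of what the method should do. The PriceCalculator class, its discountedPrice method, and the seeded fault are hypothetical examples invented here for illustration; they are not taken from the paper.

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Hypothetical class under test (not from the paper): applies a percentage
// discount, but contains a seeded fault -- it divides by 10 instead of 100.
class PriceCalculator {
    double discountedPrice(double price, double percentOff) {
        return price - (price * percentOff / 10.0); // bug: should be / 100.0
    }
}

class PriceCalculatorTest {

    // Regression oracle: generated from the implemented behavior, so it
    // encodes the bug and passes even though the program is wrong.
    @Test
    void regressionOracle() {
        PriceCalculator calc = new PriceCalculator();
        assertEquals(0.0, calc.discountedPrice(100.0, 10.0), 1e-9);
    }

    // Intended-behavior oracle: derived from the natural-language intent
    // ("a 10% discount on 100 yields 90"), so it fails on the faulty
    // implementation and exposes the failure.
    @Test
    void intendedBehaviorOracle() {
        PriceCalculator calc = new PriceCalculator();
        assertEquals(90.0, calc.discountedPrice(100.0, 10.0), 1e-9);
    }
}

Both oracles are syntactically ordinary assertions; the difference the paper studies is where the expected value comes from: the current implementation versus the intended behavior expressed in natural language.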
Similar Papers
Large Language Models for Unit Test Generation: Achievements, Challenges, and the Road Ahead
Software Engineering
Helps computers write better code tests automatically.
Automated Discovery of Test Oracles for Database Management Systems Using LLMs
Databases
Finds hidden computer database mistakes automatically.
Artificial or Just Artful? Do LLMs Bend the Rules in Programming?
Software Engineering
Helps AI write better code by using tests.