Validating Formal Specifications with LLM-generated Test Cases
By: Alcino Cunha, Nuno Macedo
Potential Business Impact:
Helps computers find mistakes in formal specifications automatically.
Validation is a central activity when developing formal specifications. As in coding, one possible validation technique is to define upfront test cases or scenarios that a future specification should (or should not) satisfy. Unfortunately, specifying such test cases is burdensome and error-prone, which can lead users to skip this validation task. This paper reports the results of an empirical evaluation of using pre-trained large language models (LLMs) to automate the generation of test cases from natural language requirements. In particular, we focus on test cases for structural requirements of simple domain models formalized in the Alloy specification language. Our evaluation focuses on the state-of-the-art GPT-5 model, but results from other closed- and open-source LLMs are also reported. The results show that, in this context, GPT-5 is already quite effective at generating positive and negative test cases that are syntactically correct, that respectively satisfy or violate the given requirement, and that can detect many incorrect specifications written by humans.
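For readers unfamiliar with the setting, the sketch below illustrates what such test cases can look like in Alloy, using a hypothetical library domain model that is not taken from the paper. A positive test case is a scenario expected to be consistent with the specification (expect 1), while a negative one is expected to be inconsistent (expect 0); the exact test-case format the paper uses may differ.

    -- Hypothetical library domain model (illustrative only)
    sig Author {}
    sig Book { authors : set Author }

    -- Requirement under test: "every book has at least one author"
    fact EveryBookHasAuthor { all b : Book | some b.authors }

    -- Positive test case: a scenario the specification should accept
    pred SharedAuthor {
      some disj b1, b2 : Book, a : Author |
        b1.authors = a and b2.authors = a
    }
    run SharedAuthor expect 1

    -- Negative test case: a scenario the specification should reject
    pred AuthorlessBook { some b : Book | no b.authors }
    run AuthorlessBook expect 0

In the workflow the paper evaluates, run commands like these would be generated by the LLM from the natural language requirement and then executed against a human-written specification; a test whose expect outcome fails signals a wrong specification.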
Similar Papers
Extracting Formal Specifications from Documents Using LLMs for Automated Testing
Software Engineering
Helps computers understand software rules automatically.
Large Language Models for Unit Test Generation: Achievements, Challenges, and the Road Ahead
Software Engineering
Helps computers write better code tests automatically.
Towards Autoformalization of LLM-generated Outputs for Requirement Verification
Computation and Language
Checks whether computer-written text matches what we want.