Understanding the Characteristics of LLM-Generated Property-Based Tests in Exploring Edge Cases
By: Hidetake Tanaka, Haruto Tanaka, Kazumasa Shimari, and more
Potential Business Impact:
Finds more bugs in computer code.
As Large Language Models (LLMs) increasingly generate code in software development, ensuring the quality of LLM-generated code has become important. Traditional testing approaches using Example-based Testing (EBT) often miss edge cases -- defects that occur at boundary values, special input patterns, or extreme conditions. This research investigates the characteristics of LLM-generated Property-based Testing (PBT) compared to EBT for exploring edge cases. We analyze 16 HumanEval problems where standard solutions failed on extended test cases, generating both PBT and EBT test code using Claude-4-sonnet. Our experimental results reveal that while each method individually achieved a 68.75% bug detection rate, combining both approaches improved detection to 81.25%. The analysis demonstrates complementary characteristics: PBT effectively detects performance issues and edge cases through extensive input space exploration, while EBT effectively detects specific boundary conditions and special patterns. These findings suggest that a hybrid approach leveraging both testing methods can improve the reliability of LLM-generated code, providing guidance for test generation strategies in LLM-based code generation.
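The EBT/PBT distinction the abstract draws can be illustrated with a minimal sketch. This is a hypothetical example, not the paper's actual test harness: `llm_sort` stands in for an LLM-generated solution with a subtle edge-case bug, the example-based tests check a few hand-picked cases, and the property-based check samples random inputs against invariants (output ordered, output a permutation of the input), in the spirit of libraries such as Hypothesis but using only the standard library.

```python
import random

def llm_sort(xs):
    """Stand-in for an LLM-generated solution.
    Bug: converting to a set silently drops duplicate elements."""
    return sorted(set(xs))

def ebt_passes(sort_fn):
    """Example-based testing: a few hand-picked input/output pairs.
    None of these examples happens to contain duplicates."""
    cases = [([3, 1, 2], [1, 2, 3]), ([], []), ([5], [5])]
    return all(sort_fn(inp) == out for inp, out in cases)

def pbt_counterexample(sort_fn, trials=200, seed=0):
    """Property-based testing: random inputs checked against invariants.
    Returns a failing input if one is found, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(0, 20))]
        out = sort_fn(xs)
        ordered = all(a <= b for a, b in zip(out, out[1:]))
        permutation = sorted(out) == sorted(xs)
        if not (ordered and permutation):
            return xs  # counterexample: duplicates expose the bug
    return None

print(ebt_passes(llm_sort))                       # True: EBT misses the bug
print(pbt_counterexample(llm_sort) is not None)   # True: PBT finds a failing input
```

Random inputs almost surely contain duplicates, so the permutation property fails for `llm_sort` even though the curated examples pass, which mirrors the complementary coverage the paper reports.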
Similar Papers
Agentic Property-Based Testing: Finding Bugs Across the Python Ecosystem
Software Engineering
Finds bugs in computer programs automatically.
LLM-based Property-based Test Generation for Guardrailing Cyber-Physical Systems
Software Engineering
Makes smart machines safer by testing them automatically.
Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation
Software Engineering
AI helps find bugs in computer code.