Cerberus: Multi-Agent Reasoning and Coverage-Guided Exploration for Static Detection of Runtime Errors
By: Hridya Dhulipala, Xiaokai Rong, Tien N. Nguyen
In several software development scenarios, it is desirable to detect runtime errors and exceptions in code snippets without actually executing them. A typical example is detecting runtime exceptions in online code snippets before integrating them into a codebase. In this paper, we propose Cerberus, a novel predictive, execution-free, coverage-guided testing framework. Cerberus uses LLMs to generate inputs that trigger runtime errors and to perform code coverage prediction and error detection without code execution. With a two-phase feedback loop, Cerberus first aims both to increase code coverage and to detect runtime errors, then shifts its focus solely to detecting runtime errors once coverage reaches 100% or its practical maximum, enabling it to perform better than prompting the LLMs for both purposes at once. Our empirical evaluation demonstrates that Cerberus outperforms conventional and learning-based testing frameworks on (in)complete code snippets by generating high-coverage test cases more efficiently, leading to the discovery of more runtime errors.
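To make the two-phase feedback loop concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: the helper names (llm_generate_input, llm_predict_execution) are hypothetical stand-ins for LLM calls, and the plateau heuristic for deciding that coverage has hit its practical maximum is our own simplification.

```python
# Illustrative sketch of Cerberus's two-phase feedback loop as described in
# the abstract. All helpers below are hypothetical stubs, not the paper's API.

def llm_generate_input(code: str, covered: set[int], goal: str) -> str:
    # Stub: a real implementation would prompt an LLM, steering it either
    # toward uncovered lines (goal="coverage") or error-prone paths
    # (goal="errors").
    return "sample input"

def llm_predict_execution(code: str, test_input: str) -> tuple[set[int], str | None]:
    # Stub: a real implementation would ask an LLM to simulate execution
    # without running the code, returning the predicted covered line
    # numbers and any predicted runtime exception.
    return set(), None

def cerberus_loop(code: str, max_rounds: int = 20, patience: int = 3):
    covered: set[int] = set()
    errors: list[str] = []
    total = len(code.splitlines())
    stagnant = 0  # rounds without coverage gain, used to detect a plateau

    for _ in range(max_rounds):
        # Phase 1 targets both coverage and errors; once coverage reaches
        # 100% or stops improving (its practical maximum), phase 2 targets
        # runtime errors only.
        phase1 = len(covered) < total and stagnant < patience
        goal = "coverage" if phase1 else "errors"

        test_input = llm_generate_input(code, covered, goal)
        predicted_lines, predicted_error = llm_predict_execution(code, test_input)

        stagnant = 0 if predicted_lines - covered else stagnant + 1
        covered |= predicted_lines
        if predicted_error:
            errors.append(predicted_error)

    return errors, covered
```

The key design point the sketch captures is that input generation and coverage/error prediction are both LLM-driven and execution-free, with the feedback loop's objective switching once further coverage gains are impossible or exhausted.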