ReFuzzer: Feedback-Driven Approach to Enhance Validity of LLM-Generated Test Programs
By: Iti Shree, Karine Even-Mendoza, Tomasz Radzik
Potential Business Impact:
Automatically repairs errors in machine-generated test programs so they test compilers more effectively.
Existing LLM-based compiler fuzzers often produce syntactically or semantically invalid test programs, limiting their effectiveness in exercising compiler optimizations and backend components. We introduce ReFuzzer, a framework for refining LLM-generated test programs by systematically detecting and correcting compilation and runtime violations (e.g., division by zero or out-of-bounds array accesses). ReFuzzer employs a feedback loop with a local LLM to validate and filter erroneous programs before execution, improving fuzzing effectiveness beyond crash detection and enabling the generation of diverse yet valid test programs. We evaluated ReFuzzer's effectiveness across black-, grey-, and white-box fuzzing approaches targeting LLVM/Clang. ReFuzzer improved test program validity from 47.0-49.4% to 96.6-97.3%, with an average processing time of 2.9-3.5 s per test program on a dual-GPU machine. Further, refuzzing significantly increased code coverage in critical optimization and IR generation components: for example, vectorization coverage improved by an absolute 9.2%, 2.3%, and 7.1% under black-, grey-, and white-box fuzzing, respectively, enhancing testing effectiveness.
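The abstract describes ReFuzzer's core mechanism only at a high level: a validate-filter-repair loop around a local LLM. The Python sketch below shows one plausible shape of such a loop, under stated assumptions; `query_local_llm`, the round limit, and the use of Clang with UBSan/ASan as the validity oracle are illustrative placeholders, not the paper's actual implementation.

```python
import os
import subprocess
import tempfile

# Assumed bound; the abstract does not state how many repair rounds ReFuzzer allows.
MAX_REFINEMENT_ROUNDS = 3


def query_local_llm(program: str, diagnostics: str) -> str:
    """Hypothetical stand-in for ReFuzzer's local LLM query.

    A real setup would prompt a locally hosted model with the failing
    program and the compiler/sanitizer diagnostics and ask for a
    corrected version; returned unchanged here so the sketch runs.
    """
    return program


def check_program(c_source: str) -> str | None:
    """Compile and run a C test program; return diagnostics on failure, None if valid."""
    with tempfile.TemporaryDirectory() as tmp:
        src, exe = os.path.join(tmp, "test.c"), os.path.join(tmp, "test")
        with open(src, "w") as f:
            f.write(c_source)
        # Compilation catches syntactic and semantic violations.
        compiled = subprocess.run(
            ["clang", "-fsanitize=undefined,address", "-o", exe, src],
            capture_output=True, text=True,
        )
        if compiled.returncode != 0:
            return compiled.stderr
        # Sanitizers catch runtime violations such as division by zero
        # or out-of-bounds array accesses.
        try:
            ran = subprocess.run([exe], capture_output=True, text=True, timeout=5)
        except subprocess.TimeoutExpired:
            return "timeout: program did not terminate"
        return ran.stderr if ran.returncode != 0 else None


def refuzz(candidate: str) -> str | None:
    """Feedback loop: validate; on failure, ask the LLM to repair and re-check."""
    for _ in range(MAX_REFINEMENT_ROUNDS):
        diagnostics = check_program(candidate)
        if diagnostics is None:
            return candidate  # valid: keep for fuzzing the compiler
        candidate = query_local_llm(candidate, diagnostics)
    return None  # still invalid after all repair attempts: filter out


# Example: a candidate that compiles cleanly but divides by zero at runtime,
# so it only survives if the (here inert) LLM step repairs it.
print(refuzz("int main(void) { int x = 0; return 1 / x; }"))
```

Only programs that pass both the compilation and sanitized-execution checks would reach the fuzzing corpus, which matches the abstract's claim that invalid programs are corrected or filtered before execution.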