SastBench: A Benchmark for Testing Agentic SAST Triage
By: Jake Feiglin, Guy Dar
Potential Business Impact:
Helps computers find software problems faster.
SAST (Static Application Security Testing) tools are among the most widely used techniques in defensive cybersecurity, employed by commercial and non-commercial organizations to identify potential vulnerabilities in software. Despite their utility, they generate numerous false positives, which require costly manual filtering (known as triage). While LLM-powered agents show promise for automating cybersecurity tasks, existing benchmarks fail to emulate real-world SAST finding distributions. We introduce SastBench, a benchmark for evaluating SAST triage agents that combines real CVEs as true positives with filtered SAST tool findings as approximate false positives. SastBench features an agent-agnostic design. We evaluate different agents on the benchmark, present a comparative analysis of their performance, provide a detailed analysis of the dataset, and discuss the implications for future development.
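To make the evaluation setup concrete, the following is a minimal sketch of how a triage agent could be scored against labeled findings. It is not SastBench's actual harness or data schema: the `Finding` fields, the `score_triage` function, the metric choice (precision, recall, F1), and the sample findings are all hypothetical and chosen only for illustration. The only idea taken from the abstract is that CVE-backed findings count as true positives and filtered SAST findings count as approximate false positives, and that the agent is treated as an opaque callable (agent-agnostic).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    """A single SAST finding shown to the agent (hypothetical schema)."""
    finding_id: str
    rule_id: str
    location: str


# An agent is any callable that decides whether a finding is a real issue.
# The ground-truth label is never passed to it.
Agent = Callable[[Finding], bool]


def score_triage(agent: Agent, labeled: List[Tuple[Finding, bool]]) -> Dict[str, float]:
    """Compare agent verdicts with ground truth and return basic metrics."""
    tp = fp = fn = tn = 0
    for finding, is_real in labeled:
        verdict = agent(finding)
        if verdict and is_real:
            tp += 1
        elif verdict and not is_real:
            fp += 1
        elif not verdict and is_real:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def example_findings() -> List[Tuple[Finding, bool]]:
    """Illustrative data: one CVE-backed finding and three presumed false positives."""
    return [
        (Finding("f1", "sql-injection", "app/db.py:42"), True),    # assumed CVE-backed
        (Finding("f2", "sql-injection", "tests/test_db.py:10"), False),
        (Finding("f3", "hardcoded-secret", "docs/example.py:3"), False),
        (Finding("f4", "weak-hash", "scripts/checksum.py:17"), False),
    ]


def flag_everything(finding: Finding) -> bool:
    """Baseline agent that performs no triage and keeps every finding."""
    return True


if __name__ == "__main__":
    print(score_triage(flag_everything, example_findings()))
```

A baseline that keeps every finding gets perfect recall but poor precision on a false-positive-heavy distribution, which is why the mix of true and false positives in a benchmark matters when comparing agents.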