Evaluating LLM Generated Detection Rules in Cybersecurity
By: Anna Bertiger, Bobby Filar, Aryan Luthra, and others
Potential Business Impact:
Tests whether AI can write good computer security rules.
LLMs are increasingly pervasive in the security environment, yet there are few measures of their effectiveness, which limits their trustworthiness and usefulness to security practitioners. Here, we present an open-source evaluation framework and benchmark metrics for assessing LLM-generated cybersecurity rules. The benchmark uses a holdout-set methodology to measure the effectiveness of LLM-generated security rules against a corpus of human-written rules. It provides three key metrics inspired by the way experts evaluate security rules, offering a realistic, multifaceted assessment of an LLM-based security rule generator. We illustrate the methodology using rules written by Sublime Security's detection team and rules written by Sublime Security's Automated Detection Engineer (ADE), with a thorough analysis of ADE's skills presented in the results section.
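To make the holdout-set idea concrete, here is a minimal sketch of such an evaluation harness. The abstract does not name the three metrics or the rule format, so everything below is an illustrative assumption: the `Sample` and `Rule` types, the `split_holdout` helper, and the example metrics (detection rate and false-positive rate) stand in for whatever the paper's framework and Sublime Security's ADE actually use.

```python
# Sketch of a holdout-set comparison between LLM-generated and human-written
# detection rules. Types, helpers, and metrics are hypothetical placeholders,
# not the paper's actual framework.
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """A message or event that rules are tested against."""
    content: str
    is_malicious: bool


@dataclass
class Rule:
    """A detection rule modeled as a predicate over samples."""
    name: str
    matches: Callable[[Sample], bool]


def split_holdout(rules: List[Rule], holdout_frac: float = 0.2, seed: int = 7):
    """Hold out a fraction of the human-written corpus for comparison."""
    rng = random.Random(seed)
    shuffled = rules[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_frac))
    return shuffled[:cut], shuffled[cut:]  # (reference set, holdout set)


def detection_rate(rules: List[Rule], samples: List[Sample]) -> float:
    """Fraction of malicious samples caught by at least one rule."""
    malicious = [s for s in samples if s.is_malicious]
    if not malicious:
        return 0.0
    hits = sum(any(r.matches(s) for r in rules) for s in malicious)
    return hits / len(malicious)


def false_positive_rate(rules: List[Rule], samples: List[Sample]) -> float:
    """Fraction of benign samples flagged by at least one rule."""
    benign = [s for s in samples if not s.is_malicious]
    if not benign:
        return 0.0
    flags = sum(any(r.matches(s) for r in rules) for s in benign)
    return flags / len(benign)


def compare_to_holdout(llm_rules, holdout_rules, samples):
    """Report the same (hypothetical) metrics for LLM and human rule sets."""
    return {
        "llm_detection_rate": detection_rate(llm_rules, samples),
        "holdout_detection_rate": detection_rate(holdout_rules, samples),
        "llm_false_positive_rate": false_positive_rate(llm_rules, samples),
        "holdout_false_positive_rate": false_positive_rate(holdout_rules, samples),
    }


if __name__ == "__main__":
    # Toy corpus purely for demonstration.
    samples = [
        Sample("urgent wire transfer request", is_malicious=True),
        Sample("quarterly newsletter", is_malicious=False),
    ]
    human_rules = [Rule("wire_fraud", lambda s: "wire transfer" in s.content)]
    llm_rules = [Rule("llm_wire_fraud", lambda s: "wire" in s.content)]
    _, holdout = split_holdout(human_rules, holdout_frac=1.0)
    print(compare_to_holdout(llm_rules, holdout, samples))
```

The point of the holdout design is that the human-written rules serve as the yardstick: the LLM generator is scored on the same samples and the same metrics as the held-out expert rules, so its numbers can be read relative to expert performance rather than in isolation.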
Similar Papers
CyberSOCEval: Benchmarking LLMs Capabilities for Malware Analysis and Threat Intelligence Reasoning
Cryptography and Security
Helps computers fight cyberattacks better.
A Practical Guide for Evaluating LLMs and LLM-Reliant Systems
Artificial Intelligence
Tests AI language tools for real-world use.
A Comprehensive Study of LLM Secure Code Generation
Cryptography and Security
Finds flaws in AI-written code.