Score: 2

AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses

Published: March 3, 2025 | arXiv ID: 2503.01811v1

By: Nicholas Carlini, Javier Rando, Edoardo Debenedetti and more

Potential Business Impact:

Tests if AI can break computer security defenses.

Business Areas:

A/B Testing Data and Analytics

We introduce AutoAdvExBench, a benchmark to evaluate whether large language models (LLMs) can autonomously exploit defenses to adversarial examples. Unlike existing security benchmarks, which often serve as proxies for real-world tasks, AutoAdvExBench directly measures LLMs' success on tasks regularly performed by machine learning security experts. This approach offers a significant advantage: if an LLM could solve the challenges presented in AutoAdvExBench, it would immediately be of practical use to adversarial machine learning researchers. We then design a strong agent that is capable of breaking 75% of CTF-like ("homework exercise") adversarial example defenses. However, we show that this agent succeeds on only 13% of the real-world defenses in our benchmark, indicating a large gap in difficulty between attacking "real" code and attacking CTF-like code. In contrast, a stronger LLM that can attack 21% of real defenses succeeds on only 54% of CTF-like defenses. We make this benchmark available at https://github.com/ethz-spylab/AutoAdvExBench.
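The comparison in the abstract can be laid out as a small worked example. The sketch below only restates the four reported percentages and computes the drop from CTF-like to real-world defenses. The labels `agent_a` and `agent_b` are illustrative placeholders, since the abstract does not name the underlying models, and nothing here reflects the benchmark's actual code.

```python
# Success rates (percent) as reported in the abstract.
# "agent_a" / "agent_b" are hypothetical labels, not names from the paper.
reported = {
    "agent_a": {"ctf_like": 75, "real_world": 13},
    "agent_b": {"ctf_like": 54, "real_world": 21},
}


def gap(rates):
    """Percentage-point drop from CTF-like to real-world defenses."""
    return rates["ctf_like"] - rates["real_world"]


gaps = {name: gap(rates) for name, rates in reported.items()}

# Which agent leads depends on which kind of defense is measured.
best_on_ctf = max(reported, key=lambda n: reported[n]["ctf_like"])
best_on_real = max(reported, key=lambda n: reported[n]["real_world"])

for name, rates in reported.items():
    print(
        f"{name}: CTF-like {rates['ctf_like']}%, "
        f"real-world {rates['real_world']}%, gap {gaps[name]} points"
    )
print(f"best on CTF-like: {best_on_ctf}; best on real-world: {best_on_real}")
```

The ranking flips between the two settings, which is why the abstract treats CTF-like results as a poor predictor of success on real-world defenses.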

CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities

Cryptography and Security

Tests AI's ability to hack websites safely.

21 Mar 2025 2

88%

DefenderBench: A Toolkit for Evaluating Language Agents in Cybersecurity Environments

Computation and Language

Tests AI to find computer security problems.

31 May 2025 4

88%

AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks

Cryptography and Security

Helps computers understand attack steps in security reports.

5 Mar 2025 1

Repos / Data Links

github.com

Page Count

16 pages
