Automated Penetration Testing with LLM Agents and Classical Planning

Published: December 11, 2025 | arXiv ID: 2512.11143v1

By: Lingzhi Wang, Xinyi Shi, Ziyu Li and more

Potential Business Impact:

Makes computers find computer security flaws faster.

Business Areas:

Penetration Testing, Information Technology, Privacy and Security

While penetration testing plays a vital role in cybersecurity, achieving fully automated, hands-off-the-keyboard execution remains a significant research challenge. In this paper, we introduce the "Planner-Executor-Perceptor (PEP)" design paradigm and use it to systematically review existing work and identify the key challenges in this area. We also evaluate existing penetration testing systems, with a particular focus on the use of Large Language Model (LLM) agents for this task. The results show that the out-of-the-box Claude Code and Sonnet 4.5 exhibit the strongest penetration capabilities observed to date, substantially outperforming all prior systems. However, a detailed analysis of their testing processes reveals specific strengths and limitations; notably, LLM agents struggle to maintain coherent long-horizon plans, perform complex reasoning, and use specialized tools effectively. These limitations significantly constrain their overall capability, efficiency, and stability. To address them, we propose CHECKMATE, a framework that integrates enhanced classical planning with LLM agents, providing an external, structured "brain" that mitigates the inherent weaknesses of LLM agents. Our evaluation shows that CHECKMATE outperforms the state-of-the-art system (Claude Code) in penetration capability, improving benchmark success rates by over 20%. In addition, it delivers substantially greater stability and cuts both time and monetary costs by more than 50%.
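
The abstract names a Planner-Executor-Perceptor loop and an external classical planner but gives no implementation details. As a rough illustration only, the sketch below shows one way such a loop could be wired together. It is not CHECKMATE's code: every name here (`Action`, `plan`, `run_pep`, the toy action set) is hypothetical, plain breadth-first search over sets of facts stands in for the paper's "enhanced classical planning", and the executor and perceptor are stubs placed where an LLM agent would presumably sit.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A STRIPS-style action: applicable when preconditions hold, adds effects."""
    name: str
    preconditions: frozenset
    effects: frozenset


def plan(state, goal, actions):
    """Planner (assumed): breadth-first search over fact sets.

    Returns a list of action names reaching the goal, or None if unreachable.
    """
    start = frozenset(state)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        facts, path = queue.popleft()
        if goal <= facts:
            return path
        for action in actions:
            if action.preconditions <= facts:
                nxt = facts | action.effects
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [action.name]))
    return None


def run_pep(state, goal, actions, execute, perceive, max_steps=20):
    """Plan, execute the first step, perceive the outcome, then replan.

    `execute` and `perceive` are caller-supplied callables (stubs here).
    A step that yields no new facts is simply retried until max_steps runs out.
    """
    facts = frozenset(state)
    history = []
    for _ in range(max_steps):
        if goal <= facts:
            return history, True
        steps = plan(facts, goal, actions)
        if not steps:
            return history, False
        step = steps[0]
        raw_output = execute(step)                   # Executor: run the step
        facts = facts | perceive(step, raw_output)   # Perceptor: output -> facts
        history.append(step)
    return history, goal <= facts


# Toy domain, invented for illustration; not taken from the paper's benchmarks.
ACTIONS = [
    Action("scan_ports", frozenset(), frozenset({"ports_known"})),
    Action("identify_service", frozenset({"ports_known"}), frozenset({"service_known"})),
    Action("exploit_service", frozenset({"service_known"}), frozenset({"shell"})),
]
EFFECTS = {a.name: a.effects for a in ACTIONS}


def fake_execute(step):
    # Stub executor: pretends every step succeeds.
    return f"{step}: ok"


def fake_perceive(step, raw_output):
    # Stub perceptor: maps a successful output to the action's declared effects.
    return EFFECTS[step] if raw_output.endswith("ok") else frozenset()


history, success = run_pep(set(), frozenset({"shell"}), ACTIONS, fake_execute, fake_perceive)
print(history, success)
```

Keeping the plan state outside the agent, as an explicit set of facts, is the point of the sketch: the loop replans from observed facts after every step, so long-horizon coherence does not depend on what the executor remembers.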

Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees

Cryptography and Security

Helps computers find computer weaknesses faster.

9 Sep 2025 2

89%

Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations

Cryptography and Security

Makes AI agents safer and more reliable.

10 Sep 2025 2

89%

From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration Testing

Artificial Intelligence

Helps computers find computer security holes.

16 Sep 2025 0

Page Count

13 pages
