Learning Robust Penetration-Testing Policies under Partial Observability: A systematic evaluation

Published: September 24, 2025 | arXiv ID: 2509.20008v1

By: Raphael Simon, Pieter Libin, Wim Mees

Potential Business Impact:

Teaches computers to find weaknesses in computer networks faster.

Business Areas:
Penetration Testing, Information Technology, Privacy and Security

Penetration testing, the simulation of cyberattacks to identify security vulnerabilities, presents a sequential decision-making problem well-suited for reinforcement learning (RL) automation. Like many applications of RL to real-world problems, partial observability presents a major challenge, as it invalidates the Markov property present in Markov Decision Processes (MDPs). Partially Observable MDPs require history aggregation or belief state estimation to learn successful policies. We investigate stochastic, partially observable penetration testing scenarios over host networks of varying size, aiming to better reflect real-world complexity through more challenging and representative benchmarks. This approach leads to the development of more robust and transferable policies, which are crucial for ensuring reliable performance across diverse and unpredictable real-world environments. Using vanilla Proximal Policy Optimization (PPO) as a baseline, we compare a selection of PPO variants designed to mitigate partial observability, including frame-stacking, augmenting observations with historical information, and employing recurrent or transformer-based architectures. We conduct a systematic empirical analysis of these algorithms across different host network sizes. We find that this task greatly benefits from history aggregation, converging three times faster than other approaches. Manual inspection of the policies learned by the algorithms reveals clear distinctions and provides insights that go beyond quantitative results.
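To make the history-aggregation idea concrete, here is a minimal sketch (not the authors' implementation) of one common way to apply it: a gymnasium-style observation wrapper that stacks the last k observations so a memoryless policy such as vanilla PPO can condition on recent history. The wrapper class, the stack length k, and the assumption of a flat Box observation space are all illustrative choices, not details taken from the paper.

```python
# Minimal sketch: frame-stacking / history aggregation for a partially
# observable environment. Assumes a flat Box observation space; all names
# here are illustrative and not from the paper.
from collections import deque

import numpy as np
import gymnasium as gym


class HistoryStack(gym.ObservationWrapper):
    """Concatenate the most recent `k` observations into one vector."""

    def __init__(self, env: gym.Env, k: int = 4):
        super().__init__(env)
        self.k = k
        self.frames = deque(maxlen=k)
        # The stacked observation space repeats the original bounds k times.
        low = np.tile(env.observation_space.low, k)
        high = np.tile(env.observation_space.high, k)
        self.observation_space = gym.spaces.Box(
            low=low, high=high, dtype=env.observation_space.dtype
        )

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # Fill the history buffer with copies of the initial observation.
        for _ in range(self.k):
            self.frames.append(obs)
        return self.observation(obs), info

    def observation(self, obs):
        # Append the newest observation and return the stacked history.
        self.frames.append(obs)
        return np.concatenate(list(self.frames), axis=-1)
```

Wrapping the training environment this way leaves the PPO implementation itself unchanged; by contrast, the recurrent and transformer-based variants compared in the paper move history aggregation into the policy network rather than the observation.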

Page Count
27 pages

Category
Computer Science:
Machine Learning (CS)