Prompt Optimization and Evaluation for LLM Automated Red Teaming
By: Michael Freenor, Lauren Alvarez, Milton Leal, and more
Potential Business Impact:
Finds weak spots in computer programs faster.
Applications that use Large Language Models (LLMs) are becoming widespread, making the identification of system vulnerabilities increasingly important. Automated Red Teaming accelerates this effort by using an LLM to generate and execute attacks against target systems. Attack generators are evaluated using the Attack Success Rate (ASR): the sample mean calculated over the judgments of success for each attack. In this paper, we introduce a method for optimizing attack-generator prompts that applies ASR to individual attacks. By repeating each attack multiple times against a randomly seeded target, we measure an attack's discoverability: the expected success of that individual attack. This approach reveals exploitable patterns that inform prompt optimization, ultimately enabling more robust evaluation and refinement of generators.
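To make these two quantities concrete, here is a minimal sketch, not the paper's implementation: ASR is a sample mean over one success judgment per attack, while discoverability is estimated per attack by repeating it against randomly seeded targets. The function run_attack, its placeholder success behavior, and the trial counts are all hypothetical stand-ins.

```python
import random
from statistics import mean

def run_attack(attack: str, seed: int) -> bool:
    """Hypothetical stand-in: execute one attack against a target system
    seeded with `seed` and return the judge's binary success verdict.
    Replace with a real target call and an LLM judge."""
    rng = random.Random(hash((attack, seed)))  # placeholder target randomness
    return rng.random() < 0.3                  # placeholder success behavior

def discoverability(attack: str, n_trials: int = 20) -> float:
    """Estimate an attack's discoverability: the expected success of this
    single attack, approximated by its empirical success rate over
    repeated runs against randomly seeded targets."""
    outcomes = [run_attack(attack, seed=random.randrange(2**32))
                for _ in range(n_trials)]
    return mean(float(o) for o in outcomes)

def attack_success_rate(attacks: list[str]) -> float:
    """ASR for a batch of generated attacks: the sample mean of the
    success judgment across attacks (one run each)."""
    return mean(float(run_attack(a, seed=random.randrange(2**32)))
                for a in attacks)

# Usage sketch: attacks with high estimated discoverability point at
# exploitable patterns that can guide prompt optimization of the generator.
attacks = ["attack-1", "attack-2", "attack-3"]  # hypothetical attack strings
print("ASR:", attack_success_rate(attacks))
for a in attacks:
    print(a, "discoverability ~", discoverability(a))
```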
Similar Papers
Automated Red-Teaming Framework for Large Language Model Security Assessment: A Comprehensive Attack Generation and Detection System
Cryptography and Security
Finds hidden dangers in AI programs.
Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers
Machine Learning (CS)
Protects AI from bad instructions and tricks.
Automatic LLM Red Teaming
Machine Learning (CS)
Trains AI to find weaknesses in other AI.