RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models
By: Quy-Anh Dang, Chris Ngo, Truong-Son Hy
Potential Business Impact:
Helps test whether AI models resist harmful instructions.
As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount. However, existing red teaming datasets suffer from inconsistent risk categorizations, limited domain coverage, and outdated evaluations, hindering systematic vulnerability assessments. To address these challenges, we introduce RedBench, a universal dataset aggregating 37 benchmark datasets from leading conferences and repositories, comprising 29,362 samples across attack and refusal prompts. RedBench employs a standardized taxonomy with 22 risk categories and 19 domains, enabling consistent and comprehensive evaluations of LLM vulnerabilities. We provide a detailed analysis of existing datasets, establish baselines for modern LLMs, and open-source the dataset and evaluation code. Our contributions facilitate robust comparisons, foster future research, and promote the development of secure and reliable LLMs for real-world deployment. Code: https://github.com/knoveleng/redeval
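To illustrate how a unified schema like RedBench's might be consumed, here is a minimal sketch that tallies samples per risk category and domain from a JSONL export. The file path and the field names (`risk_category`, `domain`) are assumptions for illustration, not the dataset's documented schema; see the linked repository for the actual format and loading code.

```python
import json
from collections import Counter

# Hypothetical JSONL export of RedBench; the path and field names below
# are assumptions for illustration -- check the repo for the real schema.
DATASET_PATH = "redbench.jsonl"

risk_counts = Counter()
domain_counts = Counter()

with open(DATASET_PATH, encoding="utf-8") as f:
    for line in f:
        sample = json.loads(line)
        # Each record is assumed to carry taxonomy labels: one of the
        # 22 risk categories and one of the 19 domains described above.
        risk_counts[sample["risk_category"]] += 1
        domain_counts[sample["domain"]] += 1

print(f"{len(risk_counts)} risk categories, {len(domain_counts)} domains")
for category, count in risk_counts.most_common():
    print(f"{category}: {count}")
```

A consistent taxonomy is what makes a cross-dataset tally like this meaningful: without it, the same jailbreak prompt could carry a different label in each of the 37 source datasets.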
Similar Papers
Automated Red-Teaming Framework for Large Language Model Security Assessment: A Comprehensive Attack Generation and Detection System
Cryptography and Security
Finds hidden dangers in AI programs.
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Computation and Language
Shows how to find and fix safety problems in AI programs.
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
Cryptography and Security
Tests AI to find and break other AI.