RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models
By: Quy-Anh Dang, Chris Ngo, Truong-Son Hy
Potential Business Impact:
Helps test whether AI models resist harmful instructions.
As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount. However, existing red teaming datasets suffer from inconsistent risk categorizations, limited domain coverage, and outdated evaluations, hindering systematic vulnerability assessments. To address these challenges, we introduce RedBench, a universal dataset aggregating 37 benchmark datasets from leading conferences and repositories, comprising 29,362 samples across attack and refusal prompts. RedBench employs a standardized taxonomy with 22 risk categories and 19 domains, enabling consistent and comprehensive evaluations of LLM vulnerabilities. We provide a detailed analysis of existing datasets, establish baselines for modern LLMs, and open-source the dataset and evaluation code. Our contributions facilitate robust comparisons, foster future research, and promote the development of secure and reliable LLMs for real-world deployment. Code: https://github.com/knoveleng/redeval
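To illustrate how a unified schema like RedBench's might be consumed, here is a minimal sketch that tallies samples per risk category and domain from a JSONL export. The file path and the field names (`risk_category`, `domain`) are assumptions for illustration, not the dataset's documented schema; see the linked repository for the actual format and loading code.

```python
import json
from collections import Counter

# Hypothetical JSONL export of RedBench; the path and field names below
# are assumptions for illustration -- check the repo for the real schema.
DATASET_PATH = "redbench.jsonl"

risk_counts = Counter()
domain_counts = Counter()

with open(DATASET_PATH, encoding="utf-8") as f:
    for line in f:
        sample = json.loads(line)
        # Each record is assumed to carry taxonomy labels: one of the
        # 22 risk categories and one of the 19 domains described above.
        risk_counts[sample["risk_category"]] += 1
        domain_counts[sample["domain"]] += 1

print(f"{len(risk_counts)} risk categories, {len(domain_counts)} domains")
for category, count in risk_counts.most_common():
    print(f"{category}: {count}")
```

A consistent taxonomy is what makes a cross-dataset tally like this meaningful: without it, the same jailbreak prompt could carry a different label in each of the 37 source datasets.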
Similar Papers
Automated Red-Teaming Framework for Large Language Model Security Assessment: A Comprehensive Attack Generation and Detection System
Cryptography and Security
Finds hidden dangers in AI programs.
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Computation and Language
Shows how to find and fix safety problems in AI programs.
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
Cryptography and Security
Tests AI to find and break other AI.