Evaluating LLMs for Demographic-Targeted Social Bias Detection: A Comprehensive Benchmark Study
By: Ayan Majumdar, Feihao Chen, Jinghui Li, and more
Potential Business Impact:
Detects demographic-targeted bias in AI training text.
Large-scale web-scraped text corpora used to train general-purpose AI models often contain harmful demographic-targeted social biases, creating a regulatory need for data auditing and for scalable bias-detection methods. Although prior work has investigated biases in text datasets and related detection methods, these studies remain narrow in scope: they typically focus on a single content type (e.g., hate speech), cover limited demographic axes, overlook biases that target multiple demographics simultaneously, and analyze only a limited set of techniques. Consequently, practitioners lack a holistic understanding of the strengths and limitations of recent large language models (LLMs) for automated bias detection. In this study, we present a comprehensive evaluation framework for English texts that assesses the ability of LLMs to detect demographic-targeted social biases. To align with regulatory requirements, we frame bias detection as a multi-label task over a demographic-focused taxonomy. We then conduct a systematic evaluation of models across scales and techniques, including prompting, in-context learning, and fine-tuning. Using twelve datasets spanning diverse content types and demographics, our study demonstrates the promise of fine-tuned smaller models for scalable detection. However, our analyses also expose persistent gaps across demographic axes and for biases targeting multiple demographics, underscoring the need for more effective and scalable auditing frameworks.
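The abstract frames detection as a multi-label classification task over a demographic taxonomy and reports that fine-tuned smaller models are a promising route to scalable detection. The sketch below illustrates only that framing; it is not the authors' code. The model name (`distilbert-base-uncased`), the axis labels, and the 0.5 decision threshold are illustrative assumptions, since the paper's actual taxonomy is not reproduced here.

```python
# Minimal sketch (not the authors' code): demographic-targeted bias detection
# framed as multi-label classification over a demographic taxonomy, using a
# smaller fine-tunable encoder via Hugging Face transformers.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical demographic axes; the paper's taxonomy may differ.
AXES = ["gender", "race_ethnicity", "religion", "age",
        "disability", "sexual_orientation"]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(AXES),
    problem_type="multi_label_classification",  # sigmoid per axis + BCE loss
)
# Note: the classification head is randomly initialized here; it only becomes
# useful after fine-tuning on labeled bias data, as the study does.

def detect_bias(text: str, threshold: float = 0.5) -> list[str]:
    """Return every demographic axis whose predicted probability passes threshold."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits).squeeze(0)  # independent probability per axis
    return [axis for axis, p in zip(AXES, probs) if p.item() >= threshold]

print(detect_bias("example passage to audit"))
```

A per-axis sigmoid output, rather than a single softmax label, is what allows one passage to be flagged for several demographic axes at once, which is the multi-demographic case the abstract identifies as a persistent gap.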
Similar Papers
Demographic Biases and Gaps in the Perception of Sexism in Large Language Models
Computation and Language
Finds sexism, but misses some groups' views.
Fine-Grained Bias Detection in LLM: Enhancing detection mechanisms for nuanced biases
Computation and Language
Finds hidden unfairness in AI language.