Evaluating LLMs for Demographic-Targeted Social Bias Detection: A Comprehensive Benchmark Study
By: Ayan Majumdar, Feihao Chen, Jinghui Li, and more
Potential Business Impact:
Detects demographic-targeted bias in AI training text.
Large-scale web-scraped text corpora used to train general-purpose AI models often contain harmful demographic-targeted social biases, creating a regulatory need for data auditing and for scalable bias-detection methods. Although prior work has investigated biases in text datasets and related detection methods, these studies remain narrow in scope: they typically focus on a single content type (e.g., hate speech), cover limited demographic axes, overlook biases that target multiple demographics simultaneously, and analyze only a limited set of techniques. Consequently, practitioners lack a holistic understanding of the strengths and limitations of recent large language models (LLMs) for automated bias detection. In this study, we present a comprehensive evaluation framework for English texts that assesses the ability of LLMs to detect demographic-targeted social biases. To align with regulatory requirements, we frame bias detection as a multi-label task over a demographic-focused taxonomy. We then conduct a systematic evaluation of models across scales and techniques, including prompting, in-context learning, and fine-tuning. Using twelve datasets spanning diverse content types and demographics, our study demonstrates the promise of fine-tuned smaller models for scalable detection. However, our analyses also expose persistent gaps across demographic axes and for biases targeting multiple demographics, underscoring the need for more effective and scalable auditing frameworks.
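The abstract frames detection as a multi-label classification task over a demographic taxonomy and reports that fine-tuned smaller models are a promising route to scalable detection. The sketch below illustrates only that framing; it is not the authors' code. The model name (`distilbert-base-uncased`), the axis labels, and the 0.5 decision threshold are illustrative assumptions, since the paper's actual taxonomy is not reproduced here.

```python
# Minimal sketch (not the authors' code): demographic-targeted bias detection
# framed as multi-label classification over a demographic taxonomy, using a
# smaller fine-tunable encoder via Hugging Face transformers.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical demographic axes; the paper's taxonomy may differ.
AXES = ["gender", "race_ethnicity", "religion", "age",
        "disability", "sexual_orientation"]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(AXES),
    problem_type="multi_label_classification",  # sigmoid per axis + BCE loss
)
# Note: the classification head is randomly initialized here; it only becomes
# useful after fine-tuning on labeled bias data, as the study does.

def detect_bias(text: str, threshold: float = 0.5) -> list[str]:
    """Return every demographic axis whose predicted probability passes threshold."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits).squeeze(0)  # independent probability per axis
    return [axis for axis, p in zip(AXES, probs) if p.item() >= threshold]

print(detect_bias("example passage to audit"))
```

A per-axis sigmoid output, rather than a single softmax label, is what allows one passage to be flagged for several demographic axes at once, which is the multi-demographic case the abstract identifies as a persistent gap.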
Similar Papers
Demographic Biases and Gaps in the Perception of Sexism in Large Language Models
Computation and Language
Finds sexism, but misses some groups' views.
Fine-Grained Bias Detection in LLM: Enhancing detection mechanisms for nuanced biases
Computation and Language
Finds hidden unfairness in AI language.