BharatBBQ: A Multilingual Bias Benchmark for Question Answering in the Indian Context
By: Aditya Tomar, Nihar Ranjan Sahoo, Pushpak Bhattacharyya
Potential Business Impact:
Tests AI for unfairness in Indian languages.
Evaluating social biases in language models (LMs) is crucial for ensuring fairness and minimizing the reinforcement of harmful stereotypes in AI systems. Existing benchmarks, such as the Bias Benchmark for Question Answering (BBQ), primarily focus on Western contexts, limiting their applicability to the Indian context. To address this gap, we introduce BharatBBQ, a culturally adapted benchmark designed to assess biases in Hindi, English, Marathi, Bengali, Tamil, Telugu, Odia, and Assamese. BharatBBQ covers 13 social categories, including 3 intersectional groups, reflecting prevalent biases in the Indian sociocultural landscape. Our dataset contains 49,108 examples in one language that are expanded using translation and verification to 392,864 examples in eight different languages. We evaluate five multilingual LM families across zero and few-shot settings, analyzing their bias and stereotypical bias scores. Our findings highlight persistent biases across languages and social categories and often amplified biases in Indian languages compared to English, demonstrating the necessity of linguistically and culturally grounded benchmarks for bias evaluation.
Similar Papers
PakBBQ: A Culturally Adapted Bias Benchmark for QA
Computation and Language
Makes AI fairer for people speaking different languages.
PBBQ: A Persian Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models
Computation and Language
Helps computers understand Persian culture without bias.
VoiceBBQ: Investigating Effect of Content and Acoustics in Social Bias of Spoken Language Model
Computation and Language
Tests how AI voices show unfairness.