IndiCASA: A Dataset and Bias Evaluation Framework in LLMs Using Contrastive Embedding Similarity in the Indian Context
By: Santhosh G S, Akshay Govind S, Gokul S Krishnan, and more
Potential Business Impact:
Detects and measures unfairness in AI language models.
Large Language Models (LLMs) have gained significant traction across critical domains owing to their impressive contextual understanding and generative capabilities. However, their increasing deployment in high-stakes applications necessitates rigorous evaluation of embedded biases, particularly in culturally diverse contexts like India, where existing embedding-based bias assessment methods often fall short in capturing nuanced stereotypes. We propose an evaluation framework based on an encoder trained with contrastive learning that captures fine-grained bias through embedding similarity. We also introduce a novel dataset, IndiCASA (IndiBias-based Contextually Aligned Stereotypes and Anti-stereotypes), comprising 2,575 human-validated sentences spanning five demographic axes: caste, gender, religion, disability, and socioeconomic status. Our evaluation of multiple open-weight LLMs reveals that all models exhibit some degree of stereotypical bias, with disability-related biases being notably persistent and religion-related bias generally lower, likely owing to global debiasing efforts, demonstrating the need for fairer model development.
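To make the embedding-similarity idea concrete, below is a minimal sketch (not the authors' released code) of how a sentence encoder can compare a model-generated statement against a stereotype and an anti-stereotype reference via cosine similarity. The encoder name, example sentences, and scoring rule are illustrative assumptions, not the paper's trained contrastive encoder or its dataset.

```python
# Hedged sketch of embedding-similarity bias scoring.
# Assumptions: a generic pretrained encoder stands in for the paper's
# contrastively trained encoder; the sentences are invented examples.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

# Illustrative stereotype / anti-stereotype references and a generated sentence.
stereotype = "People from that community are unfit for office jobs."
anti_stereotype = "People from that community work in all kinds of office jobs."
generated = "Members of that community rarely succeed in professional careers."

embeddings = encoder.encode(
    [generated, stereotype, anti_stereotype], convert_to_tensor=True
)

sim_to_stereotype = cos_sim(embeddings[0], embeddings[1]).item()
sim_to_anti = cos_sim(embeddings[0], embeddings[2]).item()

# If the generated sentence sits closer to the stereotype reference than to
# the anti-stereotype, flag it as leaning stereotypical (threshold is an
# illustrative choice, not the framework's calibrated score).
lean = sim_to_stereotype - sim_to_anti
print(f"stereotype lean: {lean:+.3f} ({'stereotypical' if lean > 0 else 'anti-stereotypical'})")
```

A contrastively trained encoder, as described in the abstract, would be expected to sharpen this separation between stereotyped and counter-stereotyped contexts, making such similarity gaps a finer-grained bias signal than off-the-shelf embeddings provide.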
Similar Papers
Addressing Stereotypes in Large Language Models: A Critical Examination and Mitigation
Computation and Language
Fixes AI's unfairness and improves its understanding.
A Comprehensive Study of Implicit and Explicit Biases in Large Language Models
Machine Learning (CS)
Finds and fixes unfairness in AI writing.
Breaking the Benchmark: Revealing LLM Bias via Minimal Contextual Augmentation
Computation and Language
Makes AI less likely to be unfair or biased.