Automatic Classifiers Underdetect Emotions Expressed by Men
By: Ivan Smirnov, Segun T. Aroyehun, Paul Plener, and more
Potential Business Impact:
Computers misunderstand men's feelings more than women's.
The widespread adoption of automatic sentiment and emotion classifiers makes it important to ensure that these tools perform reliably across different populations. Yet their reliability is typically assessed using benchmarks that rely on third-party annotators rather than the individuals experiencing the emotions themselves, potentially concealing systematic biases. In this paper, we use a unique, large-scale dataset of more than one million self-annotated posts and a pre-registered research design to investigate gender biases in emotion detection across 414 combinations of models and emotion-related classes. We find that across different types of automatic classifiers and various underlying emotions, error rates are consistently higher for texts authored by men than for those authored by women. We quantify how this bias could affect results in downstream applications and show that current machine learning tools, including large language models, should be applied with caution when the gender composition of a sample is not known or variable. Our findings demonstrate that sentiment analysis is not yet a solved problem, especially in ensuring equitable model behaviour across demographic groups.
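The core comparison described in the abstract, error rates computed separately for texts authored by men and by women against the authors' own self-annotations, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example rather than the paper's actual pipeline; the column names (author_gender, self_label, predicted_label) and the toy data are assumptions for illustration only.

```python
# Minimal sketch (not the paper's code): compare classifier error rates
# by author gender against self-reported emotion labels.
# Column names and toy data are hypothetical.
import pandas as pd

# Each row: the author's self-annotated emotion and the classifier's prediction.
posts = pd.DataFrame({
    "author_gender":   ["man", "man", "man", "woman", "woman", "woman"],
    "self_label":      ["sad", "angry", "happy", "sad", "angry", "happy"],
    "predicted_label": ["happy", "angry", "happy", "sad", "angry", "happy"],
})

# An error occurs when the prediction disagrees with the author's own label.
posts["error"] = posts["predicted_label"] != posts["self_label"]

# Error rate per gender group, and the gap that would indicate a bias.
error_by_gender = posts.groupby("author_gender")["error"].mean()
gap = error_by_gender["man"] - error_by_gender["woman"]

print(error_by_gender)
print(f"Error-rate gap (men - women): {gap:.2f}")
```

A positive gap in this kind of comparison, if it held consistently across classifiers and emotion classes, would correspond to the pattern the paper reports: higher error rates for texts authored by men.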
Similar Papers
Gender Bias in Emotion Recognition by Large Language Models
Computation and Language
Makes AI understand feelings without gender bias.
Identifying Bias in Machine-generated Text Detection
Computation and Language
Detectors unfairly flag some students' writing as fake.
Automated Evaluation of Gender Bias Across 13 Large Multimodal Models
CV and Pattern Recognition
Finds AI makes unfair pictures of jobs.