Adaptive Data Collection for Latin-American Community-sourced Evaluation of Stereotypes (LACES)

Published: October 28, 2025 | arXiv ID: 2510.24958v1

By: Guido Ivetta, Pietro Palombini, Sofía Martinelli, and more

Potential Business Impact:

Provides a dataset and collection method for detecting harmful Latin-American stereotypes reproduced by language technologies.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

The evaluation of societal biases in NLP models is critically hindered by a glaring geo-cultural gap: existing benchmarks are overwhelmingly English-centric and focused on U.S. demographics. This leaves regions such as Latin America severely underserved, making it impossible to adequately assess or mitigate the perpetuation of harmful regional stereotypes by language technologies. To address this gap, we introduce a new, large-scale dataset of stereotypes developed through targeted community partnerships within Latin America. Furthermore, we present a novel dynamic data collection methodology that integrates the sourcing of new stereotype entries and the validation of existing data within a single, unified workflow. This combined approach yields a resource with significantly broader coverage and greater regional nuance than static collection methods. We believe this method could also be applied to gathering other kinds of sociocultural knowledge, and that the dataset is a crucial new resource that enables robust stereotype evaluation and substantially narrows the geo-cultural gap in fairness resources for Latin America.
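To make the idea of a single workflow that both sources and validates entries more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Entry` and `UnifiedCollector` names, the `target_votes` threshold, and the rule "validate the least-voted pending entry first, otherwise ask for a new one" are all hypothetical assumptions chosen for illustration. The point is only that one task queue can adaptively decide, per participant, whether the dataset currently needs more validation or more coverage.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical stereotype entry: a (group, attribute) pair plus participant votes."""
    group: str
    attribute: str
    votes: list = field(default_factory=list)  # True = participant recognizes the stereotype


class UnifiedCollector:
    """Sketch of one workflow that interleaves validating old entries and sourcing new ones.

    Assumption (not from the paper): an entry counts as validated once it has
    `target_votes` votes. While any entry is below that threshold, the next
    participant validates the least-voted one; otherwise they contribute a new entry.
    """

    def __init__(self, target_votes: int = 3):
        self.target_votes = target_votes
        self.entries: list = []

    def next_task(self):
        # Entries that still need more votes before they count as validated.
        pending = [e for e in self.entries if len(e.votes) < self.target_votes]
        if pending:
            # Adaptive choice: validate the entry with the fewest votes first.
            return ("validate", min(pending, key=lambda e: len(e.votes)))
        # Nothing left to validate, so ask the participant for a new entry.
        return ("contribute", None)

    def submit_vote(self, entry: Entry, agrees: bool) -> None:
        entry.votes.append(agrees)

    def submit_entry(self, group: str, attribute: str) -> Entry:
        entry = Entry(group, attribute)
        self.entries.append(entry)
        return entry

    def agreement(self, entry: Entry) -> float:
        # Fraction of participants who recognized the stereotype (0.0 if no votes yet).
        return sum(entry.votes) / len(entry.votes) if entry.votes else 0.0


if __name__ == "__main__":
    collector = UnifiedCollector(target_votes=2)
    print(collector.next_task())               # ('contribute', None): the pool is empty
    e = collector.submit_entry("group A", "attribute X")
    kind, target = collector.next_task()       # the new entry now needs validation
    collector.submit_vote(target, True)
    collector.submit_vote(target, False)
    print(collector.next_task())               # ('contribute', None): e reached 2 votes
    print(collector.agreement(e))              # 0.5
```

A real deployment would likely also track each participant's country or region, so that validation votes can be aggregated per region; that detail is omitted here to keep the sketch short.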

Country of Origin
🇦🇷 Argentina

Page Count
13 pages

Category
Computer Science: Computers and Society