Evaluation of Cultural Competence of Vision-Language Models
By: Srishti Yadav, Lauren Tilton, Maria Antoniak, and more
Potential Business Impact:
Teaches computers to understand cultural meanings in pictures.
Modern vision-language models (VLMs) often fail at cultural competency evaluations and benchmarks. Given the diversity of applications built upon VLMs, there is renewed interest in understanding how they encode cultural nuances. While individual aspects of this problem have been studied, we still lack a comprehensive framework for systematically identifying and annotating the nuanced cultural dimensions present in images for VLMs. This position paper argues that foundational methodologies from visual culture studies (cultural studies, semiotics, and visual studies) are necessary for cultural analysis of images. Building upon this review, we propose a set of five frameworks, corresponding to cultural dimensions, that must be considered for a more complete analysis of the cultural competencies of VLMs.
Similar Papers
Toward Socially Aware Vision-Language Models: Evaluating Cultural Competence Through Multimodal Story Generation
Computation and Language
AI stories change to match different cultures.
Cultural Awareness in Vision-Language Models: A Cross-Country Exploration
Computers and Society
Finds how computers see people and places unfairly.
Uncovering Cultural Representation Disparities in Vision-Language Models
CV and Pattern Recognition
Finds AI's unfair views on different countries.