Vision-Language Models display a strong gender bias
By: Aiswarya Konavoor, Raj Abhijit Dandekar, Rajat Dandekar and more
Potential Business Impact:
Finds unfair gender ideas in AI that sees and reads.
Vision-language models (VLMs) align images and text in a shared representation space, which is useful for retrieval and zero-shot transfer. Yet this alignment can encode and amplify social stereotypes in subtle ways that are not visible in standard accuracy metrics. In this study, we test whether a contrastive vision-language encoder exhibits gender-linked associations when it places embeddings of face images near embeddings of short phrases describing occupations and activities. We assemble a dataset of 220 face photographs, split by perceived binary gender, and a set of 150 unique statements distributed across six categories: emotional labor, cognitive labor, domestic labor, technical labor, professional roles, and physical labor. We compute unit-norm image embeddings for every face and unit-norm text embeddings for every statement, then define a statement-level association score as the difference between the mean cosine similarity to the male set and the mean cosine similarity to the female set; positive values indicate stronger association with the male set, and negative values indicate stronger association with the female set. We attach bootstrap confidence intervals by resampling images within each gender group, aggregate by category with a separate bootstrap over statements, and run a label-swap null model that estimates the level of mean absolute association we would expect if no gender structure were present. The outcome is a statement-wise and category-wise map of gender associations in a contrastive vision-language space, accompanied by uncertainty estimates, simple sanity checks, and a robust gender bias evaluation framework.
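The association score and its resampling checks described above can be sketched in a few lines of NumPy. This is a minimal illustration using random stand-in vectors in place of real encoder outputs; the embedding dimension, group sizes, and all function names here are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_norm(x):
    """Normalize rows to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for CLIP-style embeddings (110 faces per group, 512-d assumed).
male_emb = unit_norm(rng.normal(size=(110, 512)))
female_emb = unit_norm(rng.normal(size=(110, 512)))
stmt_emb = unit_norm(rng.normal(size=(512,)))

def association(stmt, male, female):
    # Mean cosine to the male set minus mean cosine to the female set:
    # positive -> closer to male images, negative -> closer to female images.
    return (male @ stmt).mean() - (female @ stmt).mean()

def bootstrap_ci(stmt, male, female, n_boot=2000, alpha=0.05):
    """CI by resampling images with replacement within each gender group."""
    boots = np.empty(n_boot)
    for b in range(n_boot):
        m = male[rng.integers(0, len(male), len(male))]
        f = female[rng.integers(0, len(female), len(female))]
        boots[b] = association(stmt, m, f)
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return lo, hi

def label_swap_null(stmt, male, female, n_perm=2000):
    """Mean |association| under random gender-label swaps (no-structure baseline)."""
    pool = np.vstack([male, female])
    n_m = len(male)
    null = np.empty(n_perm)
    for p in range(n_perm):
        idx = rng.permutation(len(pool))
        null[p] = abs(association(stmt, pool[idx[:n_m]], pool[idx[n_m:]]))
    return null.mean()

score = association(stmt_emb, male_emb, female_emb)
```

A per-category aggregate would apply the same bootstrap idea over statements rather than images; comparing observed |score| values to the label-swap null indicates whether the gender structure exceeds what random grouping produces.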
Similar Papers
Bias in the Picture: Benchmarking VLMs with Social-Cue News Images and LLM-as-Judge Assessment
CV and Pattern Recognition
Finds and fixes unfairness in AI that sees and reads.
Image Recognition with Vision and Language Embeddings of VLMs
CV and Pattern Recognition
Helps computers understand pictures better with words or just sight.
Visual Cues of Gender and Race are Associated with Stereotyping in Vision-Language Models
CV and Pattern Recognition
Shows AI links how people look to gender and race stereotypes.