Score: 2

Vision Language Models are Biased

Published: May 29, 2025 | arXiv ID: 2505.23941v1

By: An Vo, Khai-Nguyen Nguyen, Mohammad Reza Taesiri, and more

Potential Business Impact:

AI struggles to count things it "knows."

Business Areas:
Image Recognition, Data and Analytics, Software

Large language models (LLMs) memorize a vast amount of prior knowledge from the Internet that helps them on downstream tasks but can also notoriously sway their outputs toward wrong or biased answers. In this work, we test how knowledge about popular subjects hurts the accuracy of vision language models (VLMs) on standard, objective visual tasks of counting and identification. We find that state-of-the-art VLMs are strongly biased (e.g., unable to recognize that a fourth stripe has been added to a 3-stripe Adidas logo), scoring an average of 17.05% accuracy in counting (e.g., counting stripes in an Adidas-like logo) across 7 diverse domains ranging from animals, logos, chess, and board games to optical illusions and patterned grids. Inserting text (e.g., "Adidas") describing the subject name into the counterfactual image further decreases VLM accuracy. The biases in VLMs are so strong that instructing them to double-check their results or to rely exclusively on image details improves counting accuracy by only +2 points on average. Our work presents an interesting failure mode in VLMs and an automated framework for testing VLM biases. Code and data are available at: vlmsarebiased.github.io.
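To make the test setup concrete, below is a minimal sketch of how one might pose such a counterfactual counting question to a VLM through the OpenAI chat API. The image file, model name, and prompt wording are illustrative assumptions; this is not the authors' released framework (see vlmsarebiased.github.io for their actual code and data).

```python
# Minimal sketch: ask a VLM to count stripes in a counterfactual logo image.
# Assumptions: a local PNG of a 4-stripe Adidas-like logo and an OpenAI API key.
# This is illustrative only, not the authors' released evaluation framework.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def count_question(image_path: str, prompt: str, model: str = "gpt-4o") -> str:
    # Encode the image as a base64 data URL so it can be sent inline.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # The paper's finding suggests a biased model tends to answer "3" here
    # even when the counterfactual image actually shows 4 stripes.
    print(count_question(
        "adidas_like_4_stripes.png",  # hypothetical counterfactual image
        "Count the stripes in this logo. Answer with a number only.",
    ))
```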

Country of Origin
🇺🇸 🇰🇷 Korea, Republic of; United States

Page Count
52 pages

Category
Computer Science:
Machine Learning (CS)