Benchmarking Visual Language Models on Standardized Visualization Literacy Tests

Published: March 20, 2025 | arXiv ID: 2503.16632v1

By: Saugat Pandey, Alvitta Ottley

Potential Business Impact:

Helps computers understand charts, but they still get tricked.

Business Areas:

Visual Search Internet Services

The increasing integration of Visual Language Models (VLMs) into visualization systems demands a comprehensive understanding of their visual interpretation capabilities and constraints. While existing research has examined individual models, systematic comparisons of VLMs' visualization literacy remain unexplored. We bridge this gap through a rigorous, first-of-its-kind evaluation of four leading VLMs (GPT-4, Claude, Gemini, and Llama) using standardized assessments: the Visualization Literacy Assessment Test (VLAT) and Critical Thinking Assessment for Literacy in Visualizations (CALVI). Our methodology uniquely combines randomized trials with structured prompting techniques to control for order effects and response variability, a critical consideration overlooked in many VLM evaluations. Our analysis reveals that while specific models demonstrate competence in basic chart interpretation (Claude achieving 67.9% accuracy on VLAT), all models exhibit substantial difficulties in identifying misleading visualization elements (maximum 30.0% accuracy on CALVI). We uncover distinct performance patterns: strong capabilities in interpreting conventional charts like line charts (76-96% accuracy) and detecting hierarchical structures (80-100% accuracy), but consistent difficulties with data-dense visualizations involving multiple encodings (bubble charts: 18.6-61.4%) and anomaly detection (25-30% accuracy). Significantly, we observe distinct uncertainty management behavior across models, with Gemini displaying heightened caution (22.5% question omission) compared to others (7-8%). These findings provide crucial insights for the visualization community by establishing reliable VLM evaluation benchmarks, identifying areas where current models fall short, and highlighting the need for targeted improvements in VLM architectures for visualization tasks.
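The abstract mentions randomized trials (to control for order effects) and reports both accuracy and question-omission rates. The following is a minimal sketch of how such bookkeeping could look, not the authors' implementation. All names (`randomized_order`, `score_trial`, `summarize`) are hypothetical, and the choice to count an omitted question as incorrect when computing accuracy is an assumption that the abstract does not confirm.

```python
import random
from statistics import mean


def randomized_order(question_ids, trial_seed):
    """Return a shuffled copy of the question ids for one trial.

    Seeding per trial keeps each ordering reproducible (hypothetical helper).
    """
    rng = random.Random(trial_seed)
    order = list(question_ids)
    rng.shuffle(order)
    return order


def score_trial(responses, answer_key):
    """Score one trial.

    responses: dict mapping question id -> chosen option, or None if the
    model omitted the question. Assumption: omissions count as incorrect.
    """
    total = len(answer_key)
    correct = sum(1 for q, a in answer_key.items() if responses.get(q) == a)
    omitted = sum(1 for q in answer_key if responses.get(q) is None)
    return {"accuracy": correct / total, "omission_rate": omitted / total}


def summarize(trials):
    """Average per-trial accuracy and omission rate across trials."""
    return {
        "accuracy": mean(t["accuracy"] for t in trials),
        "omission_rate": mean(t["omission_rate"] for t in trials),
    }
```

In this sketch, each trial would present the test items in the order given by `randomized_order`, collect one response per item, and pass the resulting dictionary to `score_trial`; averaging over trials with `summarize` smooths out response variability between runs.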

Probing the Visualization Literacy of Vision Language Models: the Good, the Bad, and the Ugly

Human-Computer Interaction

Shows how AI understands charts by seeing.

7 Apr 2025 0

91%

Charts-of-Thought: Enhancing LLM Visualization Literacy Through Structured Data Extraction

Human-Computer Interaction

Helps computers understand charts better than people.

6 Aug 2025 1

91%

Visual Language Models show widespread visual deficits on neuropsychological tests

CV and Pattern Recognition

Computers see things like humans, but miss basic details.

15 Apr 2025 1

Repos / Data Links

github.com

Page Count

12 pages
