Score: 1

Mapillary Vistas Validation for Fine-Grained Traffic Signs: A Benchmark Revealing Vision-Language Model Limitations

Published: August 4, 2025 | arXiv ID: 2508.02047v1

By: Sparsh Garg, Abhishek Aich

Potential Business Impact:

Teaches self-driving cars to see signs better.

Obtaining high-quality fine-grained annotations for traffic signs is critical for accurate and safe decision-making in autonomous driving. Widely used datasets, such as Mapillary, often provide only coarse-grained labels - without distinguishing semantically important types such as stop signs or speed limit signs. To this end, we present a new validation set for traffic signs derived from the Mapillary dataset called Mapillary Vistas Validation for Traffic Signs (MVV), where we decompose composite traffic signs into granular, semantically meaningful categories. The dataset includes pixel-level instance masks and has been manually annotated by expert annotators to ensure label fidelity. Further, we benchmark several state-of-the-art VLMs against the self-supervised DINOv2 model on this dataset and show that DINOv2 consistently outperforms all VLM baselines-not only on traffic sign recognition, but also on heavily represented categories like vehicles and humans. Our analysis reveals significant limitations in current vision-language models for fine-grained visual understanding and establishes DINOv2 as a strong baseline for dense semantic matching in autonomous driving scenarios. This dataset and evaluation framework pave the way for more reliable, interpretable, and scalable perception systems. Code and data are available at: https://github.com/nec-labs-ma/relabeling

Toward Reliable VLM: A Fine-Grained Benchmark and Framework for Exposure, Bias, and Inference in Korean Street Views

CV and Pattern Recognition

Helps find exact locations from photos.

3 Jun 2025 0

88%

FineVision: Open Data Is All You Need

CV and Pattern Recognition

Makes AI understand pictures and words better.

20 Oct 2025 2

87%

VISAT: Benchmarking Adversarial and Distribution Shift Robustness in Traffic Sign Recognition with Visual Attributes

Cryptography and Security

Makes self-driving cars see signs better.

29 Oct 2025 1

View PDF Login to Bookmark

Page Count

8 pages

Mapillary Vistas Validation for Fine-Grained Traffic Signs: A Benchmark Revealing Vision-Language Model Limitations

Teaches self-driving cars to see signs better.

Technical Abstract

Toward Reliable VLM: A Fine-Grained Benchmark and Framework for Exposure, Bias, and Inference in Korean Street Views

FineVision: Open Data Is All You Need

VISAT: Benchmarking Adversarial and Distribution Shift Robustness in Traffic Sign Recognition with Visual Attributes