Rice-VL: Evaluating Vision-Language Models for Cultural Understanding Across ASEAN Countries

Published: December 1, 2025 | arXiv ID: 2512.01419v1

By: Tushar Pranav, Eshan Pandey, Austria Lyka Diane Bala and more

Potential Business Impact:

Helps computers understand cultures worldwide better.

Business Areas:

Image Recognition, Data and Analytics, Software

Vision-Language Models (VLMs) excel in multimodal tasks but often exhibit Western-centric biases, limiting their effectiveness in culturally diverse regions like Southeast Asia (SEA). To address this, we introduce RICE-VL, a novel benchmark evaluating VLM cultural understanding across 11 ASEAN countries. RICE-VL includes over 28,000 human-curated Visual Question Answering (VQA) samples -- covering True or False, Fill-in-the-Blank, and open-ended formats -- and 1,000 image-bounding box pairs for Visual Grounding, annotated by culturally informed experts across 14 sub-ground categories. We propose SEA-LAVE, an extension of the LAVE metric, assessing textual accuracy, cultural alignment, and country identification. Evaluations of six open- and closed-source VLMs reveal significant performance gaps in low-resource countries and abstract cultural domains. The Visual Grounding task tests models' ability to localize culturally significant elements in complex scenes, probing spatial and contextual accuracy. RICE-VL exposes limitations in VLMs' cultural comprehension and highlights the need for inclusive model development to better serve diverse global populations.

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

CV and Pattern Recognition

Makes AI understand Southeast Asian cultures better.

10 Mar 2025 1

90%

BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language Models

CV and Pattern Recognition

Tests if computers understand different cultures.

13 Oct 2025 1

90%

IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs

CV and Pattern Recognition

Tests AI on Indian languages and culture.

6 Nov 2025 1

Country of Origin

🇸🇬 Singapore

Page Count

14 pages
