Score: 2

Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs

Published: August 24, 2025 | arXiv ID: 2508.17334v2

By: Somraj Gautam, Abhirama Subramanyam Penamakuri, Abhishek Bhandari, and more

Potential Business Impact:

Evaluates how well AI vision-language models read and reason over cricket scorecards in English and Hindi.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

We introduce MMCRICBENCH-3K, a benchmark for Visual Question Answering (VQA) on cricket scorecards, designed to evaluate large vision-language models (LVLMs) on complex numerical and cross-lingual reasoning over semi-structured tabular images. MMCRICBENCH-3K comprises 1,463 synthetically generated scorecard images from ODI, T20, and Test formats, accompanied by 1,500 English QA pairs. It includes two subsets: MMCRICBENCH-E-1.5K, featuring English scorecards, and MMCRICBENCH-H-1.5K, containing visually similar Hindi scorecards; all questions and answers are kept in English to enable controlled cross-script evaluation. The task demands reasoning over structured numerical data, multi-image context, and implicit domain knowledge. Empirical results show that even state-of-the-art LVLMs, such as GPT-4o and Qwen2.5-VL, struggle on the English subset, despite English being their primary training language, and exhibit a further drop in performance on the Hindi subset. This reveals key limitations in structure-aware visual text understanding, numerical reasoning, and cross-lingual generalization. The dataset is publicly available on Hugging Face at https://huggingface.co/datasets/DIALab/MMCricBench to promote LVLM research in this direction.
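Since the benchmark is distributed through Hugging Face, a quick way to explore it is with the `datasets` library. The sketch below is a minimal, hedged example: the split name and the field names (`image`, `question`, `answer`) are assumptions for illustration and are not confirmed by the abstract, so the actual schema should be checked on the dataset card.

```python
# Minimal sketch: load MMCricBench from Hugging Face and inspect one record.
# Assumptions (not stated in the abstract): a "test" split exists, and each
# record carries a scorecard image, an English question, and a gold answer.
from datasets import load_dataset

dataset = load_dataset("DIALab/MMCricBench", split="test")  # split name is an assumption

sample = dataset[0]
print(sample.keys())  # inspect the actual field names exposed by the dataset

# Hypothetical field access, to be adjusted to the real schema:
# image    = sample["image"]     # scorecard image (English or Hindi subset)
# question = sample["question"]  # English question over the scorecard
# answer   = sample["answer"]    # gold answer for evaluation
```

In practice, one would feed the image and question to an LVLM (e.g., GPT-4o or Qwen2.5-VL) and compare the model's output against the gold answer, reporting accuracy separately on the English and Hindi subsets to measure the cross-script gap the paper describes.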

Country of Origin
🇮🇳 India


Page Count
17 pages

Category
Computer Science: Computer Vision and Pattern Recognition