CURVE: A Benchmark for Cultural and Multilingual Long Video Reasoning

Published: January 15, 2026 | arXiv ID: 2601.10649v1

By: Darshan Singh, Arsha Nagrani, Kawshik Manikantan and more

Recent advancements in video models have shown tremendous progress, particularly in long video understanding. However, current benchmarks predominantly feature Western-centric data and English as the dominant language, introducing significant biases in evaluation. To address this, we introduce CURVE (Cultural Understanding and Reasoning in Video Evaluation), a challenging benchmark for multicultural and multilingual video reasoning. CURVE comprises high-quality, entirely human-generated annotations from diverse, region-specific cultural videos across 18 global locales. Unlike prior work that relies on automatic translations, CURVE provides complex questions, answers, and multi-step reasoning, all crafted in native languages. Making progress on CURVE requires a deeply situated understanding of visual cultural context. Furthermore, we leverage CURVE's reasoning traces to construct evidence-based graphs and propose a novel iterative strategy using these graphs to identify fine-grained errors in reasoning. Our evaluations reveal that SoTA Video-LLMs struggle significantly, performing substantially below human-level accuracy, with errors primarily stemming from the visual perception of cultural elements. CURVE will be publicly available at https://github.com/google-deepmind/neptune?tab=readme-ov-file#minerva-cultural

V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

CV and Pattern Recognition

Tests how well AI understands videos.

20 Nov 2025 1

VideoNorms: Benchmarking Cultural Awareness of Video Language Models

CV and Pattern Recognition

Teaches AI to understand different cultures in videos.

9 Oct 2025 1

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

CV and Pattern Recognition

Tests AI's ability to understand science videos.

9 Oct 2025 1
