Resampling Benchmark for Efficient Comprehensive Evaluation of Large Vision-Language Models
By: Teppei Suzuki, Keisuke Ozawa
Potential Business Impact:
Tests smart AI faster and more fairly.
We propose an efficient evaluation protocol for large vision-language models (VLMs). Because VLMs possess broad knowledge and reasoning capabilities, comprehensive assessment requires multiple benchmarks, which makes evaluation computationally expensive. To improve efficiency, we construct a subset of existing benchmarks that yields results comparable to full benchmark evaluations. Our benchmark classification experiments reveal that no single benchmark covers the full range of challenges. We then introduce a subset construction method based on farthest point sampling (FPS). Our experiments show that FPS-based benchmarks maintain a strong correlation (> 0.96) with full evaluations while using only ~1% of the data. Additionally, applying FPS to an existing benchmark improves its correlation with overall evaluation results, suggesting that FPS can also reduce unintended dataset biases.
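For readers unfamiliar with the technique, below is a minimal sketch of greedy farthest point sampling over benchmark-item embeddings. It assumes each benchmark item has already been encoded as a feature vector; the function name, embedding source, distance metric, and subset size here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def farthest_point_sampling(X, k, seed=0):
    """Greedily select k rows of X (n x d) that are mutually far apart.

    Start from a random point, then repeatedly pick the point whose
    distance to the nearest already-selected point is largest.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    selected = [int(rng.integers(n))]
    # Distance from every point to the nearest selected point so far.
    dists = np.linalg.norm(X - X[selected[0]], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dists))       # farthest from the current subset
        selected.append(idx)
        new_d = np.linalg.norm(X - X[idx], axis=1)
        dists = np.minimum(dists, new_d)  # refresh nearest-subset distances
    return np.array(selected)

# Usage sketch: a pool of 10,000 benchmark items (as embeddings), keep ~1%.
X = np.random.default_rng(1).normal(size=(10_000, 512))
subset = farthest_point_sampling(X, k=100)
print(subset.shape)  # (100,)
```

Because each step picks the item farthest from everything chosen so far, FPS spreads the subset across the embedding space rather than oversampling dense regions, which is the intuition behind a ~1% subset tracking the full evaluation so closely.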
Similar Papers
EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
Computation and Language
Makes big AI models run much faster and cheaper.
Frame Sampling Strategies Matter: A Benchmark for Small Vision Language Models
CV and Pattern Recognition
Makes AI better at understanding videos.
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
CV and Pattern Recognition
Tests how well computers see tiny details.