IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting

Published: December 10, 2025 | arXiv ID: 2512.09663v1

By: Tao Zhang, Yuyang Hong, Yang Xia and more

Potential Business Impact:

Helps computers see and understand heat pictures.

Business Areas:

Visual Search, Internet Services

Recent advances in multimodal large language models (MLLMs) have led to impressive progress across various benchmarks. However, their capability in understanding infrared images remains unexplored. To address this gap, we introduce IF-Bench, the first high-quality benchmark designed for evaluating multimodal understanding of infrared images. IF-Bench consists of 499 images sourced from 23 infrared datasets and 680 carefully curated visual question-answer pairs, covering 10 essential dimensions of image understanding. Based on this benchmark, we systematically evaluate over 40 open-source and closed-source MLLMs, employing cyclic evaluation, bilingual assessment, and hybrid judgment strategies to enhance the reliability of the results. Our analysis reveals how model scale, architecture, and inference paradigms affect infrared image comprehension, providing valuable insights for this area. Furthermore, we propose a training-free generative visual prompting (GenViP) method, which leverages advanced image editing models to translate infrared images into semantically and spatially aligned RGB counterparts, thereby mitigating domain distribution shifts. Extensive experiments demonstrate that our method consistently yields significant performance improvements across a wide range of MLLMs. The benchmark and code are available at https://github.com/casiatao/IF-Bench.
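
The abstract does not define "cyclic evaluation". One common reading, assumed here, is that each multiple-choice question is asked once per rotation of its answer options and counts as correct only if the model answers correctly every time, which reduces credit for position-biased guessing. The sketch below illustrates that reading only; `rotate`, `cyclic_eval`, and the `ask_model` callback are hypothetical names, not taken from the IF-Bench code.

```python
from typing import Callable, Sequence


def rotate(options: Sequence[str], shift: int) -> list[str]:
    """Return the options cyclically shifted left by `shift` positions."""
    shift %= len(options)
    return list(options[shift:]) + list(options[:shift])


def cyclic_eval(
    question: str,
    options: Sequence[str],
    answer: str,
    ask_model: Callable[[str, list[str]], str],
) -> bool:
    """Assumed scoring rule: the question is correct only if the model
    returns the right option text under every cyclic shift of the options."""
    for shift in range(len(options)):
        shifted = rotate(options, shift)
        # ask_model is a hypothetical callback that returns the chosen option text.
        if ask_model(question, shifted) != answer:
            return False
    return True
```

Under this rule, a model that always picks the first listed option fails any question with more than one option, while a model that identifies the correct option regardless of position passes.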

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs

CV and Pattern Recognition

Tests how well AI understands videos with pictures.

21 Apr 2025 2

89%

Inference-Time Scaling of Diffusion Models for Infrared Data Generation

CV and Pattern Recognition

Makes AI create better "night vision" pictures.

10 Nov 2025 1

89%

Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning

CV and Pattern Recognition

Helps AI "think" with pictures, not just look.

14 Oct 2025 0

Repos / Data Links

github.com

Page Count

17 pages
