SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection

Published: December 12, 2025 | arXiv ID: 2512.11215v1

By: Tianye Qi, Weihao Li, Nick Barnes

Wildfire smoke is transparent, amorphous, and often visually confounded with clouds, making early-stage detection particularly challenging. In this work, we introduce a benchmark, called SmokeBench, to evaluate the ability of multimodal large language models (MLLMs) to recognize and localize wildfire smoke in images. The benchmark consists of four tasks: (1) smoke classification, (2) tile-based smoke localization, (3) grid-based smoke localization, and (4) smoke detection. We evaluate several MLLMs, including Idefics2, Qwen2.5-VL, InternVL3, Unified-IO 2, Grounding DINO, GPT-4o, and Gemini-2.5 Pro. Our results show that while some models can classify the presence of smoke when it covers a large area, all models struggle with accurate localization, especially in the early stages. Further analysis reveals that smoke volume is strongly correlated with model performance, whereas contrast plays a comparatively minor role. These findings highlight critical limitations of current MLLMs for safety-critical wildfire monitoring and underscore the need for methods that improve early-stage smoke localization.
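As a rough illustration of what grid-based localization might involve, the sketch below maps a binary smoke mask onto a coarse grid and compares a model's predicted cells against the resulting ground-truth cells. The abstract does not describe the benchmark's scoring protocol, so everything here is an assumption made for illustration: the function names `mask_to_cells` and `cell_scores`, the `min_fraction` threshold, and the choice of precision, recall, and F1 as metrics.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, col) index into the grid


def mask_to_cells(mask: List[List[int]], rows: int, cols: int,
                  min_fraction: float = 0.0) -> Set[Cell]:
    """Return grid cells whose share of smoke pixels exceeds min_fraction.

    `mask` is a binary image (1 = smoke, 0 = background). The threshold is
    a hypothetical parameter, not something taken from the paper.
    """
    height, width = len(mask), len(mask[0])
    cells: Set[Cell] = set()
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            area = (y1 - y0) * (x1 - x0)
            smoke = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
            if area > 0 and smoke / area > min_fraction:
                cells.add((r, c))
    return cells


def cell_scores(predicted: Set[Cell], truth: Set[Cell]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over grid cells (assumed metrics)."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Under these assumptions, a model that names one correct cell and one empty cell for an image with a single smoke cell would score a precision of 0.5 and a recall of 1.0. A small early-stage plume occupies few cells, so each wrongly named cell lowers precision sharply.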

Benchmarking Multimodal Large Language Models for Face Recognition

CV and Pattern Recognition

Tests how computers recognize faces better.

16 Oct 2025

88%

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

CV and Pattern Recognition

Helps computers understand who speaks in videos.

1 Dec 2025

88%

Beyond Diagnosis: Evaluating Multimodal LLMs for Pathology Localization in Chest Radiographs

CV and Pattern Recognition

AI can find sickness in X-rays.

22 Sep 2025
