Towards Safer Mobile Agents: Scalable Generation and Evaluation of Diverse Scenarios for VLMs
By: Takara Taniguchi, Kuniaki Saito, Atsushi Hashimoto
Potential Business Impact:
Makes self-driving cars safer by testing their AI on tricky hazards.
Vision Language Models (VLMs) are increasingly deployed in autonomous vehicles and mobile systems, making it crucial to evaluate their ability to support safer decision-making in complex environments. However, existing benchmarks inadequately cover diverse hazardous situations, especially anomalous scenarios with spatio-temporal dynamics. While image-editing models are a promising means of synthesizing such hazards, it remains challenging to generate well-formulated scenarios that include the moving, intrusive, and distant objects frequently observed in the real world. To address this gap, we introduce HazardForge, a scalable pipeline that leverages image-editing models, guided by layout-decision algorithms and validation modules, to generate these scenarios. Using HazardForge, we construct MovSafeBench, a multiple-choice question (MCQ) benchmark comprising 7,254 images and corresponding QA pairs across 13 object categories, covering both normal and anomalous objects. Experiments on MovSafeBench show that VLM performance degrades notably when anomalous objects are present, with the largest drop in scenarios that require nuanced motion understanding.
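As a rough illustration of how such a pipeline might be wired together, the Python sketch below chains a layout-decision step, an image-editing call, and a validation check, then converts the result into an MCQ sample. All names here (Placement, decide_layout, edit_image, validate_edit, build_mcq) and the category and motion vocabularies are hypothetical stand-ins, not HazardForge's actual interface, which the abstract does not specify.

```python
import random
from dataclasses import dataclass

# All names and vocabularies below are hypothetical placeholders,
# not HazardForge's actual interface.

@dataclass
class Placement:
    category: str   # hazard category, e.g. one of the benchmark's 13
    x: float        # normalized horizontal position in the frame
    y: float        # normalized vertical position in the frame
    scale: float    # apparent size; small values model distant objects
    motion: str     # coarse motion label for the spatio-temporal setup

def decide_layout(rng: random.Random) -> Placement:
    """Layout decision: sample where and how a hazard appears, covering
    the moving, intrusive, and distant cases the paper highlights."""
    return Placement(
        category=rng.choice(["pedestrian", "animal", "debris", "cone"]),
        x=rng.uniform(0.2, 0.8),
        y=rng.uniform(0.4, 0.7),
        scale=rng.uniform(0.02, 0.3),
        motion=rng.choice(["static", "crossing", "approaching"]),
    )

def edit_image(base_image, placement: Placement):
    """Stand-in for the image-editing model call that inserts the object
    at the decided layout; a real pipeline would invoke an editor here."""
    return base_image  # placeholder: no actual editing in this sketch

def validate_edit(edited_image, placement: Placement) -> bool:
    """Stand-in validation module; a real check might run an object
    detector to confirm the object was rendered at the requested spot."""
    return True  # placeholder: accept everything in this sketch

def build_mcq(placement: Placement) -> dict:
    """Turn a validated placement into a multiple-choice safety question."""
    return {
        "question": (f"A {placement.category} is {placement.motion} ahead. "
                     "What is the safest action?"),
        "choices": ["maintain speed", "slow down and prepare to stop",
                    "swerve without checking mirrors", "accelerate past it"],
        "answer": "slow down and prepare to stop",
    }

def generate_sample(base_image, rng: random.Random):
    """One pipeline pass: layout -> edit -> validate -> image/QA pair."""
    placement = decide_layout(rng)
    edited = edit_image(base_image, placement)
    if not validate_edit(edited, placement):
        return None  # discard failed edits; a caller would retry
    return edited, build_mcq(placement)
```

Repeating generate_sample over a pool of base driving images, retrying whenever validation rejects an edit, is one plausible way a corpus of images and QA pairs like MovSafeBench could be assembled at scale.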
Similar Papers
VLM as Strategist: Adaptive Generation of Safety-critical Testing Scenarios via Guided Diffusion
Robotics
Creates realistic, tricky driving tests for self-driving cars.
Toward Automatic Safe Driving Instruction: A Large-Scale Vision Language Model Approach
CV and Pattern Recognition
Helps cars watch drivers and roads for safety.
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
CV and Pattern Recognition
Tests if car AI truly sees or just guesses.