Automotive-ENV: Benchmarking Multimodal Agents in Vehicle Interface Systems
By: Junfeng Yan, Biao Wu, Meng Fang, and more
Potential Business Impact:
Helps car screens understand your needs safely.
Multimodal agents have demonstrated strong performance in general GUI interactions, but their application in automotive systems has been largely unexplored. In-vehicle GUIs present distinct challenges: drivers' limited attention, strict safety requirements, and complex location-based interaction patterns. To address these challenges, we introduce Automotive-ENV, the first high-fidelity benchmark and interaction environment tailored for vehicle GUIs. This platform defines 185 parameterized tasks spanning explicit control, implicit intent understanding, and safety-aware tasks, and provides structured multimodal observations with precise programmatic checks for reproducible evaluation. Building on this benchmark, we propose ASURADA, a geo-aware multimodal agent that integrates GPS-informed context to dynamically adjust actions based on location, environmental conditions, and regional driving norms. Experiments show that geo-aware information significantly improves success on safety-aware tasks, highlighting the importance of location-based context in automotive environments. We will release Automotive-ENV, complete with all tasks and benchmarking tools, to further the development of safe and adaptive in-vehicle agents.
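The abstract describes parameterized tasks evaluated with precise programmatic checks against the GUI state. As a rough illustration of that pattern, here is a minimal sketch; all names (`Task`, `check_fn`, `make_set_temperature_task`, the state keys) are hypothetical and do not reflect the actual Automotive-ENV API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical sketch of a parameterized benchmark task with a
# programmatic success check; names are illustrative assumptions,
# not the real Automotive-ENV interface.

@dataclass
class Task:
    task_id: str
    category: str                       # e.g. "explicit", "implicit", "safety"
    params: Dict[str, Any]
    check_fn: Callable[[Dict[str, Any]], bool]  # checks the final GUI state

def make_set_temperature_task(target_c: float) -> Task:
    """One parameterization of a hypothetical climate-control task."""
    return Task(
        task_id=f"climate.set_temp.{target_c}",
        category="explicit",
        params={"target_c": target_c},
        # Success is determined programmatically from the final state,
        # enabling reproducible evaluation across runs.
        check_fn=lambda state: abs(
            state.get("cabin_temp_setpoint", 0.0) - target_c
        ) < 0.5,
    )

task = make_set_temperature_task(21.0)
print(task.check_fn({"cabin_temp_setpoint": 21.0}))  # → True
print(task.check_fn({"cabin_temp_setpoint": 25.0}))  # → False
```

Parameterizing tasks this way lets a single task template (here, setting a temperature) generate many concrete evaluation instances with deterministic pass/fail criteria.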
Similar Papers
Quo-Vadis Multi-Agent Automotive Research? Insights from a Participatory Workshop and Questionnaire
Human-Computer Interaction
Helps self-driving cars safely share roads.
Multimodal Framework for Explainable Autonomous Driving: Integrating Video, Sensor, and Textual Data for Enhanced Decision-Making and Transparency
Multimedia
Helps self-driving cars explain their actions.
AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning
Artificial Intelligence
Teaches AI to learn in many different worlds.