Enhancing Robot Safety via MLLM-Based Semantic Interpretation of Failure Data
By: Aryaman Gupta, Yusuf Umut Ciftci, Somil Bansal
Potential Business Impact:
Helps robots learn from mistakes automatically.
As robotic systems become increasingly integrated into real-world environments, ranging from autonomous vehicles to household assistants, they inevitably encounter diverse and unstructured scenarios that lead to failures. While such failures pose safety and reliability challenges, they also provide rich perceptual data for improving future performance; manually analyzing large-scale failure datasets, however, is impractical. In this work, we present a method for automatically organizing large-scale robotic failure data into semantically meaningful clusters, enabling scalable learning from failure without human supervision. Our approach leverages the reasoning capabilities of Multimodal Large Language Models (MLLMs), trained on internet-scale data, to infer high-level failure causes from raw perceptual trajectories and to discover interpretable structure within uncurated failure logs. The resulting semantic clusters reveal latent patterns and hypothesized causes of failure. We demonstrate that the discovered failure modes can guide targeted data collection for policy refinement, accelerating iterative improvement in agent policies and overall safety. We further show that these semantic clusters can be employed for online failure detection, offering a lightweight yet powerful safeguard for real-time adaptation. Together, these results show that the framework enhances robot learning and robustness by transforming real-world failures into actionable and interpretable signals for adaptation.
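To make the clustering idea concrete, the sketch below groups free-text failure causes (the kind an MLLM might produce from raw trajectories) by semantic similarity. This is a minimal illustration, not the paper's method: the example descriptions are hypothetical, the bag-of-words embedding stands in for a learned text embedding, and the greedy threshold clustering stands in for whatever clustering the authors use.

```python
from collections import Counter
import math

def embed(text):
    # Bag-of-words vector; a real pipeline would use learned text embeddings.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_failures(descriptions, threshold=0.3):
    """Greedily group failure descriptions whose similarity to a
    cluster's first member exceeds the threshold."""
    clusters = []  # each cluster: list of (index, embedding)
    for i, desc in enumerate(descriptions):
        emb = embed(desc)
        for cl in clusters:
            if cosine(emb, cl[0][1]) >= threshold:
                cl.append((i, emb))
                break
        else:
            clusters.append([(i, emb)])
    return [[i for i, _ in cl] for cl in clusters]

# Hypothetical MLLM-inferred causes for four failure trajectories
descs = [
    "collision with unseen obstacle in low light",
    "collision with obstacle under poor lighting",
    "gripper slipped on wet object surface",
    "object slipped from gripper due to wet surface",
]
print(cluster_failures(descs))  # → [[0, 1], [2, 3]]
```

Each resulting cluster is a candidate failure mode; in the paper's framing, such clusters then drive targeted data collection and serve as reference signatures for online failure detection.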
Similar Papers
Incorporating Failure of Machine Learning in Dynamic Probabilistic Safety Assurance
Artificial Intelligence
Makes self-driving cars safer by checking their "thinking."
FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models
Robotics
Robots learn to fix their own mistakes.
Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends
CV and Pattern Recognition
Makes cars see and understand everything around them.