Eva-VLA: Evaluating Vision-Language-Action Models' Robustness Under Real-World Physical Variations
By: Hanqing Liu, Jiahuan Long, Junqi Wu, and more
Potential Business Impact:
Makes robots better at handling unexpected real-world problems.
Vision-Language-Action (VLA) models have emerged as promising solutions for robotic manipulation, yet their robustness to real-world physical variations remains critically underexplored. To bridge this gap, we propose Eva-VLA, the first unified framework that systematically evaluates the robustness of VLA models by transforming discrete physical variations into continuous optimization problems. Comprehensively assessing VLA robustness presents two key challenges: (1) how to systematically characterize the diverse physical variations encountered in real-world deployments while keeping the evaluation reproducible, and (2) how to efficiently discover worst-case scenarios without prohibitive real-world data collection costs. To address the first challenge, we decompose real-world variations into three critical domains: object 3D transformations that affect spatial reasoning, illumination variations that challenge visual perception, and adversarial patches that disrupt scene understanding. For the second challenge, we introduce a continuous black-box optimization framework that recasts discrete physical variations as parameter optimization, enabling systematic exploration of worst-case scenarios. Extensive experiments on state-of-the-art OpenVLA models across multiple benchmarks reveal alarming vulnerabilities: every variation type triggers failure rates exceeding 60%, with object transformations causing up to 97.8% failure in long-horizon tasks. Our findings expose a critical gap between controlled laboratory success and real-world deployment readiness, while the Eva-VLA framework provides a practical pathway for hardening VLA-based robotic manipulation models against real-world deployment challenges.
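To make the abstract's core idea concrete, below is a minimal sketch of the kind of continuous black-box search it describes: a physical variation (here, an object pose offset) is encoded as a continuous parameter vector, and a gradient-free optimizer searches for the setting that minimizes task success. The names `rollout_success_rate` and `worst_case_search`, the six-parameter pose encoding, and the simple (1+λ) evolutionary strategy are illustrative assumptions, not details from the paper.

```python
import numpy as np

def rollout_success_rate(variation: np.ndarray) -> float:
    """Stand-in objective. In Eva-VLA's setting this would roll out the
    VLA policy (e.g., OpenVLA) under the given physical variation and
    return its task success rate in [0, 1]; here a smooth toy function
    keeps the sketch self-contained and runnable."""
    return float(np.exp(-np.sum((variation - 0.5) ** 2)))

def worst_case_search(dim=6, bounds=(-1.0, 1.0), pop=16, iters=50,
                      sigma=0.2, seed=0):
    """Gradient-free (1+lambda) search for the variation parameters that
    minimize task success, i.e., the worst case.

    dim    -- number of variation parameters (e.g., 3 translation + 3 rotation)
    bounds -- box constraints keeping variations physically plausible
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    best = rng.uniform(lo, hi, size=dim)       # initial variation guess
    best_score = rollout_success_rate(best)
    for _ in range(iters):
        # Perturb the current worst case and keep candidates in bounds.
        cands = np.clip(best + sigma * rng.standard_normal((pop, dim)), lo, hi)
        scores = np.array([rollout_success_rate(c) for c in cands])
        i = int(scores.argmin())               # lower success rate = worse case
        if scores[i] < best_score:
            best, best_score = cands[i], scores[i]
    return best, best_score

if __name__ == "__main__":
    variation, success = worst_case_search()
    print(f"worst-case variation: {variation}\nsuccess rate: {success:.3f}")
```

Because the search only queries rollout outcomes, it needs no gradients through the policy, which matches the black-box setting the abstract describes; the same loop could, in principle, parameterize illumination or patch placement instead of object pose.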
Similar Papers
EvoVLA: Self-Evolving Vision-Language-Action Model
CV and Pattern Recognition
Robots learn to do long, tricky jobs better.
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Robotics
Robots fail when things change slightly.
Phantom Menace: Exploring and Enhancing the Robustness of VLA Models against Physical Sensor Attacks
Robotics
Protects robots from being tricked by bad signals.