CycleVLA: Proactive Self-Correcting Vision-Language-Action Models via Subtask Backtracking and Minimum Bayes Risk Decoding
By: Chenyang Ma, Guangyu Yang, Kai Lu, and more
Potential Business Impact:
Robots fix mistakes before they happen.
Current work on robot failure detection and correction typically operates in a post hoc manner, analyzing errors and applying corrections only after failures occur. This work introduces CycleVLA, a system that equips Vision-Language-Action models (VLAs) with proactive self-correction: the capability to anticipate incipient failures and recover before they fully manifest during execution. CycleVLA achieves this by integrating three components: a progress-aware VLA that flags critical subtask transition points, where failures most frequently occur; a VLM-based failure predictor and planner that triggers subtask backtracking when a failure is predicted; and a test-time scaling strategy based on Minimum Bayes Risk (MBR) decoding that improves retry success after backtracking. Extensive experiments show that CycleVLA improves performance for both well-trained and under-trained VLAs, and that MBR serves as an effective zero-shot test-time scaling strategy for VLAs. Project Page: https://dannymcy.github.io/cyclevla/
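The MBR decoding idea above can be illustrated with a minimal sketch: sample several candidate action proposals, score each by its expected utility against the other samples, and execute the consensus pick. The function names and the negative-L2 utility here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def mbr_select(candidates, utility):
    """Pick the candidate with the highest expected utility against
    the other samples (equivalently, the minimum Bayes risk)."""
    n = len(candidates)
    scores = [
        sum(utility(candidates[i], candidates[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    return candidates[int(np.argmax(scores))]

# Illustrative utility (an assumption): negative L2 distance between two
# action vectors, so proposals that agree with the consensus score higher.
def neg_l2(a, b):
    return -float(np.linalg.norm(a - b))

# Three sampled action proposals: two agree, one is an outlier.
samples = [np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([5.0, 5.0])]
selected = mbr_select(samples, neg_l2)  # picks a member of the consensus pair
```

The appeal for retries is that no new training is needed: MBR only reranks samples the VLA already produces, which is why it works as a zero-shot test-time scaling strategy.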
Similar Papers
FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction
Robotics
Robots learn to fix their own mistakes.
Diagnose, Correct, and Learn from Manipulation Failures via Visual Symbols
Robotics
Helps robots fix mistakes by showing them what to do.