VacuumVLA: Boosting VLA Capabilities via a Unified Suction and Gripping Tool for Complex Robotic Manipulation
By: Hui Zhou, Siyuan Huang, Minxing Li, and more
Potential Business Impact:
A single robot hand that can both grip and stick to surfaces, expanding the range of tasks it can perform.
Vision-Language-Action (VLA) models have significantly advanced general-purpose robotic manipulation by harnessing large-scale pretrained vision and language representations. Among existing approaches, the majority of current VLA systems employ parallel two-finger grippers as their default end effectors. However, such grippers face inherent limitations in certain real-world tasks, such as wiping glass surfaces or opening drawers without handles, due to insufficient contact area or lack of adhesion. To overcome these challenges, we present a low-cost, integrated hardware design that combines a mechanical two-finger gripper with a vacuum suction unit, enabling dual-mode manipulation within a single end effector. Our system supports flexible switching between the two modalities, or their synergistic use, expanding the range of feasible tasks. We validate the efficiency and practicality of our design within two state-of-the-art VLA frameworks: DexVLA and Pi0. Experimental results demonstrate that with the proposed hybrid end effector, robots can successfully perform multiple complex tasks that are infeasible for conventional two-finger grippers alone. All hardware designs and control systems will be released.
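To illustrate the dual-mode idea described in the abstract, here is a minimal sketch of a controller that switches between gripping and suction, or engages both synergistically. The class and method names (`HybridEndEffector`, `set_mode`, `engage`) are hypothetical and not from the paper; the released control system may look quite different.

```python
from enum import Enum, auto


class Mode(Enum):
    """Hypothetical actuation modes for a hybrid gripper/suction end effector."""
    GRIP = auto()      # use the mechanical two-finger gripper only
    SUCTION = auto()   # use the vacuum suction unit only
    HYBRID = auto()    # engage both modalities together


class HybridEndEffector:
    """Illustrative controller sketch; not the paper's released interface."""

    def __init__(self) -> None:
        self.mode = Mode.GRIP
        self.gripper_closed = False
        self.vacuum_on = False

    def set_mode(self, mode: Mode) -> None:
        # Release any held object before switching to avoid
        # conflicting actuation between the two modalities.
        self.release()
        self.mode = mode

    def engage(self) -> None:
        # Activate whichever modality (or both) the current mode selects.
        if self.mode in (Mode.GRIP, Mode.HYBRID):
            self.gripper_closed = True
        if self.mode in (Mode.SUCTION, Mode.HYBRID):
            self.vacuum_on = True

    def release(self) -> None:
        self.gripper_closed = False
        self.vacuum_on = False


# Example: suction mode for a flat, handle-less surface (e.g. a drawer face).
ee = HybridEndEffector()
ee.set_mode(Mode.SUCTION)
ee.engage()
print(ee.vacuum_on, ee.gripper_closed)  # True False
```

In a VLA pipeline such as DexVLA or Pi0, the policy's action head would presumably emit the mode selection alongside the end-effector pose, rather than it being set manually as above.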
Similar Papers
ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation
Robotics
Robots learn to build things by watching goals.
Information-Theoretic Graph Fusion with Vision-Language-Action Model for Policy Reasoning and Dual Robotic Control
Robotics
Robots learn to build things by watching videos.
EvoVLA: Self-Evolving Vision-Language-Action Model
CV and Pattern Recognition
Robots learn to do long, tricky jobs better.