End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy: VR Teleoperation Augmented by Autonomous Hand VLA Policy for Efficient Data Collection

Published: October 31, 2025 | arXiv ID: 2511.00139v1

By: Yu Cui, Yujian Zhang, Lina Tao and more

BigTech Affiliations: ByteDance

Potential Business Impact:

Robots learn to grab things better with less human help.

Business Areas:

Autonomous Vehicles Transportation

Achieving human-like dexterous manipulation remains a major challenge for general-purpose robots. While Vision-Language-Action (VLA) models show potential in learning skills from demonstrations, their scalability is limited by scarce high-quality training data. Existing data collection methods face inherent constraints: manual teleoperation overloads human operators, while automated planning often produces unnatural motions. We propose a Shared Autonomy framework that divides control between macro and micro motions. A human operator guides the robot's arm pose through intuitive VR teleoperation, while an autonomous DexGrasp-VLA policy handles fine-grained hand control using real-time tactile and visual feedback. This division significantly reduces cognitive load and enables efficient collection of high-quality coordinated arm-hand demonstrations. Using this data, we train an end-to-end VLA policy enhanced with our novel Arm-Hand Feature Enhancement module, which captures both distinct and shared representations of macro and micro movements for more natural coordination. Our Corrective Teleoperation system enables continuous policy improvement through human-in-the-loop failure recovery. Experiments demonstrate that our framework generates high-quality data with minimal manpower and achieves a 90% success rate across diverse objects, including unseen instances. Comprehensive evaluations validate the system's effectiveness in developing dexterous manipulation capabilities.
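The macro/micro split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `Observation`, `toy_hand_policy`, `shared_autonomy_step`, the 7-value arm pose, and the 6 hand joints are all hypothetical stand-ins chosen for illustration. The toy policy only mimics the idea of tactile-driven hand control and is in no way the DexGrasp-VLA model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Observation:
    """Hypothetical sensor bundle; real inputs would be camera images and tactile arrays."""
    image: Sequence[float]
    tactile: Sequence[float]  # assumed normalized fingertip pressures


def toy_hand_policy(obs: Observation, n_joints: int = 6,
                    target_pressure: float = 0.5) -> List[float]:
    """Stand-in for an autonomous hand policy (assumption, not DexGrasp-VLA).

    Commands more finger closure while measured pressure is below the target.
    """
    mean_pressure = sum(obs.tactile) / len(obs.tactile)
    closure = 1.0 - mean_pressure / target_pressure
    closure = max(0.0, min(1.0, closure))  # clamp to [0, 1]
    return [closure] * n_joints


def shared_autonomy_step(arm_pose_from_vr: Sequence[float],
                         obs: Observation,
                         hand_policy: Callable[[Observation], List[float]]) -> List[float]:
    """Combine the human's arm command (macro) with the policy's hand command (micro)."""
    hand_command = hand_policy(obs)
    return list(arm_pose_from_vr) + list(hand_command)


# One control tick: the operator supplies the arm pose, the policy supplies the fingers.
vr_arm_pose = [0.1, 0.2, 0.3, 0.0, 0.0, 0.0, 1.0]  # assumed xyz + quaternion
obs = Observation(image=[], tactile=[0.25, 0.25])
action = shared_autonomy_step(vr_arm_pose, obs, toy_hand_policy)

# Each combined action can be logged as one frame of an arm-hand demonstration.
demonstration = [action]
```

In this sketch the operator never commands individual fingers, which is the source of the reduced cognitive load the abstract claims. The logged frames still contain the full arm-hand action, so they could serve as training data for an end-to-end policy.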

Information-Theoretic Graph Fusion with Vision-Language-Action Model for Policy Reasoning and Dual Robotic Control

Robotics

Robots learn to build things by watching videos.

7 Aug 2025 0

91%

Graph-Fused Vision-Language-Action for Policy Reasoning in Multi-Arm Robotic Manipulation

Robotics

Robots learn to build things by watching videos.

9 Sep 2025 1

91%

Robotic Assistant: Completing Collaborative Tasks with Dexterous Vision-Language-Action Models

Robotics

Robots learn to help people with simple words.

29 Oct 2025 0

Country of Origin

🇨🇳 China

Page Count

35 pages
