ActiveUMI: Robotic Manipulation with Active Perception from Robot-Free Human Demonstrations
By: Qiyuan Zeng, Chengmeng Li, Jude St. John, and more
Potential Business Impact:
Teaches robots to do tasks by watching humans.
We present ActiveUMI, a framework and data collection system that transfers in-the-wild human demonstrations to robots capable of complex bimanual manipulation. ActiveUMI couples a portable VR teleoperation kit with sensorized controllers that mirror the robot's end-effectors, bridging human-robot kinematics via precise pose alignment. To ensure mobility and data quality, we introduce several key techniques, including immersive 3D model rendering, a self-contained wearable computer, and efficient calibration methods. ActiveUMI's defining feature is its capture of active, egocentric perception. By recording an operator's deliberate head movements via a head-mounted display, our system learns the crucial link between visual attention and manipulation. We evaluate ActiveUMI on six challenging bimanual tasks. Policies trained exclusively on ActiveUMI data achieve an average success rate of 70% on in-distribution tasks and demonstrate strong generalization, retaining a 56% success rate when tested on novel objects and in new environments. Our results demonstrate that portable data collection systems, when coupled with learned active perception, provide an effective and scalable pathway toward creating generalizable and highly capable real-world robot policies.
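To make the pose-alignment idea concrete, the sketch below chains homogeneous transforms to map a tracked controller pose into an end-effector target expressed in the robot base frame. This is a minimal illustration, not the authors' implementation: the frame names (T_base_tracking, T_ctrl_tcp), the calibration values, and the controller-to-gripper offset are all assumed for the example.

```python
# Minimal sketch (not the ActiveUMI code): mapping a tracked VR controller pose
# into a robot end-effector target via a fixed calibration transform.
# Frame names and numeric values below are illustrative assumptions.
import numpy as np

def pose_to_matrix(position, quaternion_wxyz):
    """Build a 4x4 homogeneous transform from a position and a unit quaternion (w, x, y, z)."""
    w, x, y, z = quaternion_wxyz
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = position
    return T

# Calibration: the VR tracking frame expressed in the robot base frame,
# obtained once (e.g. from a short calibration routine); values are hypothetical.
T_base_tracking = pose_to_matrix([0.40, 0.00, 0.02], [1.0, 0.0, 0.0, 0.0])

# Fixed offset from the controller body to the gripper tool-center-point it mirrors.
T_ctrl_tcp = pose_to_matrix([0.0, 0.0, 0.12], [1.0, 0.0, 0.0, 0.0])

def controller_to_ee_target(ctrl_position, ctrl_quat_wxyz):
    """Chain transforms: robot base <- tracking frame <- controller <- tool-center-point."""
    T_tracking_ctrl = pose_to_matrix(ctrl_position, ctrl_quat_wxyz)
    return T_base_tracking @ T_tracking_ctrl @ T_ctrl_tcp

# Example: one tracked sample from a controller (unit quaternion, approx. 90-degree yaw).
ee_target = controller_to_ee_target([0.10, 0.25, 0.30], [0.7071, 0.0, 0.7071, 0.0])
print(np.round(ee_target, 3))
```

The same chaining pattern would apply per controller for bimanual control, and an analogous transform would place the head-mounted display's egocentric camera frame in the robot base frame for the active-perception observations.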
Similar Papers
MV-UMI: A Scalable Multi-View Interface for Cross-Embodiment Learning
Robotics
Robots learn better from more camera views.
exUMI: Extensible Robot Teaching System with Action-aware Task-agnostic Tactile Representation
Robotics
Robots learn to feel and grip objects better.
UMIGen: A Unified Framework for Egocentric Point Cloud Generation and Cross-Embodiment Robotic Imitation Learning
Robotics
Robots learn new tasks faster with less special gear.