DemoBot: Efficient Learning of Bimanual Manipulation with Dexterous Hands From Third-Person Human Videos
By: Yucheng Xu, Xiaofeng Mao, Elle Miller, and more
Potential Business Impact:
Robots learn to build things by watching videos.
This work presents DemoBot, a learning framework that enables a dual-arm, multi-fingered robotic system to acquire complex manipulation skills from a single unannotated RGB-D video demonstration. The method extracts structured motion trajectories of both hands and objects from the raw video. These trajectories serve as motion priors for a novel reinforcement learning (RL) pipeline that refines them through contact-rich interaction, eliminating the need to learn from scratch. To address the challenge of learning long-horizon manipulation skills, we introduce: (1) temporal-segment-based RL, which enforces temporal alignment between the current state and the demonstration; (2) a Success-Gated Reset strategy, which balances refinement of already-acquired skills against exploration of subsequent task stages; and (3) an Event-Driven Reward curriculum with adaptive thresholding, which guides RL toward high-precision manipulation. The proposed video-processing and RL framework successfully completes long-horizon synchronous and asynchronous bimanual assembly tasks, offering a scalable approach to direct skill acquisition from human videos.
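To make the Success-Gated Reset idea concrete, here is a minimal sketch of one plausible implementation. The paper does not publish this code; the class name, the sliding-window success tracking, and the gating threshold are all illustrative assumptions. The idea it demonstrates: new episodes start at the first demonstration segment the policy has not yet mastered, so training time shifts from already-solved early stages toward later, unexplored ones.

```python
class SuccessGatedReset:
    """Hypothetical sketch of a success-gated reset strategy.

    Tracks a sliding window of recent success/failure outcomes per
    demonstration segment; episodes reset to the first segment whose
    recent success rate is still below the gate threshold.
    """

    def __init__(self, n_segments, gate=0.8, window=20):
        self.n_segments = n_segments
        self.gate = gate            # success rate needed to "pass" a segment
        self.window = window        # number of recent outcomes to keep
        self.history = [[] for _ in range(n_segments)]

    def record(self, segment, success):
        """Log the outcome of an attempt at a given segment."""
        h = self.history[segment]
        h.append(bool(success))
        if len(h) > self.window:
            h.pop(0)                # keep only the most recent outcomes

    def rate(self, segment):
        """Recent success rate for a segment (0.0 if never attempted)."""
        h = self.history[segment]
        return sum(h) / len(h) if h else 0.0

    def reset_segment(self):
        """Index of the first segment not yet reliably solved."""
        for i in range(self.n_segments):
            if self.rate(i) < self.gate:
                return i
        return self.n_segments - 1  # all mastered: keep refining the last
```

In use, the training loop would call `record()` after each episode and `reset_segment()` before the next one; early segments are revisited only while their recent success rate sits below the gate, matching the paper's stated goal of balancing refinement against exploration.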
Similar Papers
DexMan: Learning Bimanual Dexterous Manipulation from Human and Generated Videos
Robotics
Robots learn to do tasks by watching videos.
One-Shot Real-World Demonstration Synthesis for Scalable Bimanual Manipulation
Robotics
Creates many robot hand movements from one example.
Learning to Transfer Human Hand Skills for Robot Manipulations
Robotics
Teaches robots to copy human hand movements.