FBI: Learning Dexterous In-hand Manipulation with Dynamic Visuotactile Shortcut Policy
By: Yijin Chen, Wenqiang Xu, Zhenjun Yu and more
Potential Business Impact:
Robots learn to grasp and move objects better.
Dexterous in-hand manipulation is a long-standing challenge in robotics due to complex contact dynamics and partial observability. While humans synergize vision and touch for such tasks, robotic approaches often prioritize one modality, thereby limiting adaptability. This paper introduces Flow Before Imitation (FBI), a visuotactile imitation learning framework that dynamically fuses tactile interactions with visual observations through motion dynamics. Unlike prior static fusion methods, FBI establishes a causal link between tactile signals and object motion via a dynamics-aware latent model. FBI employs a transformer-based interaction module to fuse flow-derived tactile features with visual inputs, training a one-step diffusion policy for real-time execution. Extensive experiments demonstrate that the proposed method outperforms the baseline methods in both simulation and the real world on two customized in-hand manipulation tasks and three standard dexterous manipulation tasks. Code, models, and more results are available on the website https://sites.google.com/view/dex-fbi.
Similar Papers
Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion
Robotics
Teaches robots to twist caps with their hands.
KineDex: Learning Tactile-Informed Visuomotor Policies via Kinesthetic Teaching for Dexterous Manipulation
Robotics
Teaches robots to feel and grip like humans.
Dexterous Manipulation Transfer via Progressive Kinematic-Dynamic Alignment
Robotics
Robots copy human hand moves from videos.