FBI: Learning Dexterous In-hand Manipulation with Dynamic Visuotactile Shortcut Policy

Published: August 20, 2025 | arXiv ID: 2508.14441v1

By: Yijin Chen, Wenqiang Xu, Zhenjun Yu, and more

Potential Business Impact:

Robots learn to manipulate objects in-hand more reliably by combining vision and touch.

Business Areas:
Robotics Hardware, Science and Engineering, Software

Dexterous in-hand manipulation is a long-standing challenge in robotics due to complex contact dynamics and partial observability. While humans synergize vision and touch for such tasks, robotic approaches often prioritize one modality, which limits adaptability. This paper introduces Flow Before Imitation (FBI), a visuotactile imitation learning framework that dynamically fuses tactile interactions with visual observations through motion dynamics. Unlike prior static fusion methods, FBI establishes a causal link between tactile signals and object motion via a dynamics-aware latent model. FBI employs a transformer-based interaction module to fuse flow-derived tactile features with visual inputs, and trains a one-step diffusion policy for real-time execution. Extensive experiments demonstrate that the proposed method outperforms baseline methods in both simulation and the real world on two customized in-hand manipulation tasks and three standard dexterous manipulation tasks. Code, models, and more results are available on the project website: https://sites.google.com/view/dex-fbi.

Country of Origin
🇨🇳 China

Page Count
9 pages

Category
Computer Science: Robotics