Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning
By: Rickmer Krohn, Vignesh Prasad, Gabriele Tiboni and more
Potential Business Impact:
Robots learn to grab things better with many senses.
Effective contact-rich manipulation requires robots to synergistically leverage vision, force, and proprioception. However, Reinforcement Learning agents struggle to learn in such multisensory settings, especially amidst sensory noise and dynamic changes. We propose MultiSensory Dynamic Pretraining (MSDP), a novel framework for learning expressive multisensory representations tailored for task-oriented policy learning. MSDP is based on masked autoencoding and trains a transformer-based encoder by reconstructing multisensory observations from only a subset of sensor embeddings, leading to cross-modal prediction and sensor fusion. For downstream policy learning, we introduce a novel asymmetric architecture, where a cross-attention mechanism allows the critic to extract dynamic, task-specific features from the frozen embeddings, while the actor receives a stable pooled representation to guide its actions. Our method demonstrates accelerated learning and robust performance under diverse perturbations, including sensor noise and changes in object dynamics. Evaluations on multiple challenging, contact-rich robot manipulation tasks in simulation and the real world showcase the effectiveness of MSDP. Our approach achieves high success rates on the real robot with as few as 6,000 online interactions, offering a simple yet powerful solution for complex multisensory robotic control.
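To make the two ideas in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It illustrates (1) hiding a random subset of sensor tokens, as a masked autoencoder would before reconstruction, and (2) an asymmetric readout in which the actor sees a mean-pooled embedding while the critic attends over the individual embeddings with a query vector. All names (`mask_sensor_tokens`, `mean_pool`, `cross_attention_readout`), the keep ratio, the token count, and the use of random vectors in place of frozen encoder outputs are assumptions made for illustration. The attention here also omits the learned key/value projections a real transformer layer would have.

```python
import math
import random


def mask_sensor_tokens(tokens, keep_ratio, rng):
    """Keep a random subset of tokens; return (visible (index, token) pairs, masked indices)."""
    n = len(tokens)
    n_keep = max(1, int(round(n * keep_ratio)))
    keep = sorted(rng.sample(range(n), n_keep))
    masked = [i for i in range(n) if i not in keep]
    visible = [(i, tokens[i]) for i in keep]
    return visible, masked


def mean_pool(embeddings):
    """Average the embeddings per dimension (hypothetical actor input)."""
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]


def cross_attention_readout(query, embeddings):
    """Single-head scaled dot-product attention of one query over the embeddings.

    Simplification: keys and values are the embeddings themselves
    (no learned projections), so this only shows the weighting step.
    """
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, e)) / math.sqrt(dim) for e in embeddings]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    out = [sum(w * e[d] for w, e in zip(weights, embeddings)) for d in range(dim)]
    return out, weights


rng = random.Random(0)
# Stand-in for frozen encoder outputs: 6 sensor tokens of dimension 4
# (e.g. vision, force, and proprioception tokens; the split is hypothetical).
tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]

# Pretraining side: only the visible tokens would be fed to the encoder,
# and the masked ones would be reconstruction targets.
visible, masked = mask_sensor_tokens(tokens, keep_ratio=0.5, rng=rng)

# Policy side: the actor gets a pooled vector, the critic a query-dependent one.
actor_input = mean_pool(tokens)
critic_query = [0.1, -0.2, 0.3, 0.0]  # would be learned in practice
critic_features, attention = cross_attention_readout(critic_query, tokens)
```

The design point the sketch tries to capture is the asymmetry: the pooled vector does not depend on any learned query, so the actor's input stays fixed for a given observation, whereas the critic's features change as its query changes during training.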
Similar Papers
Contact-Safe Reinforcement Learning with ProMP Reparameterization and Energy Awareness
Robotics
Robots learn to move safely and smoothly.
ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning
Robotics
Robots learn to touch and move objects precisely.
Self-Predictive Dynamics for Generalization of Vision-based Reinforcement Learning
CV and Pattern Recognition
Teaches robots to learn from messy pictures.