Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation
By: Abdulaziz Almuzairee, Rohan Patil, Dwait Bhatt, and more
Potential Business Impact:
Robots learn manipulation tasks faster and keep working when a camera fails.
Vision is widely used in robotic manipulation, especially through visual servoing. Because the world is three-dimensional, merging multiple camera views yields richer representations for Q-learning and, in turn, trains more sample-efficient policies. However, such multi-view policies are sensitive to camera failures and can be burdensome to deploy. To mitigate these issues, we introduce Merge And Disentanglement (MAD), an algorithm that efficiently merges views to increase sample efficiency while simultaneously disentangling views by augmenting multi-view feature inputs with single-view features. This produces robust policies and allows lightweight deployment. We demonstrate the efficiency and robustness of our approach on Meta-World and ManiSkill3. For the project website and code, see https://aalmuzairee.github.io/mad
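The core idea described in the abstract, fusing per-view features for sample efficiency while also feeding each single-view feature so the policy remains usable when cameras drop out, can be sketched as follows. This is a minimal illustrative sketch: the linear encoder, mean fusion operator, and feature dimensions are assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(view, W):
    # Per-view encoder stub: a linear projection standing in for a
    # learned image encoder (assumption for illustration).
    return np.tanh(view @ W)

def merge_views(features):
    # Fuse per-view features into one representation. Mean fusion is
    # an assumed stand-in for the paper's merging operator.
    return np.mean(features, axis=0)

def training_inputs(views, weights):
    """Build Q-learning feature inputs: the merged multi-view feature,
    plus each single-view feature as an augmentation, so the policy
    also learns to act from any one camera (the disentanglement idea,
    sketched)."""
    feats = [encode(v, W) for v, W in zip(views, weights)]
    merged = merge_views(np.stack(feats))
    return [merged] + feats  # merged input + one input per single view

# Two toy "camera views" of the same state, as flat vectors.
views = [rng.normal(size=16), rng.normal(size=16)]
weights = [rng.normal(size=(16, 8)) for _ in views]
inputs = training_inputs(views, weights)
print(len(inputs))  # merged + 2 single-view inputs -> 3
```

At deployment, one could then feed only a single surviving view's feature to the policy, which is what makes lightweight, camera-failure-tolerant operation possible in this framing.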
Similar Papers
ManiVID-3D: Generalizable View-Invariant Reinforcement Learning for Robotic Manipulation via Disentangled 3D Representations
Robotics
Robots can do tasks even if the camera moves.
Zero-Shot Visual Generalization in Robot Manipulation
Robotics
Robots learn to do tasks in new places.
Strategic Vantage Selection for Learning Viewpoint-Agnostic Manipulation Policies
Robotics
Teaches robots to grab things from any angle.