Object-Centric Representations Improve Policy Generalization in Robot Manipulation
By: Alexandre Chapin, Bruno Machado, Emmanuel Dellandrea and more
Potential Business Impact:
Robots learn to grab things better by seeing objects.
Visual representations are central to the learning and generalization capabilities of robotic manipulation policies. Existing methods rely on global or dense features, but such representations often entangle task-relevant and task-irrelevant scene information, which limits robustness under distribution shifts. In this work, we investigate object-centric representations (OCR) as a structured alternative that segments visual input into a finite set of entities, introducing inductive biases that align more naturally with manipulation tasks. We benchmark a range of visual encoders (object-centric, global, and dense) across a suite of simulated and real-world manipulation tasks ranging from simple to complex, and evaluate their generalization under diverse visual conditions, including changes in lighting and texture and the presence of distractors. Our findings show that OCR-based policies outperform dense and global representations in generalization settings, even without task-specific pretraining. These results suggest that OCR is a promising direction for designing visual systems that generalize effectively in dynamic, real-world robotic environments.
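The abstract compares three families of visual representations. The minimal NumPy sketch below makes the difference in their output shape concrete. It assumes a backbone has already produced an N x D grid of patch features; random numbers stand in for them here. A dense representation keeps all N patch vectors, and a global representation pools them into a single vector. The object-centric branch is a simplified Slot Attention loop (after Locatello et al., 2020) that softly assigns patches to a small, fixed number K of slot vectors. The learned projections, GRU update, and MLP of the original method are removed. This is an illustration only, not the encoder benchmarked in this paper, and all function names are hypothetical.

```python
import numpy as np

def dense_features(patches: np.ndarray) -> np.ndarray:
    """Dense representation: keep one feature vector per image patch, shape (N, D)."""
    return patches

def global_feature(patches: np.ndarray) -> np.ndarray:
    """Global representation: average-pool all patches into one vector, shape (D,)."""
    return patches.mean(axis=0)

def slot_features(patches: np.ndarray, num_slots: int = 4, iters: int = 3, seed: int = 0):
    """Object-centric representation via a simplified slot-attention loop (hypothetical helper).

    Returns (slots, attn): slots has shape (K, D); attn has shape (K, N) and is
    the attention from the last iteration, where each column sums to 1 because
    slots compete for every patch.
    """
    rng = np.random.default_rng(seed)
    _, d = patches.shape
    slots = rng.normal(size=(num_slots, d))        # random slot initialization
    scale = 1.0 / np.sqrt(d)
    for _ in range(iters):
        logits = (slots @ patches.T) * scale       # (K, N) slot-patch similarity
        logits -= logits.max(axis=0, keepdims=True)  # numerical stability
        attn = np.exp(logits)
        attn /= attn.sum(axis=0, keepdims=True)    # softmax over slots, per patch
        weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)  # normalize per slot
        slots = weights @ patches                  # (K, D) weighted mean of assigned patches
    return slots, attn

# Usage with placeholder features: a 16x16 patch grid with 64-dim features.
rng = np.random.default_rng(42)
patches = rng.normal(size=(16 * 16, 64))
print(dense_features(patches).shape)   # (256, 64)
print(global_feature(patches).shape)   # (64,)
slots, attn = slot_features(patches, num_slots=5)
print(slots.shape, attn.shape)         # (5, 64) (5, 256)
```

The contrast is in the shape of what a downstream policy receives. The global vector mixes every patch together. The dense grid keeps every patch, including background. The slot set is a short list of entity-level vectors, which is the structural bias the abstract argues suits manipulation tasks.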
Similar Papers
Disentangled Object-Centric Image Representation for Robotic Manipulation
CV and Pattern Recognition
Robots learn to grab things better, even with many objects.
Deep Reinforcement Learning via Object-Centric Attention
Machine Learning (CS)
Helps game robots learn new levels faster.
Video Spatial Reasoning with Object-Centric 3D Rollout
CV and Pattern Recognition
Teaches computers to understand 3D object locations in videos.