Learning Interactive World Model for Object-Centric Reinforcement Learning
By: Fan Feng, Phillip Lippe, Sara Magliacane
Potential Business Impact:
Teaches robots to learn by watching objects interact.
Agents that understand objects and their interactions can learn policies that are more robust and transferable. However, most object-centric RL methods factorize the state into individual objects while leaving their interactions implicit. We introduce the Factored Interactive Object-Centric World Model (FIOC-WM), a unified framework that learns structured representations of both objects and their interactions within a world model. FIOC-WM captures environment dynamics with disentangled and modular representations of object interactions, improving sample efficiency and generalization in policy learning. Concretely, FIOC-WM first learns object-centric latents and an interaction structure directly from pixels, leveraging pre-trained vision encoders. The learned world model then decomposes tasks into composable interaction primitives, and a hierarchical policy is trained on top: a high level selects the type and order of interactions, while a low level executes them. On simulated robotic and embodied-AI benchmarks, FIOC-WM improves the sample efficiency and generalization of policy learning over world-model baselines, indicating that explicit, modular interaction learning is crucial for robust control.
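To make the pipeline concrete, here is a minimal sketch in PyTorch of how the pieces described above could fit together: slot-style object latents from a frozen vision encoder, a latent dynamics model with an explicit pairwise interaction structure, and a two-level policy. This is not the authors' implementation; all class names, dimensions, and the gated message-passing dynamics are illustrative assumptions.

```python
# A minimal sketch, assuming PyTorch. Names, dimensions, and the gated
# message-passing dynamics are illustrative assumptions, not FIOC-WM's code.
import torch
import torch.nn as nn

class ObjectEncoder(nn.Module):
    """Projects features from a frozen pre-trained vision encoder
    into K object-centric latent slots (dimensions are hypothetical)."""
    def __init__(self, feat_dim=768, num_slots=4, slot_dim=64):
        super().__init__()
        self.num_slots, self.slot_dim = num_slots, slot_dim
        self.proj = nn.Linear(feat_dim, num_slots * slot_dim)

    def forward(self, feats):                       # feats: (B, feat_dim)
        return self.proj(feats).view(-1, self.num_slots, self.slot_dim)

class InteractionWorldModel(nn.Module):
    """Latent dynamics with an explicit, learned interaction structure:
    an edge scorer gates pairwise messages between slots."""
    def __init__(self, slot_dim=64, action_dim=8):
        super().__init__()
        self.edge_scorer = nn.Sequential(
            nn.Linear(2 * slot_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.dynamics = nn.GRUCell(slot_dim + action_dim, slot_dim)

    def forward(self, slots, action):               # slots: (B, K, D)
        B, K, D = slots.shape
        src = slots.unsqueeze(2).expand(B, K, K, D)
        dst = slots.unsqueeze(1).expand(B, K, K, D)
        edges = torch.sigmoid(self.edge_scorer(torch.cat([src, dst], -1)))
        messages = (edges * src).sum(dim=2)         # interaction-gated sum
        act = action.unsqueeze(1).expand(B, K, action.shape[-1])
        inp = torch.cat([messages, act], -1).reshape(B * K, -1)
        nxt = self.dynamics(inp, slots.reshape(B * K, D))
        return nxt.view(B, K, D), edges.squeeze(-1)

class HighLevelPolicy(nn.Module):
    """Selects which interaction primitive to attempt next."""
    def __init__(self, slot_dim=64, num_primitives=6):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(slot_dim, 64), nn.ReLU(), nn.Linear(64, num_primitives))

    def forward(self, slots):
        return self.head(slots.mean(dim=1))         # permutation-invariant pool

class LowLevelPolicy(nn.Module):
    """Executes the chosen primitive as a continuous action."""
    def __init__(self, slot_dim=64, num_primitives=6, action_dim=8):
        super().__init__()
        self.emb = nn.Embedding(num_primitives, slot_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * slot_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))

    def forward(self, slots, primitive):
        cond = torch.cat([slots.mean(dim=1), self.emb(primitive)], -1)
        return self.head(cond)

# One imagined control step on random features (illustration only):
enc, wm = ObjectEncoder(), InteractionWorldModel()
hi, lo = HighLevelPolicy(), LowLevelPolicy()
feats = torch.randn(1, 768)                # stand-in for ViT features
slots = enc(feats)
primitive = hi(slots).argmax(dim=-1)       # high level: which interaction
action = lo(slots, primitive)              # low level: how to execute it
next_slots, edges = wm(slots, action)      # roll latents forward
```

The sigmoid-gated edge scores are one simple way to make the interaction structure explicit and inspectable, matching the abstract's emphasis on modular interaction learning; the paper's actual factorization and training objectives may differ.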
Similar Papers
Object-Centric World Models for Causality-Aware Reinforcement Learning
Machine Learning (CS)
Teaches robots to understand and learn from objects.
Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction
Artificial Intelligence
Lets computers learn faster by trying things.
Object-Centric World Model for Language-Guided Manipulation
Artificial Intelligence
Helps robots understand and plan actions with words.