Inter-environmental world modeling for continuous and compositional dynamics
By: Kohei Hayashi, Masanori Koyama, Julian Jorge Andrade Guerreiro
Potential Business Impact:
Teaches robots to learn new actions from videos.
Many current world models are built on autoregressive frameworks that rely on discrete representations of actions and observations, and they have succeeded in constructing interactive generative models of a target environment. Humans, meanwhile, show a remarkable ability to generalize: they combine experiences from multiple environments to mentally simulate, and learn to control, agents in diverse environments. Inspired by this capability, we introduce World modeling through Lie Action (WLA), an unsupervised framework that learns continuous latent action representations to simulate across environments. WLA learns a control interface with high controllability and predictive ability by simultaneously modeling the dynamics of multiple environments using Lie group theory and an object-centric autoencoder. On synthetic benchmark and real-world datasets, we demonstrate that WLA can be trained using only video frames and, with minimal or no action labels, can quickly adapt to new environments with novel action sets.
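As a rough illustration of what "continuous and compositional" can mean for a Lie group action on a latent state, the sketch below uses the rotation group SO(2) acting on a 2-D vector. The abstract does not specify the group, the latent layout, or the training procedure used by WLA, so the choice of SO(2) and the names `rotation`, `step`, and `z0` are assumptions made purely for illustration; this is a toy example, not the authors' model.

```python
import numpy as np

def rotation(theta: float) -> np.ndarray:
    """Group element exp(theta * G) for the so(2) generator G = [[0, -1], [1, 0]]."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def step(z: np.ndarray, theta: float) -> np.ndarray:
    """Advance a (hypothetical) latent state z by a continuous action parameter theta."""
    return rotation(theta) @ z

z0 = np.array([1.0, 0.0])  # toy latent state, assumed for illustration

# Compositionality: applying two actions in sequence matches
# applying a single action whose parameter is their sum.
two_steps = step(step(z0, 0.3), 0.5)
one_step = step(z0, 0.8)
composed_ok = bool(np.allclose(two_steps, one_step))
```

Because the action parameter is a real number rather than a discrete token, any intermediate value of `theta` gives a valid transition, and the zero action leaves the state unchanged.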
Similar Papers
Latent Action World Models for Control with Unlabeled Trajectories
Machine Learning (CS)
Teaches robots to learn from watching and doing.
Latent Action Pretraining Through World Modeling
Robotics
Teaches robots to do tasks from watching videos.
Co-Evolving Latent Action World Models
Machine Learning (CS)
Makes AI learn and control worlds better.