MetaWorld: Skill Transfer and Composition in a Hierarchical World Model for Grounding High-Level Instructions
By: Yutong Shen, Hangxu Liu, Kailin Pei, and more
Potential Business Impact:
Robots learn to walk and grab things better.
Humanoid robot loco-manipulation remains constrained by the semantic-physical gap. Current methods face three limitations: low sample efficiency in reinforcement learning, poor generalization in imitation learning, and physical inconsistency in vision-language models (VLMs). We propose MetaWorld, a hierarchical world model that integrates semantic planning and physical control via expert policy transfer. The framework decouples tasks into a VLM-driven semantic layer and a latent dynamics model operating in a compact state space. Our dynamic expert selection and motion prior fusion mechanism leverages a pre-trained multi-expert policy library as transferable knowledge, enabling efficient online adaptation via a two-stage framework. VLMs serve as semantic interfaces, mapping instructions to executable skills and bypassing symbol grounding. Experiments on Humanoid-Bench show that MetaWorld outperforms world-model-based RL in task completion and motion coherence. Our code will be available at https://anonymous.4open.science/r/metaworld-2BF4/
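The abstract does not spell out how expert selection and motion prior fusion are computed. As a rough illustration only, the sketch below assumes the semantic layer emits one relevance score per expert and that fusion is a softmax-weighted blend of the experts' proposed actions. The function names, the toy experts, and the blending rule are all hypothetical and are not taken from the paper or its code release.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: an "expert" maps a state vector to an action vector.
Expert = Callable[[Sequence[float]], List[float]]


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_expert_actions(
    state: Sequence[float],
    experts: Dict[str, Expert],
    relevance: Dict[str, float],
    temperature: float = 1.0,
) -> Tuple[List[float], Dict[str, float]]:
    """Blend expert actions using softmax weights over relevance scores.

    `relevance` stands in for whatever the semantic layer would output;
    here it is simply a dict of floats supplied by the caller (assumption).
    """
    names = list(experts)
    weights = softmax([relevance[n] for n in names], temperature)
    actions = [experts[n](state) for n in names]
    dim = len(actions[0])
    fused = [sum(w * a[i] for w, a in zip(weights, actions)) for i in range(dim)]
    return fused, dict(zip(names, weights))


# Toy experts with constant outputs, purely for illustration.
def walk_expert(state: Sequence[float]) -> List[float]:
    return [1.0, 0.0]


def reach_expert(state: Sequence[float]) -> List[float]:
    return [0.0, 1.0]


experts = {"walk": walk_expert, "reach": reach_expert}

# Equal relevance gives an even blend of the two experts.
fused_even, weights_even = fuse_expert_actions(
    [0.0, 0.0], experts, {"walk": 0.0, "reach": 0.0}
)

# A strongly preferred expert dominates the fused action.
fused_walk, weights_walk = fuse_expert_actions(
    [0.0, 0.0], experts, {"walk": 10.0, "reach": 0.0}
)
```

Lowering `temperature` in this sketch pushes the blend toward hard selection of a single expert, while raising it averages the experts more evenly. Whether MetaWorld uses hard selection, soft blending, or another scheme is not stated in the abstract.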