LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning
By: Yiyang Shao, Xiaoyu Huang, Bike Zhang, and more
Potential Business Impact:
Robots understand words and move their bodies like people.
General-purpose humanoid robots are expected to interact intuitively with humans, enabling seamless integration into daily life. Natural language provides the most accessible medium for this purpose. However, translating language into humanoid whole-body motion remains a significant challenge, primarily due to the gap between linguistic understanding and physical actions. In this work, we present an end-to-end, language-directed policy for real-world humanoid whole-body control. Our approach combines reinforcement learning with policy distillation, allowing a single neural network to interpret language commands and execute corresponding physical actions directly. To enhance motion diversity and compositionality, we incorporate a Conditional Variational Autoencoder (CVAE) structure. The resulting policy achieves agile and versatile whole-body behaviors conditioned on language inputs, with smooth transitions between various motions, enabling adaptation to linguistic variations and the emergence of novel motions. We validate the efficacy and generalizability of our method through extensive simulations and real-world experiments, demonstrating robust whole-body control. Please see our website at LangWBC.github.io for more information.
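To make the described architecture concrete, below is a minimal PyTorch sketch of a language-conditioned CVAE policy of the kind the abstract outlines: an encoder maps a reference motion to a latent during training, a language-conditioned prior replaces it at deployment, and a decoder maps observations, a language embedding, and the latent to joint-level actions. All dimensions, module names, and the frozen text-encoder assumption are illustrative guesses, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LanguageConditionedCVAEPolicy(nn.Module):
    """Hypothetical sketch of a CVAE-style, language-conditioned policy.

    Training: an encoder q(z | reference motion, language) produces the latent.
    Deployment: z is sampled from a language-conditioned prior p(z | language),
    since no reference motion is available on the real robot.
    """

    def __init__(self, lang_dim=512, obs_dim=69, ref_dim=128,
                 latent_dim=32, act_dim=23, hidden=256):
        super().__init__()
        # Posterior encoder, used only when a reference motion exists.
        self.encoder = nn.Sequential(
            nn.Linear(ref_dim + lang_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-variance
        )
        # Language-conditioned prior, used at deployment time.
        self.prior = nn.Sequential(
            nn.Linear(lang_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * latent_dim),
        )
        # Decoder / policy pi(a | obs, language, z): the deployed controller.
        self.decoder = nn.Sequential(
            nn.Linear(obs_dim + lang_dim + latent_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim),
        )

    @staticmethod
    def _sample(stats):
        # Reparameterization trick: z = mu + sigma * eps.
        mu, logvar = stats.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z, mu, logvar

    def forward(self, obs, lang_emb, ref_motion=None):
        # Posterior during training, prior at deployment.
        if ref_motion is not None:
            z, mu, logvar = self._sample(
                self.encoder(torch.cat([ref_motion, lang_emb], dim=-1)))
        else:
            z, mu, logvar = self._sample(self.prior(lang_emb))
        action = self.decoder(torch.cat([obs, lang_emb, z], dim=-1))
        return action, mu, logvar


# Usage: one deployment-style forward pass with random stand-in tensors.
policy = LanguageConditionedCVAEPolicy()
obs = torch.randn(1, 69)    # proprioceptive state (dimension assumed)
lang = torch.randn(1, 512)  # embedding of a command from a frozen text encoder
action, mu, logvar = policy(obs, lang)
print(action.shape)         # torch.Size([1, 23]) joint targets
```

In this reading, the KL term between posterior and prior is what lets the deployed policy sample coherent latents from language alone, and interpolating in the latent space is one plausible mechanism for the smooth transitions and novel motions the abstract reports; the actual training objective (RL rewards plus distillation losses) is not reproduced here.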
Similar Papers
LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction
Robotics
Robots learn to do many new tasks by watching.
Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary
Robotics
Robots understand and do what you say.
Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration
Robotics
Robots learn to move and act like people.