VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation

Published: September 24, 2025 | arXiv ID: 2509.20322v1

By: Shaofeng Yin, Yanjie Ze, Hong-Xing Yu, and more

Potential Business Impact:

Humanoid robots learn to move and manipulate objects like humans, using only onboard vision rather than external motion capture.

Business Areas:
Motion Capture, Media and Entertainment, Video

Humanoid loco-manipulation in unstructured environments demands tight integration of egocentric perception and whole-body control. However, existing approaches either depend on external motion capture systems or fail to generalize across diverse tasks. We introduce VisualMimic, a visual sim-to-real framework that unifies egocentric vision with hierarchical whole-body control for humanoid robots. VisualMimic combines a task-agnostic low-level keypoint tracker, trained from human motion data via a teacher-student scheme, with a task-specific high-level policy that generates keypoint commands from visual and proprioceptive input. To ensure stable training, we inject noise into the low-level policy and clip high-level actions using human motion statistics. VisualMimic enables zero-shot transfer of visuomotor policies trained in simulation to real humanoid robots, accomplishing a wide range of loco-manipulation tasks such as box lifting, pushing, football dribbling, and kicking. Beyond controlled laboratory settings, our policies also generalize robustly to outdoor environments. Videos are available at: https://visualmimic.github.io.
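The abstract describes two stabilization tricks at the interface between the levels: high-level keypoint commands are clipped to ranges derived from human motion statistics, and the low-level tracker is trained under injected command noise. Below is a minimal Python sketch of that interface under assumed 3-D keypoint commands; the bound values, function names, and stand-in networks are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

# Hypothetical per-axis keypoint bounds derived from human motion
# statistics; the paper's actual statistics are not reproduced here.
KEYPOINT_LOW = np.array([-0.3, -0.3, -0.2])
KEYPOINT_HIGH = np.array([0.3, 0.3, 0.4])

def high_level_step(policy, rgb_obs, proprio_obs):
    """Task-specific high-level policy: maps egocentric vision and
    proprioception to a keypoint command, clipped to the range
    observed in human motion data."""
    raw_cmd = policy(rgb_obs, proprio_obs)
    return np.clip(raw_cmd, KEYPOINT_LOW, KEYPOINT_HIGH)

def low_level_step(tracker, keypoint_cmd, proprio_obs, noise_std=0.05):
    """Task-agnostic low-level tracker: follows a noise-perturbed
    command, mirroring the noise injection used to stabilize joint
    training of the two levels."""
    noisy_cmd = keypoint_cmd + np.random.normal(0.0, noise_std, keypoint_cmd.shape)
    return tracker(noisy_cmd, proprio_obs)

# Toy usage with stand-in callables in place of trained networks.
policy = lambda rgb, prop: np.array([0.5, -0.1, 0.6])
tracker = lambda cmd, prop: 0.1 * cmd  # would emit whole-body joint targets
cmd = high_level_step(policy, rgb_obs=None, proprio_obs=None)
joint_targets = low_level_step(tracker, cmd, proprio_obs=None)
print(joint_targets)
```

The intuition: clipping keeps the high-level policy from commanding keypoint targets the tracker never saw in human data, while the injected noise makes the tracker tolerant of imperfect high-level commands during training.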

Page Count
10 pages

Category
Computer Science:
Robotics