RoboHiMan: A Hierarchical Evaluation Paradigm for Compositional Generalization in Long-Horizon Manipulation

Published: October 15, 2025 | arXiv ID: 2510.13149v1

By: Yangtao Chen, Zixuan Chen, Nga Teng Chan and more

Potential Business Impact:

Helps robots learn and do new jobs.

Business Areas:

Robotics Hardware, Science and Engineering, Software

Enabling robots to flexibly schedule and compose learned skills for novel long-horizon manipulation under diverse perturbations remains a core challenge. Early explorations with end-to-end vision-language-action (VLA) models show limited success, as these models struggle to generalize beyond the training distribution. Hierarchical approaches, in which a high-level planner generates subgoals for a low-level policy, improve on this but still degrade under complex perturbations, revealing limited capability in skill composition. Existing benchmarks, meanwhile, primarily emphasize task completion in long-horizon settings and offer little insight into compositional generalization, robustness, or the interplay between planning and execution. To systematically investigate these gaps, we propose RoboHiMan, a hierarchical evaluation paradigm for compositional generalization in long-horizon manipulation. RoboHiMan introduces HiMan-Bench, a benchmark of atomic and compositional tasks under diverse perturbations, supported by a multi-level training dataset for analyzing progressive data scaling. It also proposes three evaluation paradigms (vanilla, decoupled, coupled) that probe the necessity of skill composition and reveal bottlenecks in hierarchical architectures. Experiments highlight clear capability gaps across representative models and architectures, pointing to directions for advancing models better suited to real-world long-horizon manipulation tasks. Videos and open-source code can be found on our project website: https://chenyt31.github.io/robo-himan.github.io/.

HiMaCon: Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data

Robotics

Robots learn to do new tasks by watching.

13 Oct 2025 2

89%

Heterogeneous Multi-Expert Reinforcement Learning for Long-Horizon Multi-Goal Tasks in Autonomous Forklifts

Robotics

Forklifts learn to pick and move things better.

12 Jan 2026 0

89%

MetaWorld: Skill Transfer and Composition in a Hierarchical World Model for Grounding High-Level Instructions

Robotics

Robots learn to walk and grab things better.

24 Jan 2026 0

Country of Origin

🇨🇳 China

Page Count

21 pages
