In-N-On: Scaling Egocentric Manipulation with in-the-wild and on-task Data
By: Xiongyi Cai, Ri-Zhao Qiu, Geng Chen, et al.
Potential Business Impact:
Teaches robots to follow instructions like humans.
Egocentric videos are a valuable and scalable data source for learning manipulation policies. However, due to significant data heterogeneity, most existing approaches use human data only for simple pre-training, which does not unlock its full potential. This paper provides a scalable recipe for collecting and using egocentric data by dividing human data into two categories, in-the-wild and on-task, along with a systematic analysis of how to use each. We curate a dataset, PHSD, which contains over 1,000 hours of diverse in-the-wild egocentric data and over 20 hours of on-task data directly aligned to the target manipulation tasks. This enables learning a large egocentric language-conditioned flow matching policy, Human0. With domain adaptation techniques, Human0 minimizes the gap between humans and humanoids. Empirically, we show that scaling human data gives Human0 several novel properties, including the ability to follow language instructions learned from human data alone, few-shot learning, and improved robustness from on-task data. Project website: https://xiongyicai.github.io/In-N-On/
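For readers unfamiliar with the objective behind a flow matching policy like Human0, the sketch below shows what a single training step typically looks like under the common rectified-flow formulation: noise is interpolated toward an expert action, and the network regresses the velocity along that path, conditioned on a fused vision-plus-language embedding. The `VelocityNet` module, its dimensions, and the conditioning vector here are illustrative assumptions, not the paper's actual Human0 architecture.

```python
# Minimal sketch of a language-conditioned flow-matching policy training
# step (rectified-flow style). All module names and sizes are illustrative
# placeholders, not the Human0 implementation.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Predicts the flow velocity for an action, conditioned on a fused
    vision + language embedding and the flow time t."""
    def __init__(self, action_dim=7, cond_dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim + cond_dim + 1, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, x_t, t, cond):
        return self.net(torch.cat([x_t, cond, t], dim=-1))

def flow_matching_loss(model, actions, cond):
    """Interpolate noise -> expert action along a straight line and
    regress the constant velocity (action - noise) at a random time t."""
    noise = torch.randn_like(actions)        # x_0 ~ N(0, I)
    t = torch.rand(actions.shape[0], 1)      # t ~ U(0, 1)
    x_t = (1 - t) * noise + t * actions      # point on the straight path
    target_v = actions - noise               # velocity along the path
    pred_v = model(x_t, t, cond)
    return ((pred_v - target_v) ** 2).mean()

# Toy usage: a batch of 8 samples whose egocentric observation and
# language instruction are assumed pre-fused into a 512-d vector.
model = VelocityNet()
actions = torch.randn(8, 7)                  # expert action targets
cond = torch.randn(8, 512)                   # vision + text embedding
loss = flow_matching_loss(model, actions, cond)
loss.backward()
```

At inference time such a policy would integrate the learned velocity field from pure noise to an action, which is what lets a single conditioning vector (and hence a language instruction) steer the sampled behavior.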
Similar Papers
EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations
Robotics
Robots learn to copy human actions better.
EMMA: Scaling Mobile Manipulation via Egocentric Human Data
Robotics
Teaches robots to do tasks using human moves.
Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions
CV and Pattern Recognition
AI learns to help people by watching and listening.