Shortcut Learning in Generalist Robot Policies: The Role of Dataset Diversity and Fragmentation

Published: August 8, 2025 | arXiv ID: 2508.06426v1

By: Youguang Xing, Xu Luo, Junlin Xie and more

Potential Business Impact:

Robots learn better by seeing more varied examples.

Generalist robot policies trained on large-scale datasets such as Open X-Embodiment (OXE) demonstrate strong performance across a wide range of tasks. However, they often struggle to generalize beyond the distribution of their training data. In this paper, we investigate the underlying cause of this limited generalization capability. We identify shortcut learning -- the reliance on task-irrelevant features -- as a key impediment to generalization. Through comprehensive theoretical and empirical analysis, we uncover two primary contributors to shortcut learning: (1) limited diversity within individual sub-datasets, and (2) significant distributional disparities across sub-datasets, leading to dataset fragmentation. These issues arise from the inherent structure of large-scale datasets like OXE, which are typically composed of multiple sub-datasets collected independently across varied environments and embodiments. Our findings provide critical insights into dataset collection strategies that can reduce shortcut learning and enhance the generalization ability of generalist robot policies. Moreover, in scenarios where acquiring new large-scale data is impractical, we demonstrate that carefully selected robotic data augmentation strategies can effectively reduce shortcut learning in existing offline datasets, thereby improving generalization capabilities of generalist robot policies, e.g., $\pi_0$, in both simulation and real-world environments. More information at https://lucky-light-sun.github.io/proj/shortcut-learning-in-grps/.
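To make the notion of a shortcut concrete, here is a minimal toy sketch. It is not the paper's metric or method: the function `shortcut_accuracy`, the viewpoint names, and the task labels are all hypothetical, chosen only for illustration. It measures how well a task-irrelevant factor (here, camera viewpoint) alone predicts the task label. When each sub-dataset has a single viewpoint and a single task, the viewpoint predicts the task perfectly, so a policy could plausibly latch onto it instead of the instruction.

```python
from collections import Counter, defaultdict


def shortcut_accuracy(samples):
    """Accuracy of predicting the task label from the task-irrelevant
    factor alone, using a per-factor majority vote (toy statistic)."""
    by_factor = defaultdict(Counter)
    for factor, label in samples:
        by_factor[factor][label] += 1
    # For each factor value, the majority label is the best guess.
    correct = sum(max(counts.values()) for counts in by_factor.values())
    return correct / len(samples)


# Fragmented data (hypothetical): each sub-dataset has one viewpoint and one task.
fragmented = [("view_a", "pick_left")] * 50 + [("view_b", "pick_right")] * 50

# Diverse data (hypothetical): every viewpoint appears with every task.
diverse = [
    (view, task)
    for view in ("view_a", "view_b")
    for task in ("pick_left", "pick_right")
] * 25

print(shortcut_accuracy(fragmented))  # 1.0: viewpoint fully determines the task
print(shortcut_accuracy(diverse))     # 0.5: viewpoint carries no task information
```

In this toy setting, increasing diversity within each sub-dataset or overlapping the factors across sub-datasets removes the spurious correlation, which mirrors the two contributors the abstract names.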

OXE-AugE: A Large-Scale Robot Augmentation of OXE for Scaling Cross-Embodiment Policy Learning

Robotics

Makes robots learn to do more tasks.

15 Dec 2025 2

87%

What Matters in Learning from Large-Scale Datasets for Robot Manipulation

Robotics

Teaches robots to learn better from watching.

16 Jun 2025 0

86%

A Study on Enhancing the Generalization Ability of Visuomotor Policies via Data Augmentation

Robotics

Teaches robots to do tasks in new places.

13 Nov 2025 0

Country of Origin

🇨🇳 China

Page Count

29 pages
