$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

Published: April 22, 2025 | arXiv ID: 2504.16054v1

By: Physical Intelligence, Kevin Black, Noah Brown, and more

Potential Business Impact:

Robots learn to clean new homes by watching and listening.

Business Areas:
Robotics Hardware, Science and Engineering, Software

In order for robots to be useful, they must perform practically relevant tasks in the real world, outside of the lab. While vision-language-action (VLA) models have demonstrated impressive results for end-to-end robot control, it remains an open question how far such models can generalize in the wild. We describe $\pi_{0.5}$, a new model based on $\pi_{0}$ that uses co-training on heterogeneous tasks to enable broad generalization. $\pi_{0.5}$ uses data from multiple robots, high-level semantic prediction, web data, and other sources to enable broadly generalizable real-world robotic manipulation. Our system uses a combination of co-training and hybrid multi-modal examples that combine image observations, language commands, object detections, semantic subtask prediction, and low-level actions. Our experiments show that this kind of knowledge transfer is essential for effective generalization, and we demonstrate for the first time that an end-to-end learning-enabled robotic system can perform long-horizon and dexterous manipulation skills, such as cleaning a kitchen or bedroom, in entirely new homes.
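To make the co-training idea concrete, below is a minimal Python sketch of what a "hybrid multi-modal example" and a weighted mixture over heterogeneous data sources might look like. This is not the paper's code: the names (`HybridExample`, `sample_cotraining_batch`, the source weights) and the exact field layout are hypothetical; the sketch only assumes, per the abstract, that web and robot data share one example format in which any target field (detections, subtask label, action chunk) may be missing.

```python
# A minimal sketch of co-training over hybrid multi-modal examples,
# loosely following the abstract's description. All names and field
# choices here are hypothetical, not the authors' implementation.

import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HybridExample:
    """One training example. Any target field may be None, so web data
    (no actions) and robot data (with actions) share a single format."""
    image: bytes                                  # encoded camera observation
    command: str                                  # natural-language task instruction
    detections: Optional[List[str]] = None        # object-detection targets
    subtask: Optional[str] = None                 # high-level semantic subtask label
    actions: Optional[List[List[float]]] = None   # low-level action chunk


def sample_cotraining_batch(sources: dict, weights: dict, batch_size: int,
                            rng: random.Random) -> List[HybridExample]:
    """Draw a batch from a weighted mixture of heterogeneous sources
    (multi-robot data, web data, semantic-prediction data, ...)."""
    names = list(sources)
    probs = [weights[n] for n in names]
    batch = []
    for _ in range(batch_size):
        src = rng.choices(names, weights=probs, k=1)[0]
        batch.append(rng.choice(sources[src]))
    return batch


# Usage: mix a (toy) web source with a (toy) robot source.
rng = random.Random(0)
web = [HybridExample(image=b"", command="put the cup in the sink",
                     detections=["cup", "sink"])]
robot = [HybridExample(image=b"", command="put the cup in the sink",
                       subtask="pick up the cup",
                       actions=[[0.1, -0.2, 0.05]])]
batch = sample_cotraining_batch({"web": web, "robot": robot},
                                {"web": 0.3, "robot": 0.7},
                                batch_size=8, rng=rng)
```

The single shared format is what lets one model be supervised on whichever targets an example happens to carry, which is the knowledge-transfer mechanism the abstract credits for generalization; the mixture weights here are arbitrary placeholders.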

Page Count
19 pages

Category
Computer Science:
Machine Learning (CS)