ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training
By: Ge Yan, Jiyue Zhu, Yuquan Deng, and more
Potential Business Impact:
Robots learn to do many tasks by watching.
This paper introduces ManiFlow, a visuomotor imitation learning policy for general robot manipulation that generates precise, high-dimensional actions conditioned on diverse visual, language, and proprioceptive inputs. We leverage flow matching with consistency training to enable high-quality dexterous action generation in just 1-2 inference steps. To handle diverse input modalities efficiently, we propose DiT-X, a diffusion transformer architecture with adaptive cross-attention and AdaLN-Zero conditioning that enables fine-grained feature interactions between action tokens and multi-modal observations. ManiFlow demonstrates consistent improvements across diverse simulation benchmarks and nearly doubles success rates on real-world tasks across single-arm, bimanual, and humanoid robot setups with increasing dexterity. Extensive evaluation further demonstrates the strong robustness and generalizability of ManiFlow to novel objects and background changes, and highlights its strong scaling capability with larger-scale datasets. Our website: maniflow-policy.github.io.
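To make the training recipe concrete, here is a minimal PyTorch sketch of flow matching combined with a consistency objective of the kind the abstract describes, which is what allows sampling in 1-2 integration steps. This is an illustration under stated assumptions, not the paper's implementation: the `policy` and `ema_policy` callables, the linear interpolant, the fixed step `dt`, and the unweighted loss sum are all assumptions.

```python
# Sketch: flow-matching loss plus a consistency term (assumed formulation).
import torch
import torch.nn.functional as F

def consistency_flow_loss(policy, ema_policy, obs, actions):
    """actions: (B, T, D) ground-truth action chunk; obs: encoded observations.
    `policy(a_t, t, obs)` is assumed to predict the velocity field."""
    b = actions.shape[0]
    noise = torch.randn_like(actions)                  # a_0 ~ N(0, I)
    t = torch.rand(b, 1, 1, device=actions.device)     # t ~ U(0, 1)
    a_t = (1.0 - t) * noise + t * actions              # linear interpolant
    v_target = actions - noise                         # constant velocity target

    # Standard flow-matching regression toward the target velocity.
    v_pred = policy(a_t, t, obs)
    fm_loss = F.mse_loss(v_pred, v_target)

    # Consistency: the one-step endpoint estimate f(a_t, t) = a_t + (1 - t) * v
    # should agree with an EMA teacher's estimate at a nearby later time t'.
    dt = 0.1  # assumed fixed offset
    t2 = (t + dt).clamp(max=1.0)
    a_t2 = (1.0 - t2) * noise + t2 * actions
    with torch.no_grad():
        f_teacher = a_t2 + (1.0 - t2) * ema_policy(a_t2, t2, obs)
    f_student = a_t + (1.0 - t) * v_pred
    consistency_loss = F.mse_loss(f_student, f_teacher)

    return fm_loss + consistency_loss

@torch.no_grad()
def sample_actions(policy, obs, shape, steps=2, device="cpu"):
    """Few-step sampling: Euler-integrate the learned velocity field from noise."""
    a = torch.randn(shape, device=device)
    ts = torch.linspace(0.0, 1.0, steps + 1, device=device)
    for i in range(steps):
        t = ts[i].expand(shape[0], 1, 1)
        a = a + (ts[i + 1] - ts[i]) * policy(a, t, obs)
    return a
```

The consistency term is what lets `steps=1` or `steps=2` suffice at inference time: the model is trained so that its one-step endpoint estimates agree along the trajectory, rather than only matching instantaneous velocities.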
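Similarly, the DiT-X block the abstract describes can be sketched as a transformer block where action tokens cross-attend to multi-modal observation tokens, with AdaLN-Zero modulation from a pooled condition embedding. The layer ordering, head count, and 9-way modulation split below are assumptions for illustration, not the paper's architecture.

```python
# Sketch: a DiT-X-style block with AdaLN-Zero conditioning (assumed layout).
import torch
import torch.nn as nn

class DiTXBlock(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm3 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        # AdaLN-Zero: (shift, scale, gate) for each of the 3 sub-layers,
        # zero-initialized so every block starts out as the identity map.
        self.ada = nn.Sequential(nn.SiLU(), nn.Linear(dim, 9 * dim))
        nn.init.zeros_(self.ada[1].weight)
        nn.init.zeros_(self.ada[1].bias)

    def forward(self, x, ctx, cond):
        """x: (B, Na, D) action tokens; ctx: (B, No, D) observation tokens;
        cond: (B, D) pooled condition (e.g., timestep + language embedding)."""
        (sh1, sc1, g1, sh2, sc2, g2, sh3, sc3, g3) = \
            self.ada(cond).unsqueeze(1).chunk(9, dim=-1)
        # Self-attention over action tokens, modulated and gated.
        h = self.norm1(x) * (1 + sc1) + sh1
        x = x + g1 * self.self_attn(h, h, h, need_weights=False)[0]
        # Cross-attention: action tokens attend to multi-modal observations.
        h = self.norm2(x) * (1 + sc2) + sh2
        x = x + g2 * self.cross_attn(h, ctx, ctx, need_weights=False)[0]
        # Feed-forward, also modulated and gated.
        h = self.norm3(x) * (1 + sc3) + sh3
        x = x + g3 * self.mlp(h)
        return x
```

The zero-initialized gates are the defining design choice of AdaLN-Zero: residual branches contribute nothing at initialization, so conditioning signals are injected gradually as training proceeds, which tends to stabilize deep diffusion-transformer stacks.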
Similar Papers
3D Flow Diffusion Policy: Visuomotor Policy Learning via Generating Flow in 3D Space
Robotics
Robots learn to grab and move things better.
3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model
Robotics
Robots learn to move objects by watching how they move.
ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow
Robotics
Robots learn to do tasks by watching videos.