3D Flow Diffusion Policy: Visuomotor Policy Learning via Generating Flow in 3D Space
By: Sangjun Noh, Dongwoo Nam, Kangmin Kim and more
Potential Business Impact:
Robots learn to grab and move things better.
Learning robust visuomotor policies that generalize across diverse objects and interaction dynamics remains a central challenge in robotic manipulation. Most existing approaches rely on direct observation-to-action mappings or compress perceptual inputs into global or object-centric features, which often overlook localized motion cues critical for precise and contact-rich manipulation. We present 3D Flow Diffusion Policy (3D FDP), a novel framework that leverages scene-level 3D flow as a structured intermediate representation to capture fine-grained local motion cues. Our approach predicts the temporal trajectories of sampled query points and conditions action generation on these interaction-aware flows, implemented jointly within a unified diffusion architecture. This design grounds manipulation in localized dynamics while enabling the policy to reason about broader scene-level consequences of actions. Extensive experiments on the MetaWorld benchmark show that 3D FDP achieves state-of-the-art performance across 50 tasks, particularly excelling on medium and hard settings. Beyond simulation, we validate our method on eight real-robot tasks, where it consistently outperforms prior baselines in contact-rich and non-prehensile scenarios. These results highlight 3D flow as a powerful structural prior for learning generalizable visuomotor policies, supporting the development of more robust and versatile robotic manipulation. Robot demonstrations, additional results, and code can be found at https://sites.google.com/view/3dfdp/home.
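To make the intermediate representation concrete, the following is a minimal sketch of the data layout only, not the authors' implementation. It picks query points from a point cloud, expresses their temporal trajectories as 3D flow (displacement from the first frame), and packs both into a conditioning vector that an action generator could consume. Everything here is an assumption for illustration: the abstract does not say how query points are sampled (farthest point sampling is used as a stand-in), the function names, the sizes K = 16 and T = 4, and the flat concatenation are hypothetical, and the motion is hand-made, whereas in 3D FDP the flow is predicted by the diffusion model.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Pick k well-spread indices from an (N, 3) point cloud.
    Assumption: the sampling scheme is a stand-in, not taken from the paper."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(points.shape[0]))]
    # Distance from every point to its nearest already-chosen point.
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def flow_from_trajectories(traj):
    """traj: (T, K, 3) positions -> (T, K, 3) displacement from the first frame."""
    return traj - traj[0:1]

def build_condition(query_points, flow):
    """Flatten query points (K, 3) and flow (T, K, 3) into one vector.
    Hypothetical layout; a real policy would use a learned encoder."""
    return np.concatenate([query_points.ravel(), flow.ravel()])

# Toy scene: 256 random points, 16 query points, horizon of 4 steps.
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(256, 3))
idx = farthest_point_sampling(cloud, k=16)
queries = cloud[idx]

T = 4
# Hand-made motion: every query point moves 0.01 units along +x per step.
steps = np.arange(T).reshape(T, 1, 1) * np.array([0.01, 0.0, 0.0])
traj = queries[None] + steps            # (T, K, 3)
flow = flow_from_trajectories(traj)     # (T, K, 3)
cond = build_condition(queries, flow)   # (K*3 + T*K*3,)

print(flow.shape, cond.shape)
```

In this layout each query point carries its own short motion history, which is the sense in which flow keeps localized motion cues that a single global feature vector would average away.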
Similar Papers
H$^3$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
Robotics
Teaches robots to grab and move things better.
3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model
Robotics
Robots learn to move objects by watching how they move.
VO-DP: Semantic-Geometric Adaptive Diffusion Policy for Vision-Only Robotic Manipulation
Robotics
Robots learn to grab things using only their eyes.