Translating Flow to Policy via Hindsight Online Imitation
By: Yitian Zheng, Zhangchen Ye, Weijun Dong and more
Recent advances in hierarchical robot systems leverage a high-level planner to propose task plans and a low-level policy to generate robot actions. This design allows training the planner on action-free or even non-robot data sources (e.g., videos), providing transferable high-level guidance. Nevertheless, grounding these high-level plans into executable actions remains challenging, especially given the limited availability of high-quality robot data. To this end, we propose to improve the low-level policy through online interactions. Specifically, our approach collects online rollouts, retrospectively annotates the corresponding high-level goals from achieved outcomes, and aggregates these hindsight-relabeled experiences to update a goal-conditioned imitation policy. Our method, Hindsight Flow-conditioned Online Imitation (HinFlow), instantiates this idea with 2D point flows as the high-level planner. Across diverse manipulation tasks in both simulation and the physical world, our method achieves more than $2\times$ performance improvement over the base policy, significantly outperforming existing methods. Moreover, our framework enables policy acquisition from planners trained on cross-embodiment video data, demonstrating its potential for scalable and transferable robot learning.
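The loop described above (collect rollouts, relabel goals from achieved outcomes, aggregate, update a goal-conditioned imitation policy) can be illustrated with a toy sketch. This is not the authors' implementation: the 2D point world where the next state is the state plus the action, the `HORIZON` constant, the straight-line stand-in planner, and the nearest-neighbour policy are all assumptions made for illustration, and every function name is hypothetical.

```python
import random

HORIZON = 3  # assumed number of steps over which a "flow" (displacement) is measured


def hindsight_relabel(states, actions, horizon=HORIZON):
    """Label each step with the displacement actually achieved `horizon` steps later."""
    samples = []
    last = len(states) - 1
    for t, action in enumerate(actions):
        future = states[min(t + horizon, last)]  # clamp at the end of the rollout
        flow = (future[0] - states[t][0], future[1] - states[t][1])
        samples.append((states[t], flow, action))
    return samples


def make_policy(dataset):
    """Toy goal-conditioned imitation: copy the action whose relabeled flow is closest."""
    def policy(state, flow):
        if not dataset:
            return (0.0, 0.0)  # untrained base policy: do nothing
        _, _, action = min(
            dataset,
            key=lambda d: (d[1][0] - flow[0]) ** 2 + (d[1][1] - flow[1]) ** 2,
        )
        return action
    return policy


def rollout(policy, start, target, rng, steps=8, noise=0.3):
    """Run the policy with exploration noise in a world where next = state + action."""
    states, actions = [start], []
    for _ in range(steps):
        s = states[-1]
        planned_flow = (target[0] - s[0], target[1] - s[1])  # stand-in planner
        a = policy(s, planned_flow)
        a = (a[0] + rng.gauss(0, noise), a[1] + rng.gauss(0, noise))
        actions.append(a)
        states.append((s[0] + a[0], s[1] + a[1]))
    return states, actions


def train(iterations=5, rollouts_per_iter=4, seed=0):
    """Alternate between collecting rollouts and aggregating relabeled data."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(iterations):
        policy = make_policy(dataset)  # "update" the policy on all data so far
        for _ in range(rollouts_per_iter):
            states, actions = rollout(policy, (0.0, 0.0), (1.0, 1.0), rng)
            dataset.extend(hindsight_relabel(states, actions))
    return dataset
```

The key point the sketch tries to capture is that every rollout yields valid training data, even a failed one: the goal stored with each action is the flow that was actually achieved, not the flow the planner asked for.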