Temporally Coherent Imitation Learning via Latent Action Flow Matching for Robotic Manipulation
By: Wu Songwei, Jiang Zhiduo, Xie Guanghu, and more
Potential Business Impact:
Robots learn to move smoothly and reliably finish long, multi-step tasks in real time.
Learning long-horizon robotic manipulation requires jointly achieving expressive behavior modeling, real-time inference, and stable execution, which remains challenging for existing generative policies. Diffusion-based approaches provide strong modeling capacity but typically incur high inference latency, while flow matching enables fast one-step generation yet often leads to unstable execution when applied directly in the raw action space. We propose LG-Flow Policy, a trajectory-level imitation learning framework that performs flow matching in a continuous latent action space. By encoding action sequences into temporally regularized latent trajectories and learning an explicit latent-space flow, the proposed approach decouples global motion structure from low-level control noise, resulting in smooth and reliable long-horizon execution. LG-Flow Policy further incorporates geometry-aware point cloud conditioning and execution-time multimodal modulation, with visual cues evaluated as a representative modality in real-world settings. Experimental results in simulation and on physical robot platforms demonstrate that LG-Flow Policy achieves near single-step inference, substantially improves trajectory smoothness and task success over flow-based baselines operating in the raw action space, and remains significantly more efficient than diffusion-based policies.
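The core recipe described in the abstract can be sketched compactly: an autoencoder compresses an action chunk into a latent vector, a conditional velocity field is trained with a flow matching objective in that latent space, and inference takes a single Euler step from noise before decoding back to actions. Below is a minimal PyTorch illustration of this idea. All module names, dimensions, the MLP architectures, and the rectified-flow (straight-line) interpolation schedule are our assumptions for exposition, not the paper's actual design; the paper's temporal regularization of the latents and its point cloud conditioning are only noted in comments.

```python
import torch
import torch.nn as nn

# Minimal sketch of latent-space flow matching for action chunks.
# Module names, sizes, and the straight-line probability path are
# illustrative assumptions, not the authors' actual architecture.

class ActionAutoencoder(nn.Module):
    """Compresses an action chunk (horizon x action_dim) into a latent z.

    The paper additionally regularizes the latent trajectories temporally;
    that term is omitted here for brevity.
    """
    def __init__(self, action_dim=7, horizon=16, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(horizon * action_dim, 256),
            nn.ReLU(), nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, horizon * action_dim),
            nn.Unflatten(1, (horizon, action_dim)),
        )

    def forward(self, actions):
        z = self.encoder(actions)
        return self.decoder(z), z

class LatentVelocityField(nn.Module):
    """Predicts the flow velocity v(z_t, t, obs) in latent space."""
    def __init__(self, latent_dim=64, obs_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + obs_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z_t, t, obs):
        return self.net(torch.cat([z_t, obs, t], dim=-1))

def flow_matching_loss(vel_field, z1, obs):
    """Rectified-flow objective: regress the straight-line velocity z1 - z0."""
    z0 = torch.randn_like(z1)           # noise endpoint of the path
    t = torch.rand(z1.shape[0], 1)      # random time in [0, 1]
    z_t = (1 - t) * z0 + t * z1         # linear interpolation between endpoints
    target_v = z1 - z0                  # constant velocity along a straight path
    return ((vel_field(z_t, t, obs) - target_v) ** 2).mean()

@torch.no_grad()
def one_step_inference(vel_field, decoder, obs, latent_dim=64):
    """Near single-step generation: one Euler step from noise, then decode."""
    z0 = torch.randn(obs.shape[0], latent_dim)
    t0 = torch.zeros(obs.shape[0], 1)
    z1 = z0 + vel_field(z0, t0, obs)    # Euler step with dt = 1
    return decoder(z1)                  # decode latent back into an action chunk
```

In the full system, `obs` would be produced by the geometry-aware point cloud encoder and modulated by other execution-time modalities; here it stands in as a placeholder feature vector. The sketch shows why execution can be smooth: the flow operates over a compact latent that encodes a whole action chunk, so low-level per-step control noise never enters the generative path.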
Similar Papers
Flow Policy Gradients for Robot Control
Robotics
Teaches robots to move and learn better.
DiG-Flow: Discrepancy-Guided Flow Matching for Robust VLA Models
Robotics
Teaches robots to do harder jobs better.
Diffusion Trajectory-guided Policy for Long-horizon Robot Manipulation
Robotics
Teaches robots to do long tasks better.