Iterative Refinement of Flow Policies in Probability Space for Online Reinforcement Learning
By: Mingyang Sun, Pengxiang Ding, Weinan Zhang, and more
Potential Business Impact:
Teaches robots to learn new skills faster.
While behavior cloning with flow/diffusion policies excels at learning complex skills from demonstrations, it remains vulnerable to distributional shift, and standard RL methods struggle to fine-tune these models due to their iterative inference process and the limitations of existing workarounds. In this work, we introduce the Stepwise Flow Policy (SWFP) framework, founded on the key insight that discretizing the flow matching inference process via a fixed-step Euler scheme inherently aligns it with the variational Jordan-Kinderlehrer-Otto (JKO) principle from optimal transport. SWFP decomposes the global flow into a sequence of small, incremental transformations between proximate distributions. Each step corresponds to a JKO update, regularizing policy changes to stay near the previous iterate and ensuring stable online adaptation with entropic regularization. This decomposition yields an efficient algorithm that fine-tunes pre-trained flows via a cascade of small flow blocks, offering significant advantages: simpler/faster training of sub-models, reduced computational/memory costs, and provable stability grounded in Wasserstein trust regions. Comprehensive experiments demonstrate SWFP's enhanced stability, efficiency, and superior adaptation performance across diverse robotic control benchmarks.
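To make the idea concrete, here is a minimal sketch (not the authors' code) of the two ingredients the abstract describes: a fixed-step Euler rollout through a cascade of small flow blocks, and a per-block proximal update that keeps each fine-tuned block close to its previous iterate. The names (FlowBlock, sample_action, stepwise_update, lam_prox, policy_loss_fn) are illustrative assumptions, the RL objective is abstracted behind policy_loss_fn, and the squared-velocity penalty is a simple stand-in for the Wasserstein/JKO proximal term, not the paper's exact formulation.

```python
# Illustrative sketch under assumed names; see the lead-in for caveats.
import torch
import torch.nn as nn

class FlowBlock(nn.Module):
    """One small velocity-field sub-model covering a single Euler step."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # Velocity estimate for the current intermediate action.
        return self.net(torch.cat([obs, a], dim=-1))

@torch.no_grad()
def sample_action(blocks, obs, act_dim):
    """Fixed-step Euler rollout: a_{k+1} = a_k + dt * v_k(obs, a_k)."""
    dt = 1.0 / len(blocks)
    a = torch.randn(obs.shape[0], act_dim)  # sample from the prior
    for block in blocks:
        a = a + dt * block(obs, a)           # one incremental transformation
    return a

def stepwise_update(block, block_prev, obs, a_in, policy_loss_fn, lam_prox=1.0):
    """One JKO-style proximal step for a single block (illustrative).

    Minimizes the task objective for this block while penalizing deviation
    of its velocity field from the previous iterate, acting as a surrogate
    trust region that keeps successive policies close.
    """
    v_new = block(obs, a_in)
    with torch.no_grad():
        v_old = block_prev(obs, a_in)
    prox = lam_prox * ((v_new - v_old) ** 2).mean()
    return policy_loss_fn(v_new) + prox
```

In this reading, each small block only has to learn a short transformation between nearby distributions, which is why the sub-models are cheap to train, and the proximal penalty is what grounds the stability claim: every online update is constrained to stay near the previous policy.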
Similar Papers
Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
Machine Learning (CS)
Teaches robots to learn from past actions better.
Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models
Machine Learning (CS)
Teaches robots to learn new tasks by watching.
Flow-Based Policy for Online Reinforcement Learning
Machine Learning (CS)
Teaches robots to learn new skills faster.