Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models

Published: September 4, 2025 | arXiv ID: 2509.04063v1

By: Hongyin Zhang, Shiyuan Zhang, Junxi Jin, and more

Potential Business Impact:

Robots learn to perform manipulation tasks more accurately by practicing on their own data, rather than only imitating demonstrations.

Business Areas:
A/B Testing Data and Analytics

Vision-Language-Action (VLA) models based on flow matching have shown excellent performance in general-purpose robotic manipulation tasks. However, the action accuracy of these models on complex downstream tasks is unsatisfactory. One important reason is that they rely solely on an imitation-learning post-training paradigm, which makes it difficult for them to capture the distributional properties of data quality, precisely what Reinforcement Learning (RL) excels at. In this paper, we theoretically propose an offline RL post-training objective for VLA flow models and derive an efficient and practical offline RL fine-tuning algorithm, Adaptive Reinforced Flow Matching (ARFM). By introducing an adaptively adjusted scaling factor into the VLA flow-model loss, we construct a principled bias-variance trade-off objective that optimally controls the impact of the RL signal on the flow loss. ARFM adaptively balances RL advantage preservation against flow-loss gradient-variance control, yielding a more stable and efficient fine-tuning process. Extensive simulation and real-world experiments show that ARFM exhibits excellent generalization, robustness, few-shot learning, and continual learning performance.
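To make the core idea concrete, below is a minimal sketch of an advantage-weighted flow-matching loss with an adaptively adjusted scaling factor. This is not the paper's exact formulation: the exponential (softmax) weighting, the weight-variance proxy for gradient variance, and the `adapt_alpha` update rule are illustrative assumptions. Only the general mechanism, scaling the flow loss by an offline RL signal while keeping gradient variance in check, follows the abstract.

```python
# Illustrative sketch only -- the ARFM objective in the paper is derived
# differently; names like `arfm_style_loss` and `adapt_alpha` are hypothetical.
import torch


def arfm_style_loss(pred_velocity, target_velocity, advantages, alpha):
    """Per-sample flow-matching loss reweighted by an offline RL advantage signal.

    pred_velocity / target_velocity: (B, action_dim) flow-matching regression pair.
    advantages: (B,) offline advantage estimates for each action chunk.
    alpha: scalar scaling factor controlling how strongly advantages reweight the loss.
    """
    per_sample = ((pred_velocity - target_velocity) ** 2).mean(dim=-1)       # (B,)
    # Normalized exponential weights: larger alpha preserves more RL signal,
    # but also increases the spread (and hence variance) of the weights.
    weights = torch.softmax(alpha * advantages, dim=0) * advantages.numel()
    return (weights.detach() * per_sample).mean(), weights


def adapt_alpha(alpha, weights, var_target=1.0, lr=0.05):
    """Crude stand-in for a bias-variance trade-off: shrink alpha when the
    weight variance (a proxy for flow-loss gradient variance) exceeds a
    target, and grow it otherwise."""
    var = weights.var().item()
    return max(0.0, alpha + lr * (var_target - var))


if __name__ == "__main__":
    torch.manual_seed(0)
    alpha = 1.0
    for step in range(3):
        pred = torch.randn(8, 7)      # hypothetical predicted velocities
        target = torch.randn(8, 7)    # flow-matching regression targets
        adv = torch.randn(8)          # hypothetical offline advantage estimates
        loss, w = arfm_style_loss(pred, target, adv, alpha)
        alpha = adapt_alpha(alpha, w)
        print(f"step {step}: loss={loss.item():.3f}, alpha={alpha:.3f}")
```

In this toy loop the scaling factor rises when the advantage weights are nearly uniform (RL signal is cheap to preserve) and falls when they become peaked (gradient variance would blow up), mirroring the stability-versus-signal balance the abstract describes.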

Page Count
14 pages

Category
Computer Science:
Robotics