Discover, Learn, and Reinforce: Scaling Vision-Language-Action Pretraining with Diverse RL-Generated Trajectories
By: Rushuai Yang, Zhiyuan Feng, Tianxiang Zhang, and more
Potential Business Impact:
Teaches robots many ways to do tasks.
Scaling vision-language-action (VLA) model pretraining requires large volumes of diverse, high-quality manipulation trajectories. Most current data is obtained via human teleoperation, which is expensive and difficult to scale. Reinforcement learning (RL) methods learn useful skills through autonomous exploration, making them a viable approach for generating data. However, standard RL training collapses to a narrow execution pattern, limiting its utility for large-scale pretraining. We propose Discover, Learn and Reinforce (DLR), an information-theoretic pattern discovery framework that generates multiple distinct, high-success behavioral patterns for VLA pretraining. Empirically, DLR generates a markedly more diverse trajectory corpus on LIBERO. Specifically, it learns multiple distinct, high-success strategies for the same task where standard RL discovers only one, and hence covers substantially broader regions of the state-action space. When adapted to unseen downstream task suites, VLA models pretrained on our diverse RL data surpass counterparts trained on equal-sized standard RL datasets. Moreover, DLR exhibits positive data-scaling behavior that single-pattern RL lacks. These results position multi-pattern RL as a practical, scalable data engine for embodied foundation models.
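The abstract does not spell out DLR's objective, but information-theoretic pattern discovery is commonly implemented with a discriminator-based intrinsic reward (as in skill-discovery methods like DIAYN): a classifier learns to infer which pattern variable z produced a visited state, and the policy is rewarded for making its patterns distinguishable. The sketch below is an illustrative assumption, not the paper's method; `n_skills`, `diversity_reward`, and the example logits are all hypothetical.

```python
import numpy as np

n_skills = 4  # hypothetical number of behavioral patterns to discover


def diversity_reward(logits, z):
    """Intrinsic reward r = log q(z|s) - log p(z).

    `logits` are a learned discriminator's unnormalized scores over
    skills for the current state s; p(z) is a uniform prior. The reward
    is high when the state is distinctive for pattern z, pushing
    different patterns toward different regions of the state space.
    """
    log_q = logits[z] - np.log(np.sum(np.exp(logits)))  # log-softmax
    log_p = -np.log(n_skills)  # log of uniform prior 1/n_skills
    return log_q - log_p


# A state visited almost exclusively by pattern 2 earns a positive bonus:
confident = np.array([0.0, 0.0, 5.0, 0.0])
print(diversity_reward(confident, 2) > 0)  # True

# A state that every pattern visits equally earns no bonus:
uniform = np.zeros(n_skills)
print(abs(diversity_reward(uniform, 2)) < 1e-9)  # True
```

In such a setup the policy is conditioned on z and trained on the task reward plus this bonus, so each z converges to a distinct high-success strategy rather than all collapsing to one execution pattern.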
Similar Papers
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Robotics
Robots learn to do new tasks better with less data.
Reinforcing Action Policies by Prophesying
Robotics
Teaches robots to learn new tasks faster.
Beyond Human Demonstrations: Diffusion-Based Reinforcement Learning to Generate Data for VLA Training
Robotics
Teaches robots to do many tasks better.