The Impact of On-Policy Parallelized Data Collection on Deep Reinforcement Learning Networks
By: Walter Mayor, Johan Obando-Ceron, Aaron Courville, and others
Potential Business Impact:
Enables reinforcement learning agents to train faster and reach higher performance by collecting more data across parallel environments.
The use of parallel actors for data collection is an effective technique in reinforcement learning (RL) algorithms. The manner in which data is collected in these algorithms, controlled via the number of parallel environments and the rollout length, induces a form of bias-variance trade-off; the number of training passes over the collected data, on the other hand, must strike a balance between sample efficiency and overfitting. We conduct an empirical analysis of these trade-offs on PPO, one of the most popular RL algorithms that uses parallel actors, and establish connections to network plasticity and, more generally, optimization stability. We examine how these data collection choices affect different network architectures, as well as hyper-parameter sensitivity when scaling data. Our analyses indicate that larger dataset sizes can increase final performance across a variety of settings, and that scaling the number of parallel environments is more effective than increasing rollout lengths. These findings highlight the critical role of data collection strategies in improving agent performance.
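To make these trade-offs concrete, below is a minimal sketch of how a PPO-style data collection loop derives its batch size from the number of parallel environments and the rollout length, and how multiple training passes reuse that batch. It assumes the gymnasium vector-environment API; the CartPole environment, the specific numbers, and the placeholder update step are illustrative assumptions, not the authors' experimental setup.

```python
# Minimal sketch of PPO-style parallel data collection, assuming the
# gymnasium vector-env API. The environment choice, hyper-parameter
# values, and the placeholder update step are illustrative only.
import numpy as np
import gymnasium as gym

NUM_ENVS = 8          # number of parallel environments
ROLLOUT_LENGTH = 128  # steps collected per environment before an update
NUM_EPOCHS = 4        # training passes over each collected batch
# Batch size per update = NUM_ENVS * ROLLOUT_LENGTH = 1024 transitions.

envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(NUM_ENVS)]
)

obs, _ = envs.reset(seed=0)
observations, actions, rewards = [], [], []

# Collect one rollout: ROLLOUT_LENGTH steps from each of NUM_ENVS envs.
for _ in range(ROLLOUT_LENGTH):
    # A real agent would sample from its current policy; random actions
    # keep this sketch self-contained.
    action = envs.action_space.sample()
    next_obs, reward, terminated, truncated, _ = envs.step(action)
    observations.append(obs)
    actions.append(action)
    rewards.append(reward)
    obs = next_obs

batch_obs = np.asarray(observations)  # shape: (ROLLOUT_LENGTH, NUM_ENVS, obs_dim)
batch_size = batch_obs.shape[0] * batch_obs.shape[1]
print(f"collected {batch_size} transitions per update")

# Multiple passes over the same batch trade sample efficiency against
# overfitting to data that becomes increasingly off-policy.
for epoch in range(NUM_EPOCHS):
    pass  # a PPO update on minibatches of the collected batch would go here
```

In this sketch, scaling either NUM_ENVS or ROLLOUT_LENGTH grows the dataset collected per update; the paper's finding is that increasing the number of parallel environments tends to be more effective than lengthening rollouts.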
Similar Papers
Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story
Machine Learning (CS)
Makes AI learn faster by having different helpers.
Staggered Environment Resets Improve Massively Parallel On-Policy Reinforcement Learning
Machine Learning (CS)
Makes robots learn faster and better.
Offline vs. Online Learning in Model-based RL: Lessons for Data Collection Strategies
Machine Learning (CS)
Helps robots learn better by mixing old and new experiences.