EARL: Efficient Agentic Reinforcement Learning Systems for Large Language Models
By: Zheyue Tan, Mustapha Abdullahi, Tuo Shi, and more
Potential Business Impact:
Lets AI learn faster without crashing.
Reinforcement learning (RL) has become a pivotal component of large language model (LLM) post-training, and agentic RL extends this paradigm so that models operate as agents through multi-turn interaction and tool use. Scaling such systems exposes two practical bottlenecks: (1) context length grows rapidly during training, inflating memory usage and latency and triggering out-of-memory (OOM) failures; and (2) intermediate tensors accumulate with context length, making cross-device data movement a major system bottleneck. We present EARL, a scalable system for efficient agentic RL. EARL combines a parallelism selector, which dynamically adapts model and training parallelism across RL stages based on sequence length and system load, with a data dispatcher that performs layout-aware, decentralized exchange of intermediate data batches. Together, these components increase throughput, reduce long-context failures, and enable stable large-scale training of agentic LLMs without relying on hard context-length limits or penalties.
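To make the parallelism-selector idea concrete, here is a minimal sketch of how a stage-aware selector might pick tensor- and data-parallel degrees from the current sequence length and load. This is not the authors' implementation; the names (StageLoad, estimated_activation_gb, select_parallelism, mem_budget_gb) and the memory model are illustrative assumptions, since the abstract does not specify EARL's actual policy.

```python
# Hypothetical sketch of a stage-aware parallelism selector (not EARL's code).
# Idea from the abstract: as context length grows across RL stages, choose a
# tensor-parallel degree large enough to keep estimated per-GPU activation
# memory under budget, and give the remaining devices to data parallelism.

from dataclasses import dataclass


@dataclass
class StageLoad:
    """Observed load for one RL stage (all numbers are illustrative)."""
    seq_len: int       # current context length in tokens
    batch_size: int    # sequences processed per step
    hidden_size: int   # model hidden dimension
    num_layers: int    # transformer layer count


def estimated_activation_gb(load: StageLoad, tp_degree: int) -> float:
    """Rough per-GPU activation footprint in GB for a given tensor-parallel degree.

    Uses a simple O(batch * seq * hidden * layers) proxy; a real system
    would profile this instead of estimating it.
    """
    bytes_per_elem = 2  # bf16
    total = (load.batch_size * load.seq_len * load.hidden_size
             * load.num_layers * bytes_per_elem)
    return total / tp_degree / 1e9


def select_parallelism(load: StageLoad, num_gpus: int, mem_budget_gb: float = 40.0):
    """Return (tensor_parallel, data_parallel) degrees for the current stage.

    Walks through power-of-two TP degrees and picks the smallest one whose
    estimated activation memory fits the budget, capping at the device count.
    """
    tp = 1
    while tp < num_gpus and estimated_activation_gb(load, tp) > mem_budget_gb:
        tp *= 2
    dp = max(1, num_gpus // tp)
    return tp, dp


if __name__ == "__main__":
    # Example: a long-context training stage on 8 GPUs.
    stage = StageLoad(seq_len=32_768, batch_size=8, hidden_size=4096, num_layers=32)
    tp, dp = select_parallelism(stage, num_gpus=8)
    print(f"tensor_parallel={tp}, data_parallel={dp}")
```

In this toy setup the long-context stage is assigned a larger tensor-parallel degree to avoid OOM, while shorter-context stages would keep more data parallelism for throughput; the data dispatcher described in the abstract would then handle the resharding of intermediate batches between stages.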
Similar Papers
Agentic Episodic Control
Artificial Intelligence
AI learns faster by remembering good experiences.
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Artificial Intelligence
Lets AI learn to make smart choices.
Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning
Distributed, Parallel, and Cluster Computing
Makes AI learn much faster and use computers better.