SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache
By: Chi-Chih Chang, Siqi Zhu, Zhichen Zeng, and more
We present Speculative Rollout with Tree-Structured Cache (SRT), a simple, model-free approach to accelerating on-policy reinforcement learning (RL) for language models without sacrificing distributional correctness. SRT exploits the empirical similarity of rollouts for the same prompt across training steps by storing previously generated continuations in a per-prompt tree-structured cache. During generation, the current policy uses this tree as the draft model for speculative decoding. To keep the cache fresh and improve draft quality, SRT updates the trees online from ongoing rollouts and proactively performs run-ahead generation during idle GPU bubbles. Integrated into standard RL pipelines (e.g., PPO, GRPO, and DAPO) and multi-turn settings, SRT consistently reduces generation and step latency and lowers per-token inference cost, achieving up to a 2.08x wall-clock speedup during rollout.
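To make the mechanism concrete, below is a minimal Python sketch of a per-prompt tree cache used as a draft source for speculative decoding. The names (TreeCache, insert, propose_draft) and the frequency-based branch selection are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a per-prompt tree-structured rollout cache.
# Assumption: rollouts are sequences of token IDs; the "draft" is the most
# frequently observed cached continuation of the current prefix.
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # token_id -> Node
    count: int = 0  # how often this continuation was observed


class TreeCache:
    """Stores previously generated continuations for one prompt as a trie."""

    def __init__(self):
        self.root = Node()

    def insert(self, tokens):
        """Add a completed (or run-ahead) rollout to the cache online."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, Node())
            node.count += 1

    def propose_draft(self, prefix, max_draft_len=8):
        """Walk the trie along the tokens generated so far, then greedily
        follow the most frequent branch to propose draft tokens; the current
        policy would verify these in parallel (speculative decoding)."""
        node = self.root
        for t in prefix:
            node = node.children.get(t)
            if node is None:
                return []  # prefix diverged from all cached rollouts
        draft = []
        while node.children and len(draft) < max_draft_len:
            t, node = max(node.children.items(), key=lambda kv: kv[1].count)
            draft.append(t)
        return draft


# Usage: cache rollouts from earlier steps, then draft for the next one.
cache = TreeCache()
cache.insert([5, 7, 7, 2])
cache.insert([5, 7, 7, 2])  # repeated rollouts reinforce a branch
cache.insert([5, 7, 9, 1])
print(cache.propose_draft([5, 7]))  # -> [7, 2], the most frequent branch
```

In a full pipeline, the proposed draft tokens would be verified in parallel by the current policy, with the first rejected token resampled from the policy's distribution, which is what preserves distributional correctness while skipping sequential decoding steps.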