Score: 2

FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning

Published: January 26, 2026 | arXiv ID: 2601.18150v1

By: Zhaopeng Qiu, Shuang Yu, Jingqi Zhang, and more

BigTech Affiliations: NVIDIA

Potential Business Impact:

Makes AI models faster and reduces their memory use.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Reinforcement learning (RL) for large language models (LLMs) is increasingly bottlenecked by rollout (generation), where long output sequence lengths make attention and KV-cache memory dominate end-to-end step time. FP8 offers an attractive lever for accelerating RL by reducing compute cost and memory traffic during rollout, but applying FP8 in RL introduces unique engineering and algorithmic challenges: policy weights change every step (requiring repeated quantization and weight synchronization into the inference engine) and low-precision rollouts can deviate from the higher-precision policy assumed by the trainer, causing train-inference mismatch and potential instability. This report presents a practical FP8 rollout stack for LLM RL, implemented in the veRL ecosystem with support for common training backends (e.g., FSDP/Megatron-LM) and inference engines (e.g., vLLM/SGLang). We (i) enable FP8 W8A8 linear-layer rollout using blockwise FP8 quantization, (ii) extend FP8 to KV-cache to remove long-context memory bottlenecks via per-step QKV scale recalibration, and (iii) mitigate mismatch using importance-sampling-based rollout correction (token-level TIS/MIS variants). Across dense and MoE models, these techniques deliver up to 44% rollout throughput gains while preserving learning behavior comparable to BF16 baselines.
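
To make the first technique concrete, below is a minimal sketch of blockwise FP8 weight quantization of the kind the abstract describes, assuming a 128x128 block size and `torch.float8_e4m3fn` storage. The function names and block size are illustrative assumptions, not the paper's actual implementation.

```python
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3

def quantize_blockwise_fp8(w: torch.Tensor, block: int = 128):
    """Quantize a 2-D BF16/FP32 weight into FP8 with one scale per block."""
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0, "pad to block size first"
    # View the matrix as a grid of (block x block) tiles.
    tiles = w.reshape(rows // block, block, cols // block, block)
    # One amax-derived scale per tile keeps outliers local to their block.
    amax = tiles.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scales = amax / FP8_MAX
    q = (tiles / scales).to(torch.float8_e4m3fn)
    return q.reshape(rows, cols), scales.squeeze(3).squeeze(1)

def dequantize_blockwise_fp8(q: torch.Tensor, scales: torch.Tensor, block: int = 128):
    rows, cols = q.shape
    tiles = q.to(torch.float32).reshape(rows // block, block, cols // block, block)
    return (tiles * scales[:, None, :, None]).reshape(rows, cols)
```

Because the policy weights change every RL step, a routine like this would have to re-run at each weight synchronization into the inference engine, which is the repeated-quantization cost the abstract highlights.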
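For the second technique, the following sketch illustrates one plausible form of per-step K/V scale recalibration for an FP8 KV-cache: after each weight sync, a small calibration pass under the current policy refreshes the per-tensor scales. All names and the calibration scheme are assumptions for illustration; the paper's actual recalibration procedure may differ.

```python
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max

@torch.no_grad()
def recalibrate_kv_scales(k_samples: torch.Tensor, v_samples: torch.Tensor):
    """Derive fresh per-tensor FP8 scales for the KV cache.

    k_samples / v_samples: K and V projection outputs collected from a
    small calibration batch under the *current* policy weights.
    """
    k_scale = k_samples.abs().amax().clamp(min=1e-12) / FP8_MAX
    v_scale = v_samples.abs().amax().clamp(min=1e-12) / FP8_MAX
    return k_scale.item(), v_scale.item()

def quantize_kv(x: torch.Tensor, scale: float):
    # Stored as FP8; attention dequantizes on read as x_fp8 * scale.
    return (x / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
```

The point of recalibrating per step is that stale scales from the previous policy can clip or underflow the new policy's K/V activations as their dynamic range drifts during training.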
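For the third technique, here is a minimal sketch of token-level truncated importance sampling (TIS) for rollout correction, assuming per-token log-probs are available from both the BF16 trainer policy and the FP8 rollout engine; the cap value and function name are illustrative, not the paper's.

```python
import torch

def tis_weighted_loss(logp_train, logp_rollout, advantages, mask, cap: float = 2.0):
    """Policy-gradient loss with a truncated rollout-correction ratio.

    logp_train:   log pi_theta(a_t | s_t) recomputed by the trainer (BF16)
    logp_rollout: log-probs of the tokens actually sampled by the FP8 engine
    """
    # The ratio corrects for train-inference mismatch; detach it so it acts
    # as a fixed importance weight rather than a gradient path.
    ratio = torch.exp(logp_train.detach() - logp_rollout)
    w = ratio.clamp(max=cap)  # truncation bounds the estimator's variance
    pg = -(w * advantages * logp_train)
    return (pg * mask).sum() / mask.sum().clamp(min=1)
```

Truncating the ratio trades a small bias for bounded variance, which is why token-level TIS/MIS variants can keep FP8 rollouts stable relative to the higher-precision policy the trainer assumes.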

Country of Origin
🇺🇸 United States

Repos / Data Links

Page Count
16 pages

Category
Computer Science:
Machine Learning (CS)