Defeating the Training-Inference Mismatch via FP16
By: Penghui Qi, Zichen Liu, Xiangxin Zhou, and more
Potential Business Impact:
Makes AI learn better by using a more precise number format.
Reinforcement learning (RL) fine-tuning of large language models (LLMs) often suffers from instability due to the numerical mismatch between the training and inference policies. While prior work has attempted to mitigate this issue through algorithmic corrections or engineering alignments, we show that its root cause lies in the floating-point precision itself. The widely adopted BF16, despite its large dynamic range, introduces large rounding errors that break the consistency between training and inference. In this work, we demonstrate that simply reverting to FP16 effectively eliminates this mismatch. The change is simple, fully supported by modern frameworks with only a few lines of code changed, and requires no modification to the model architecture or learning algorithm. Our results suggest that using FP16 uniformly yields more stable optimization, faster convergence, and stronger performance across diverse tasks, algorithms, and frameworks. We hope these findings motivate a broader reconsideration of precision trade-offs in RL fine-tuning.
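The abstract does not include code, but the kind of change it describes can be sketched in a few lines. The snippet below is an illustrative example only, not the authors' implementation: it assumes a PyTorch-style mixed-precision training step on a CUDA device, with a hypothetical model, optimizer, and train_step function. The change amounts to switching the autocast dtype from BF16 to FP16 and adding a gradient scaler, since FP16's narrower dynamic range can otherwise cause gradients to underflow.

    # Illustrative sketch (not the authors' code): switching a PyTorch
    # mixed-precision training step from BF16 to FP16. Assumes a CUDA device.
    import torch

    model = torch.nn.Linear(16, 1).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()  # needed for FP16's narrow range; unnecessary with BF16

    def train_step(inputs, targets):
        optimizer.zero_grad(set_to_none=True)
        # Before: with torch.autocast("cuda", dtype=torch.bfloat16):
        with torch.autocast("cuda", dtype=torch.float16):
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
        scaler.scale(loss).backward()   # scale the loss to avoid FP16 gradient underflow
        scaler.step(optimizer)          # unscale gradients, then take the optimizer step
        scaler.update()                 # adjust the scale factor for the next step
        return loss.item()

    # Example usage with random data:
    # train_step(torch.randn(8, 16, device="cuda"), torch.randn(8, 1, device="cuda"))

In this sketch the only lines that differ from a typical BF16 setup are the autocast dtype and the GradScaler calls, which is consistent with the paper's claim that the switch requires only a few lines of code.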
Similar Papers
Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
Machine Learning (CS)
Makes AI learn faster and more efficiently.
NestedFP: High-Performance, Memory-Efficient Dual-Precision Floating Point Support for LLMs
Distributed, Parallel, and Cluster Computing
Makes AI answer questions much faster.
FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning
Machine Learning (CS)
Makes AI faster and use less memory.