The Impact of Quantization on Large Reasoning Model Reinforcement Learning
By: Medha Kumar, Zifei Xu, Xin Wang, and more
Potential Business Impact:
Makes smart computer brains smaller without losing smarts.
Strong reasoning capabilities can now be achieved by large-scale reinforcement learning (RL) without any supervised fine-tuning. Although post-training quantization (PTQ) and quantization-aware training (QAT) are well studied in the context of fine-tuning, how quantization impacts RL in large reasoning models (LRMs) remains an open question. To answer this question, we conducted systematic experiments and discovered a significant gap in reasoning performance on mathematical benchmarks between post-RL quantized models and their quantization-aware RL optimized counterparts. Our findings suggest that quantization-aware RL training negatively impacts the learning process, whereas PTQ and QLoRA lead to better performance.
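To make the PTQ versus quantization-aware distinction concrete, here is a minimal sketch of the rounding step the two approaches share. It is not the authors' implementation: the function names, the 4-bit width, and the single per-tensor scale are all assumptions chosen for illustration. PTQ would apply `quantize_symmetric` once to weights that have already been trained, while quantization-aware training would typically run something like `fake_quantize` in every forward pass so the optimizer sees the rounding error.

```python
from typing import List, Tuple


def quantize_symmetric(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Round-to-nearest symmetric quantization with one per-tensor scale.

    Hypothetical helper for illustration; real toolkits usually use
    per-channel or per-group scales.
    """
    qmax = 2 ** (bits - 1) - 1  # largest signed code, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Map each weight to the nearest integer code and clamp to the valid range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def fake_quantize(weights: List[float], bits: int = 4) -> List[float]:
    """Quantize then immediately dequantize (the forward pass of a
    quantization-aware setup, assuming gradients bypass the rounding)."""
    codes, scale = quantize_symmetric(weights, bits)
    return dequantize(codes, scale)


if __name__ == "__main__":
    w = [0.7, -0.3, 0.26, 0.0]
    codes, scale = quantize_symmetric(w, bits=4)
    print("codes:", codes)
    print("reconstructed:", fake_quantize(w, bits=4))
```

With round-to-nearest, each reconstructed weight differs from the original by at most half a quantization step (`scale / 2`). That perturbation is what a post-RL quantized model absorbs once, and what a quantization-aware run is exposed to at every training step.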
Similar Papers
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Computation and Language
Makes smart AI think faster and smaller.
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Artificial Intelligence
Makes big computer brains use less power.
Scaling Laws for Task-Stratified Knowledge in Post-Training Quantized Large Language Models
Computation and Language
Makes big AI models smaller without losing smarts.