InfiJanice: Joint Analysis and In-situ Correction Engine for Quantization-Induced Math Degradation in Large Language Models
By: Zhen Li, Yupeng Su, Songmiao Wang, and more
Potential Business Impact:
Fixes an AI model's math mistakes after the model is shrunk.
Large Language Models (LLMs) have demonstrated impressive performance on complex reasoning benchmarks such as GSM8K, MATH, and AIME. However, the substantial computational demands of these tasks pose significant challenges for real-world deployment. Model quantization has emerged as a promising approach to reduce memory footprint and inference latency by representing weights and activations with lower bit-widths. In this work, we conduct a comprehensive study of mainstream quantization methods (e.g., AWQ, GPTQ, SmoothQuant) on the most popular open-source models (e.g., the Qwen2.5 and LLaMA3 series), and reveal that quantization can degrade mathematical reasoning accuracy by up to 69.81%. To better understand this degradation, we develop an automated assignment and judgment pipeline that qualitatively categorizes failures into four error types and quantitatively identifies the most impacted reasoning capabilities. Building on these findings, we employ an automated data-curation pipeline to construct a compact "Silver Bullet" dataset. Training a quantized model on as few as 332 carefully selected examples for just 3-5 minutes on a single GPU is enough to restore its reasoning accuracy to match that of the full-precision baseline.
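To make the recovery recipe concrete, here is a minimal sketch of the "quantize, then briefly fine-tune on a small curated set" workflow the abstract describes. The paper does not specify its training code; this sketch makes several assumptions: bitsandbytes 4-bit loading stands in for the AWQ/GPTQ/SmoothQuant setups studied, LoRA adapters stand in for whatever lightweight training the authors used, and `silver_bullet.jsonl` is a hypothetical file of curated question/solution pairs, not a released artifact.

```python
# Sketch only: recover a quantized model's math accuracy by fine-tuning it
# briefly on a small curated dataset, per the workflow described in the abstract.
# Assumptions (not from the paper): 4-bit bitsandbytes quantization stands in
# for AWQ/GPTQ/SmoothQuant; LoRA stands in for the paper's training setup;
# "silver_bullet.jsonl" is a hypothetical curated-examples file.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "Qwen/Qwen2.5-7B-Instruct"  # one of the model families studied
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token

# Load the model with 4-bit quantized weights (stand-in for the methods studied).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# Attach small trainable LoRA adapters; the quantized base weights stay frozen.
model = get_peft_model(
    model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
)

# ~332 curated question/solution pairs, tokenized as plain causal-LM examples.
def tokenize(example):
    text = example["question"] + "\n" + example["solution"]
    return tokenizer(text, truncation=True, max_length=1024)

data = load_dataset("json", data_files="silver_bullet.jsonl")["train"].map(tokenize)

# A few epochs over a few hundred examples: minutes of training on a single GPU.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="recovered", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

After training, the recovered model would be re-evaluated on the same benchmarks (GSM8K, MATH, AIME) to check that accuracy matches the full-precision baseline.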
Similar Papers
Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning
Computation and Language
Fixes smart computers that lost math skills.
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Artificial Intelligence
Makes big computer brains use less power.
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Computation and Language
Makes smart AI faster and smaller.