BASE-Q: Bias and Asymmetric Scaling Enhanced Rotational Quantization for Large Language Models
By: Liulu He, Shenli Zheng, Karwei Sun, and more
Potential Business Impact:
Makes AI models smaller and faster.
Rotations have become essential to state-of-the-art quantization pipelines for large language models (LLMs) by effectively smoothing outliers in weights and activations. However, further optimizing the rotation parameters offers only limited performance gains and introduces significant training overhead: because rotation parameters are shared across layers, the full model must be loaded into memory simultaneously to enable backpropagation, resulting in substantial memory consumption and limited practical utility. In this work, we identify two fundamental limitations of current rotational quantization methods: (i) rotation fails to align channel means, resulting in wider quantization bounds and increased rounding errors; and (ii) rotation makes the activation distribution more Gaussian-like, increasing the energy loss caused by clipping errors. To address these issues, we introduce BASE-Q, a simple yet powerful approach that combines bias correction and asymmetric scaling to effectively reduce rounding and clipping errors. Furthermore, BASE-Q enables blockwise optimization, eliminating the need for memory-intensive full-model backpropagation. Extensive experiments on various LLMs and benchmarks demonstrate the effectiveness of BASE-Q, narrowing the accuracy gap to full-precision models by 50.5%, 42.9%, and 29.2% compared to QuaRot, SpinQuant, and OSTQuant, respectively. The code will be released soon.
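The sketch below is a minimal, hypothetical illustration of the two ideas the abstract names: subtracting the per-channel mean before symmetric quantization (bias correction), and rescaling the positive and negative tails of each channel with separate per-channel factors (asymmetric scaling) so neither side is clipped excessively. The function names, the square-root choice of the common range, and the toy data are assumptions made for illustration; this is not the paper's released implementation.

```python
import numpy as np

def fake_quant_sym(x, n_bits=4):
    """Symmetric per-channel round-to-nearest fake quantization with clipping."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max(axis=0, keepdims=True) / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def bias_asym_preprocess(x, eps=1e-8):
    """Hypothetical preprocessing in the spirit of the abstract:
    1. bias correction: subtract the per-channel mean so channels are centered;
    2. asymmetric scaling: rescale positive and negative values with separate
       per-channel factors so both tails reach a common range and the
       symmetric quantizer wastes fewer levels on the shorter tail.
    Returns the transformed tensor and the parameters needed to invert it."""
    mu = x.mean(axis=0, keepdims=True)
    xc = x - mu
    pos = np.maximum(xc.max(axis=0, keepdims=True), eps)
    neg = np.maximum(-xc.min(axis=0, keepdims=True), eps)
    r = np.sqrt(pos * neg)              # common target range for both tails
    s_pos, s_neg = pos / r, neg / r     # per-channel, sign-dependent scales
    xt = np.where(xc >= 0, xc / s_pos, xc / s_neg)
    return xt, mu, s_pos, s_neg

def bias_asym_postprocess(xq, mu, s_pos, s_neg):
    """Undo the sign-dependent scaling and add the channel means back."""
    return np.where(xq >= 0, xq * s_pos, xq * s_neg) + mu

# Toy comparison on shifted, skewed channels.
rng = np.random.default_rng(0)
x = rng.normal(loc=[2.0, -1.0, 0.5], scale=[1.0, 0.2, 3.0], size=(4096, 3))
x = np.where(x > 0, x * 1.5, x)         # add per-sign skew

naive = fake_quant_sym(x)
xt, mu, sp, sn = bias_asym_preprocess(x)
corrected = bias_asym_postprocess(fake_quant_sym(xt), mu, sp, sn)

print("MSE naive    :", float(np.mean((x - naive) ** 2)))
print("MSE corrected:", float(np.mean((x - corrected) ** 2)))
```

In an actual quantization pipeline, the subtracted means and sign-dependent scales would presumably be folded into the surrounding layers' biases and weights offline rather than applied at runtime; the sketch keeps them explicit only to make the error comparison visible.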
Similar Papers
OptRot: Mitigating Weight Outliers via Data-Free Rotations for Post-Training Quantization
Machine Learning (CS)
Makes AI models smaller and faster.
DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization
Machine Learning (CS)
Makes big computer brains run much faster and take up less space.
SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs
Computation and Language
Makes AI models run faster and use less memory.