SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs
By: Patrik Czakó, Gábor Kertész, Sándor Szénási
Potential Business Impact:
Makes AI models run faster and use less memory.
We present SmoothRot, a novel post-training quantization technique to enhance the efficiency of 4-bit quantization in Large Language Models (LLMs). SmoothRot addresses the critical challenge of massive activation outliers by integrating channel-wise scaling with Hadamard transformations. Our technique effectively transforms extreme outliers into quantization-friendly activations, significantly improving quantization accuracy. Experiments conducted on popular LLMs (LLaMA2 7B, LLaMA3.1 8B, and Mistral 7B) demonstrate that SmoothRot consistently reduces the performance gap between quantized and FP16 models by approximately 10-30% across language generation and zero-shot reasoning tasks, without introducing additional inference latency. Code is available at https://github.com/czakop/smoothrot.
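To make the two ingredients of the abstract concrete, here is a minimal sketch (not the authors' implementation; see the linked repository for that) of how SmoothQuant-style channel-wise scaling can be combined with an orthonormal Hadamard rotation before 4-bit quantization of a linear layer. The function names (`smooth_and_rotate`, `quantize_sym`), the alpha value, and the toy data are illustrative assumptions.

```python
import torch

def hadamard(n: int) -> torch.Tensor:
    """Sylvester construction of an orthonormal n x n Hadamard matrix (n must be a power of 2)."""
    assert n & (n - 1) == 0, "n must be a power of two"
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / torch.sqrt(torch.tensor(float(n)))

def quantize_sym(t: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor fake quantization (round-to-nearest)."""
    qmax = 2 ** (bits - 1) - 1
    scale = t.abs().max() / qmax
    return torch.round(t / scale).clamp(-qmax - 1, qmax) * scale

def smooth_and_rotate(x: torch.Tensor, w: torch.Tensor, alpha: float = 0.5):
    """Migrate activation outliers into the weights via channel-wise scaling,
    then rotate both sides with an orthonormal Hadamard matrix.
    x: (tokens, in_features) calibration activations, w: (out_features, in_features)."""
    # SmoothQuant-style per-channel scales: channels with large activations are
    # divided down, and the weights absorb the inverse scale, so x @ w.T is unchanged.
    s = x.abs().amax(dim=0).pow(alpha) / w.abs().amax(dim=0).pow(1 - alpha)
    s = s.clamp(min=1e-5)
    x_s, w_s = x / s, w * s

    # Hadamard rotation: H is orthonormal, so (x_s H)(w_s H)^T = x_s w_s^T is preserved,
    # while the rotation spreads the remaining outliers evenly across channels.
    H = hadamard(x.shape[1])
    return x_s @ H, w_s @ H

# Toy usage: compare per-tensor W4A4 quantization error with and without the transform.
torch.manual_seed(0)
x = torch.randn(256, 128)
x[:, 7] *= 50.0                      # inject a massive activation outlier channel
w = torch.randn(64, 128) * 0.05
x_r, w_r = smooth_and_rotate(x, w)

ref = x @ w.T
err_plain = (quantize_sym(x) @ quantize_sym(w).T - ref).pow(2).mean()
err_sr = (quantize_sym(x_r) @ quantize_sym(w_r).T - ref).pow(2).mean()
print(f"plain W4A4 MSE: {err_plain:.4f}  |  smooth+rotate W4A4 MSE: {err_sr:.4f}")
```

Both transforms are mathematically lossless and can be folded offline into the surrounding weights, which is consistent with the abstract's claim of no additional inference latency; the sketch only illustrates why the rescaled and rotated tensors quantize with less error.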
Similar Papers
Turning LLM Activations Quantization-Friendly
Machine Learning (CS)
Makes AI smarter and cheaper to run.
ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers
CV and Pattern Recognition
Makes AI image generators faster and smaller.
BASE-Q: Bias and Asymmetric Scaling Enhanced Rotational Quantization for Large Language Models
Machine Learning (CS)
Makes AI models smaller and faster.