Score: 1

GranQ: Granular Zero-Shot Quantization with Channel-Wise Activation Scaling in QAT

Published: March 24, 2025 | arXiv ID: 2503.18339v5

By: Inpyo Hong, Youngwan Jo, Hyojeong Lee, and more

Potential Business Impact:

Compresses neural networks to low-bit precision without access to the original training data, enabling smaller, faster models on constrained hardware.

Business Areas:
Science and Engineering

Zero-shot quantization (ZSQ) enables neural network compression without the original training data, making it a promising solution for restricted data-access scenarios. To compensate for the lack of data, recent ZSQ methods typically rely on synthetic inputs generated from the full-precision model. However, these synthetic inputs often lead to activation distortion, especially under low-bit settings. To mitigate this, existing methods typically employ per-channel scaling, but they still struggle with the severe computational overhead incurred during accumulation. To overcome this critical bottleneck, we propose GranQ, a novel activation quantization framework that introduces an efficient pre-scaling strategy. Unlike conventional channel-wise methods that repeatedly perform scaling operations during accumulation, GranQ applies scaling factors in a pre-scaling step through fully vectorized computation, eliminating runtime scaling overhead. This design enables GranQ to maintain fine-grained quantization accuracy while significantly reducing computational burden, particularly in low-bit settings. Extensive experiments under quantization-aware training (QAT) demonstrate that GranQ consistently outperforms state-of-the-art ZSQ methods on CIFAR and ImageNet. In particular, our method achieves up to 5.45% higher accuracy in the 3-bit setting on CIFAR-100 and even surpasses the full-precision baseline on CIFAR-10. Furthermore, GranQ achieves a significant reduction in quantization latency over conventional per-channel methods, demonstrating improved efficiency. With these findings, we anticipate that GranQ will inspire future research beyond conventional ZSQ approaches centered on data generation and model fine-tuning. The official code is available at https://github.com/anonymus-orange/GranQ.
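As a rough illustration of the pre-scaling idea described in the abstract, the sketch below fake-quantizes an activation tensor with per-channel scales applied in one fully vectorized step, rather than scaling repeatedly during accumulation. This is a minimal sketch based only on the abstract; the function name, the symmetric quantization scheme, and the use of per-channel max statistics are assumptions, not the authors' implementation (see the linked repository for the official code).

```python
import torch

def prescaled_channelwise_fakequant(x: torch.Tensor, n_bits: int = 3) -> torch.Tensor:
    """Fake-quantize activations with per-channel scales applied in a single
    vectorized pre-scaling step. A sketch of the pre-scaling idea, not the
    official GranQ implementation.

    x: activation tensor of shape (N, C, H, W).
    """
    # Symmetric signed range for n_bits, e.g. [-3, 3] at 3 bits (assumption).
    qmax = 2 ** (n_bits - 1) - 1
    # One per-channel scale from the max magnitude over batch and spatial
    # dims; keepdim=True yields shape (1, C, 1, 1) so it broadcasts.
    scale = x.abs().amax(dim=(0, 2, 3), keepdim=True).clamp(min=1e-8) / qmax
    # Pre-scale the whole tensor at once (fully vectorized), then round and
    # clamp; no further per-channel scaling is needed during accumulation.
    q = torch.clamp(torch.round(x / scale), -qmax, qmax)
    # Dequantize for a QAT-style forward pass.
    return q * scale

# Hypothetical usage: quantize a batch of feature maps to 3 bits.
x = torch.randn(8, 64, 32, 32)
x_q = prescaled_channelwise_fakequant(x, n_bits=3)
```

Because the per-channel scales are folded in before quantization, the subsequent accumulation can proceed without repeated scaling operations, which appears to be the runtime overhead the abstract says pre-scaling removes.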

Country of Origin
🇰🇷 Korea, Republic of

Repos / Data Links
https://github.com/anonymus-orange/GranQ

Page Count
13 pages

Category
Computer Science:
Computer Vision and Pattern Recognition