GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers
By: Guang Liang, Xinyao Liu, Jianxin Wu
Potential Business Impact:
Makes computer vision faster and smaller.
Vision Transformers (ViTs) are essential in computer vision but are computationally intensive. Model quantization, particularly to low bit-widths like 4-bit, aims to alleviate this difficulty, yet existing Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) methods exhibit significant limitations. PTQ often incurs substantial accuracy drops, while QAT achieves high accuracy but suffers from prohibitive computational costs, limited generalization to downstream tasks, training instability, and a lack of open-source codebases. To address these challenges, this paper introduces General, Practical, and Lightning Quantization (GPLQ), a novel framework designed for efficient and effective ViT quantization. GPLQ is founded on two key empirical insights: the paramount importance of activation quantization and the necessity of preserving the model's original optimization ``basin'' to maintain generalization. Consequently, GPLQ employs a sequential ``activation-first, weights-later'' strategy. Stage 1 keeps weights in FP32 while quantizing activations with a feature-mimicking loss for only 1 epoch, keeping the model in the same ``basin'' and thereby preserving generalization. Stage 2 quantizes weights using a PTQ method. As a result, GPLQ is 100x faster than existing QAT methods, lowers the memory footprint to levels even below FP32 training, and achieves 4-bit model performance that is highly competitive with FP32 models in terms of both accuracy on ImageNet and generalization to diverse downstream tasks, including fine-grained visual classification and object detection. We will release an easy-to-use open-source toolkit supporting multiple vision tasks.
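The abstract's ``activation-first, weights-later'' idea can be illustrated with a minimal sketch. The code below is not the authors' implementation (GPLQ's toolkit is not yet released); it is a toy example, with made-up function names and data, of uniform symmetric fake quantization of activations plus a feature-mimicking (MSE) loss that Stage 1 would minimize while weights stay FP32:

```python
def fake_quant(values, bits=4):
    # Uniform symmetric fake quantization: round to a (2**bits)-level
    # integer grid, then dequantize so downstream code still sees floats.
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]

def feature_mimic_loss(fp32_feats, quant_feats):
    # Stage 1 objective (sketch): mean-squared error between features of the
    # FP32 teacher and the activation-quantized student, driving the student
    # to stay in the teacher's optimization "basin".
    n = len(fp32_feats)
    return sum((a - b) ** 2 for a, b in zip(fp32_feats, quant_feats)) / n

# Toy "features" standing in for a ViT block's activations (hypothetical data).
acts = [0.31, -1.7, 0.05, 2.4, -0.9, 1.1]

# Stage 1: weights remain FP32; only activations pass through fake quantization.
stage1_acts = fake_quant(acts, bits=4)
loss = feature_mimic_loss(acts, stage1_acts)
```

In the actual method, this loss would be backpropagated for a single epoch before Stage 2 applies an off-the-shelf PTQ step to the weights.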
Similar Papers
IPTQ-ViT: Post-Training Quantization of Non-linear Functions for Integer-only Vision Transformers
CV and Pattern Recognition
Makes computer vision faster without losing quality.
LampQ: Towards Accurate Layer-wise Mixed Precision Quantization for Vision Transformers
CV and Pattern Recognition
Makes AI image tools smaller and faster.
ZeroQAT: Your Quantization-aware Training but Efficient
Machine Learning (CS)
Makes smart computer programs run faster and smaller.