Stabilizing Quantization-Aware Training by Implicit-Regularization on Hessian Matrix
By: Junbiao Pang, Tianyang Cai
Potential Business Impact:
Makes AI models smaller and faster.
Quantization-Aware Training (QAT) is one of the prevailing neural network compression solutions. However, its stability has been questioned because quantization error is inevitable and can yield deteriorating performance. We find that a sharp loss landscape, which leads to a dramatic performance drop, is an essential factor behind this instability. Theoretically, we show that perturbing the features drives training toward a flat local minimum. However, simply adding perturbations to either the weights or the features empirically degrades the performance of the Full Precision (FP) model. In this paper, we propose Feature-Perturbed Quantization (FPQ), which stochastically perturbs the features and applies feature distillation to the quantized model. Our method generalizes well across different network architectures and various QAT methods. Furthermore, we mathematically show that FPQ implicitly regularizes the Hessian norm, which measures the smoothness of the loss landscape. Extensive experiments demonstrate that our approach significantly outperforms current State-Of-The-Art (SOTA) QAT methods and even the FP counterparts.
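To see why feature perturbation acts as an implicit Hessian regularizer: under a second-order Taylor expansion, injecting zero-mean noise with variance sigma^2 into the features raises the expected loss by roughly (sigma^2 / 2) times the trace of the feature-space Hessian, so training against perturbed features implicitly penalizes that Hessian term and flattens the minimum. Below is a minimal PyTorch sketch of how such a training step could be wired, based only on the abstract; `fake_quantize`, `QuantConv`, `noise_std`, and `distill_weight` are illustrative names and hyperparameters, not the authors' released code.

```python
# Minimal sketch of the FPQ idea (illustrative assumptions, not the paper's code):
# fake-quantize weights, stochastically perturb intermediate features during
# training, and distill FP features into the quantized branch.
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(x, num_bits=4):
    """Uniform symmetric fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (q - x).detach()  # STE: forward uses q, backward uses identity


class QuantConv(nn.Module):
    """Conv layer with fake-quantized weights whose output features are
    stochastically perturbed during training (the feature-perturbation step)."""

    def __init__(self, cin, cout, noise_std=0.05, num_bits=4):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, 3, padding=1)
        self.noise_std = noise_std
        self.num_bits = num_bits

    def forward(self, x):
        w_q = fake_quantize(self.conv.weight, self.num_bits)
        feat = F.conv2d(x, w_q, self.conv.bias, padding=1)
        if self.training:
            # Zero-mean Gaussian perturbation; its expected effect on the loss
            # scales with the Hessian trace, giving the implicit regularization.
            feat = feat + self.noise_std * torch.randn_like(feat)
        return feat


def fpq_loss(quant_feat, fp_feat, logits, targets, distill_weight=1.0):
    """Task loss plus feature distillation from a frozen full-precision model."""
    task = F.cross_entropy(logits, targets)
    distill = F.mse_loss(quant_feat, fp_feat.detach())
    return task + distill_weight * distill
```

In this sketch the distillation term pulls the perturbed, quantized features back toward the frozen FP features, which is one plausible way to get the stabilizing effect the abstract describes without letting the perturbation alone degrade accuracy.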
Similar Papers
Quantization Meets OOD: Generalizable Quantization-aware Training from a Flatness Perspective
CV and Pattern Recognition
Makes AI better at understanding new, unseen things.
Oscillations Make Neural Networks Robust to Quantization
Machine Learning (CS)
Makes AI models work better with less data.
ZeroQAT: Your Quantization-aware Training but Efficient
Machine Learning (CS)
Makes smart computer programs run faster and smaller.