Rescaling-Aware Training for Efficient Deployment of Deep Learning Models on Full-Integer Hardware
By: Lion Mueller, Alberto Garcia-Ortiz, Ardalan Najafi, and more
Potential Business Impact:
Makes AI on small devices run faster and cheaper.
Integer AI inference significantly reduces computational complexity in embedded systems. Quantization-aware training (QAT) helps mitigate the accuracy degradation associated with post-training quantization, but it still overlooks the cost of integer rescaling during inference, an operation that is expensive in integer-only hardware. This work shows that the rescaling cost can be reduced dramatically after training by applying stronger quantization to the rescale multiplicands, with no loss in model quality. Furthermore, we introduce Rescale-Aware Training, a fine-tuning method for ultra-low bit-width rescaling multiplicands. Experiments show that even with rescaler widths reduced by 8x, full accuracy is preserved through minimal incremental retraining. This enables more energy-efficient and cost-efficient AI inference for resource-constrained embedded systems.
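To illustrate what the rescaling operation looks like in integer-only inference, the Python sketch below (not the authors' code; quantize_multiplier, rescale, and the chosen bit widths are illustrative assumptions) approximates the combined floating-point scale of a quantized layer as an integer multiplicand plus a right shift. The "rescaler width" is then simply the bit width of that multiplicand.

def quantize_multiplier(scale, bits=32):
    # Approximate 0 < scale < 1 as m * 2**(-total_shift), where the
    # mantissa m (the rescale multiplicand) fits in `bits` bits.
    assert 0.0 < scale < 1.0
    shift = 0
    while scale < 0.5:            # normalize into [0.5, 1.0)
        scale *= 2.0
        shift += 1
    m = round(scale * (1 << bits))
    if m == (1 << bits):          # rounding pushed m up to 2**bits
        m //= 2
        shift -= 1
    return m, shift + bits        # total right shift

def rescale(acc, m, shift):
    # Integer-only rescaling of an int32 accumulator: (acc * m) >> shift,
    # with round-to-nearest; no floating point at inference time.
    rounding = 1 << (shift - 1)
    return (acc * m + rounding) >> shift

# Example: a combined scale s_in * s_w / s_out from a quantized layer.
s = 0.00371
m32, sh32 = quantize_multiplier(s, bits=32)   # conventional wide rescaler
m4,  sh4  = quantize_multiplier(s, bits=4)    # 8x narrower multiplicand
acc = 12345                                   # int32 accumulator output
print(rescale(acc, m32, sh32), rescale(acc, m4, sh4))

Shrinking `bits` shrinks the rescaling multiplier in hardware but introduces small rounding differences, as the example output shows; the fine-tuning described in the abstract is what lets the model absorb that error and recover full accuracy.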
Similar Papers
Learning Quantized Continuous Controllers for Integer Hardware
Machine Learning (CS)
Makes robots move faster using less power.
DQT: Dynamic Quantization Training via Dequantization-Free Nested Integer Arithmetic
Machine Learning (CS)
Makes AI smarter using less computer power.
ZeroQAT: Your Quantization-aware Training but Efficient
Machine Learning (CS)
Makes smart computer programs run faster and smaller.