Dynamic Quantization Error Propagation in Encoder-Decoder ASR Quantization
By: Xinyu Wang, Yajie Luo, Yihong Wu, and more
Potential Business Impact:
Makes voice assistants more accurate and reliable on small, memory-limited devices.
Running Automatic Speech Recognition (ASR) models on memory-constrained edge devices requires efficient compression. While layer-wise post-training quantization is effective, it suffers from error accumulation, especially in encoder-decoder architectures. Existing solutions like Quantization Error Propagation (QEP) are suboptimal for ASR because the model is heterogeneous: the encoder processes acoustic features while the decoder generates text. To address this, we propose Fine-grained Alpha for Dynamic Quantization Error Propagation (FADE), which adaptively controls the trade-off between cross-layer error correction and local quantization. Experiments show that FADE significantly improves stability by reducing performance variance across runs, while also achieving a lower mean word error rate (WER) than the baselines.
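To make the trade-off concrete, below is a minimal sketch of one plausible reading of the abstract: a per-layer alpha blends the QEP-style propagated calibration inputs (cross-layer error correction) with the clean full-precision inputs (local quantization) inside a layer-wise reconstruction loss. The functions `quantize_weight`, `fade_layer_objective`, and `select_alpha`, along with the blending form and the grid-search rule for alpha, are illustrative assumptions, not the paper's actual formulation.

```python
import torch


def quantize_weight(w: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Uniform symmetric round-to-nearest quantization (placeholder PTQ step)."""
    scale = w.abs().max() / (2 ** (n_bits - 1) - 1)
    return torch.round(w / scale) * scale


def fade_layer_objective(w: torch.Tensor,
                         x_clean: torch.Tensor,
                         x_prop: torch.Tensor,
                         alpha: float,
                         n_bits: int = 8) -> torch.Tensor:
    """Layer-wise reconstruction loss with a per-layer alpha that blends
    cross-layer error correction (inputs propagated through the already-
    quantized prefix) with purely local quantization (clean FP inputs).

    alpha = 1.0 -> full QEP-style propagation; alpha = 0.0 -> vanilla local
    PTQ. A fine-grained, per-layer alpha is the knob FADE adapts; the exact
    selection rule used here is an assumption for illustration.
    """
    w_q = quantize_weight(w, n_bits)
    x_mix = alpha * x_prop + (1.0 - alpha) * x_clean  # blended calibration input
    target = x_clean @ w.T                            # full-precision layer output
    return torch.mean((x_mix @ w_q.T - target) ** 2)


def select_alpha(w, x_clean, x_prop,
                 grid=(0.0, 0.25, 0.5, 0.75, 1.0)) -> float:
    """Hypothetical per-layer alpha selection by grid search on the loss."""
    return min(grid, key=lambda a: fade_layer_objective(w, x_clean, x_prop, a).item())


if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(64, 128)           # one layer's weights (d_out x d_in)
    x_clean = torch.randn(256, 128)    # calibration inputs to the FP layer
    # Inputs after the quantized prefix, i.e. clean inputs plus accumulated error:
    x_prop = x_clean + 0.05 * torch.randn_like(x_clean)
    print(f"selected per-layer alpha = {select_alpha(w, x_clean, x_prop)}")
```

At alpha = 1.0 this reduces to full error propagation as in QEP, and at alpha = 0.0 it degenerates to vanilla layer-wise PTQ, so tuning alpha per layer would let the acoustic encoder layers and text decoder layers sit at different points on that trade-off.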
Similar Papers
Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models
Sound
Makes voice assistants work on small devices.
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Machine Learning (CS)
Makes AI models smaller and faster.
SAQ: Stabilizer-Aware Quantum Error Correction Decoder
Quantum Physics
Fixes errors in quantum computers faster and better.