Pruning as Regularization: Sensitivity-Aware One-Shot Pruning in ASR
By: Julian Irigoyen, Arthur Söhler, Andreas Søeborg Kirkedal
Potential Business Impact:
Makes voice assistants smaller and better at understanding speech.
We challenge the conventional view of neural network pruning as solely a compression technique, demonstrating that one-shot magnitude pruning serves as a powerful implicit regularizer for ASR. Using Whisper-small, we combine gradient- and Fisher-based sensitivity diagnostics with targeted, component-wise pruning. This analysis reveals architectural asymmetries: decoder FFNs are pruning-fragile, whereas decoder self-attention and the last encoder layers contain redundancy that, when removed, improves generalization. Without fine-tuning, pruning 50% of decoder self-attention reduces WER by 2.38% absolute (20.44% relative) on LibriSpeech test-other; pruning the last four encoder layers at 50% instead yields a 1.72% absolute (14.8% relative) improvement. The gains persist on Common Voice and TED-LIUM. Beyond regularization benefits, our sensitivity-aware approach enables more aggressive one-shot compression: at 40% sparsity, where established global pruning approaches fail catastrophically, our method preserves near-baseline accuracy. This positions pruning as a first-class architectural design tool: knowing where to prune is as important as how much to prune.
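To make the setup concrete, the sketch below shows what sensitivity-aware, component-wise one-shot magnitude pruning could look like on Whisper-small. It is not the authors' code: the module paths follow the Hugging Face `transformers` Whisper implementation, the 50% sparsity and the choice of decoder self-attention versus the last four encoder layers mirror the abstract, and `loss_fn`/`batches` in the empirical-Fisher proxy are hypothetical placeholders for a held-out diagnostic set.

```python
# Minimal sketch (not the authors' released code) of sensitivity-aware,
# one-shot magnitude pruning on Whisper-small. Module paths follow the
# Hugging Face `transformers` Whisper implementation; `loss_fn`/`batches`
# in the Fisher proxy are hypothetical placeholders.
import torch
from torch.nn.utils import prune
from transformers import WhisperForConditionalGeneration


def fisher_sensitivity(model, batches, loss_fn):
    """Empirical-Fisher proxy: accumulate squared gradients per parameter."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
    return scores


def prune_decoder_self_attention(model, amount=0.5):
    """One-shot L1 (magnitude) pruning of every decoder self-attention projection."""
    for layer in model.model.decoder.layers:
        attn = layer.self_attn
        for proj in (attn.q_proj, attn.k_proj, attn.v_proj, attn.out_proj):
            prune.l1_unstructured(proj, name="weight", amount=amount)
            prune.remove(proj, "weight")  # bake the zeros into the weight tensor


def prune_last_encoder_layers(model, num_layers=4, amount=0.5):
    """Alternative target: prune the last `num_layers` encoder layers."""
    for layer in model.model.encoder.layers[-num_layers:]:
        attn = layer.self_attn
        for module in (attn.q_proj, attn.k_proj, attn.v_proj, attn.out_proj,
                       layer.fc1, layer.fc2):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")


if __name__ == "__main__":
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
    prune_decoder_self_attention(model, amount=0.5)  # one-shot, no fine-tuning afterwards
```

In the paper's framing, the gradient/Fisher scores decide which component to target; the pruning itself is plain unstructured magnitude pruning applied once, with no subsequent fine-tuning.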
Similar Papers
Structured Sparsity and Weight-adaptive Pruning for Memory and Compute efficient Whisper models
Machine Learning (CS)
Makes speech recognition work on small devices.
Teacher-Guided One-Shot Pruning via Context-Aware Knowledge Distillation
CV and Pattern Recognition
Makes computer programs smaller without losing quality.
Pruning-aware Loss Functions for STOI-Optimized Pruned Recurrent Autoencoders for the Compression of the Stimulation Patterns of Cochlear Implants at Zero Delay
Sound
Makes hearing aids use less power and work better.