An Effective Training Framework for Light-Weight Automatic Speech Recognition Models
By: Abdul Hannan, Alessio Brutti, Shah Nawaz and more
Potential Business Impact:
Makes big voice programs work on small phones.
Recent advances in deep learning have encouraged the development of large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low-resource devices is impractical despite their favorable performance. Existing approaches (pruning, distillation, layer skipping, etc.) transform large models into smaller ones at the cost of significant performance degradation, or require prolonged training of the smaller models to reach better performance. To address these issues, we introduce an effective two-step, representation-learning-based approach capable of producing several small models from a single large model, ensuring considerably better performance in a limited number of epochs. Comprehensive experimentation on ASR benchmarks demonstrates the efficacy of our approach, achieving a three-fold training speed-up and up to 12.54% word error rate improvement.
Similar Papers
AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition
Audio and Speech Processing
Helps people with speech problems talk to computers.
Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning
Artificial Intelligence
Lets computers understand Arabic speech without human help.
Customizing Speech Recognition Model with Large Language Model Feedback
Computation and Language
Helps computers understand rare words in speech.