Bayesian Low-Rank Factorization for Robust Model Adaptation
By: Enes Yavuz Ugan, Ngoc-Quan Pham, Alexander Waibel
Potential Business Impact:
Helps voice AI understand mixed languages better.
Large speech foundation models achieve strong performance across many domains, but they often require adaptation to handle local needs such as code-switching, where speakers mix languages within the same utterance. Direct fine-tuning of these models risks overfitting to the target domain and overwriting the broad capabilities of the base model. To address this challenge, we explore Bayesian factorized adapters for speech foundation models, which place priors near zero to achieve sparser adaptation matrices and thereby retain general performance while adapting to specific domains. We apply our approach to the Whisper model and evaluate it on different multilingual code-switching scenarios. Our results show only a minimal adaptation loss while catastrophic forgetting of the base model is significantly reduced. Compared to LoRA, our method achieves a backward gain of 54% with only a 4% drop on the new domain. These findings highlight the effectiveness of Bayesian adaptation for fine-tuning speech foundation models without sacrificing generalization.
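The abstract does not spell out the adapter itself, so the snippet below is a minimal PyTorch sketch of one plausible reading: a LoRA-style low-rank update whose factors carry a variational posterior and a zero-mean Gaussian prior, so that a KL penalty shrinks the adaptation matrices toward zero during fine-tuning. The class and parameter names (BayesianLowRankAdapter, rank, prior_std) are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianLowRankAdapter(nn.Module):
    """Sketch of a LoRA-style adapter with a zero-mean Gaussian prior on its
    low-rank factors. Adding the KL term to the task loss pushes the update
    toward zero, keeping the adapted model close to the frozen base model.
    Hyperparameters here are assumptions for illustration only."""

    def __init__(self, in_dim, out_dim, rank=8, prior_std=0.05):
        super().__init__()
        self.prior_std = prior_std
        # Variational posterior over the two factors: per-weight mean and log-std.
        self.a_mu = nn.Parameter(torch.zeros(rank, in_dim))
        self.a_logstd = nn.Parameter(torch.full((rank, in_dim), -5.0))
        self.b_mu = nn.Parameter(torch.zeros(out_dim, rank))
        self.b_logstd = nn.Parameter(torch.full((out_dim, rank), -5.0))

    def _sample(self, mu, logstd):
        # Reparameterization trick: stochastic factor weights during training.
        return mu + torch.randn_like(mu) * logstd.exp()

    def forward(self, x, base_out):
        # x: (..., in_dim); base_out: output of the frozen base layer (..., out_dim).
        a = self._sample(self.a_mu, self.a_logstd) if self.training else self.a_mu
        b = self._sample(self.b_mu, self.b_logstd) if self.training else self.b_mu
        return base_out + F.linear(F.linear(x, a), b)

    def kl_to_zero_prior(self):
        # KL(q(W) || N(0, prior_std^2)) summed over both factors; the prior
        # centered at zero is what encourages a sparse adaptation matrix.
        kl = 0.0
        for mu, logstd in ((self.a_mu, self.a_logstd), (self.b_mu, self.b_logstd)):
            var = (2.0 * logstd).exp()
            kl = kl + 0.5 * (
                (var + mu.pow(2)) / self.prior_std ** 2
                - 1.0
                - 2.0 * logstd
                + 2.0 * math.log(self.prior_std)
            ).sum()
        return kl
```

In such a setup, fine-tuning would minimize the task loss plus a weighted sum of each adapter's kl_to_zero_prior() while the Whisper base weights stay frozen; the weighting of the KL term is another assumption not specified in the abstract.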
Similar Papers
Efficient Continual Learning in Neural Machine Translation: A Low-Rank Adaptation Approach
Computation and Language
Teaches computers new languages without forgetting old ones.
Bridging the Reality Gap: Efficient Adaptation of ASR systems for Challenging Low-Resource Domains
Computation and Language
Makes doctors' notes understandable by computers.
Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition
Sound
Makes AI understand emotions in voices better.