Bridging the Language Gap: Synthetic Voice Diversity via Latent Mixup for Equitable Speech Recognition
By: Wesley Bian, Xiaofeng Lin, Guang Cheng
Potential Business Impact:
Helps computers understand less common languages better.
Modern machine learning models for audio tasks often exhibit superior performance on English and other well-resourced languages, primarily due to the abundance of available training data. This disparity leads to an unfair performance gap for low-resource languages, where data collection is both challenging and costly. In this work, we introduce a novel data augmentation technique for speech corpora designed to mitigate this gap. Through comprehensive experiments, we demonstrate that our method significantly improves the performance of automatic speech recognition systems on low-resource languages. Furthermore, we show that our approach outperforms existing augmentation strategies, offering a practical solution for enhancing speech technology in underrepresented linguistic communities.
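The abstract does not spell out how the augmentation works, but the title points to mixup applied in a latent space to synthesize new voices. As a rough illustration only, the sketch below shows generic mixup on latent vectors: two latents are blended with a weight drawn from a Beta distribution, as in standard mixup. It is not the authors' implementation. The function names, the `alpha` value, and the assumption that each voice is a fixed-length list of floats are all hypothetical.

```python
import random
from typing import List, Sequence


def mixup_latents(z_a: Sequence[float], z_b: Sequence[float], lam: float) -> List[float]:
    """Convex combination lam * z_a + (1 - lam) * z_b of two latent vectors."""
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(z_a, z_b)]


def sample_mixed_voices(
    latents: Sequence[Sequence[float]],
    n_new: int,
    alpha: float = 0.4,  # hypothetical choice; small alpha keeps mixes close to one parent
    seed: int = 0,
) -> List[List[float]]:
    """Draw n_new synthetic latents by mixing random pairs of existing ones."""
    if len(latents) < 2:
        raise ValueError("need at least two latent vectors to mix")
    rng = random.Random(seed)
    mixed = []
    for _ in range(n_new):
        # Pick two distinct source voices and a Beta(alpha, alpha) mixing weight.
        z_a, z_b = rng.sample(list(latents), 2)
        lam = rng.betavariate(alpha, alpha)
        mixed.append(mixup_latents(z_a, z_b, lam))
    return mixed
```

In a setup like this, the mixed latents would then be passed to whatever decoder or speech synthesizer produces audio, and the resulting utterances added to the low-resource training set. That downstream step depends on the specific model and is not shown here.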
Similar Papers
SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models
Computation and Language
Creates many different voices for computers to speak.
Bridging Language Gaps: Enhancing Few-Shot Language Adaptation
Computation and Language
Helps computers learn many languages with less data.