Zero- and One-Shot Data Augmentation for Sentence-Level Dysarthric Speech Recognition in Constrained Scenarios
By: Shiyao Wang , Shiwan Zhao , Jiaming Zhou and more
Potential Business Impact:
Helps computers understand speech with unclear pronunciation.
Dysarthric speech recognition (DSR) research has witnessed remarkable progress in recent years, evolving from the basic understanding of individual words to the intricate comprehension of sentence-level expressions, all driven by the pressing communication needs of individuals with dysarthria. Nevertheless, the scarcity of available data remains a substantial hurdle, posing a significant challenge to the development of effective sentence-level DSR systems. In response to this issue, dysarthric data augmentation (DDA) has emerged as a highly promising approach. Generative models are frequently employed to generate training data for automatic speech recognition tasks. However, their effectiveness hinges on the ability of the synthesized data to accurately represent the target domain. The wide-ranging variability in pronunciation among dysarthric speakers makes it extremely difficult for models trained on data from existing speakers to produce useful augmented data, especially in zero-shot or one-shot learning settings. To address this limitation, we put forward a novel text-coverage strategy specifically designed for text-matching data synthesis. This innovative strategy allows for efficient zero/one-shot DDA, leading to substantial enhancements in the performance of DSR when dealing with unseen dysarthric speakers. Such improvements are of great significance in practical applications, including dysarthria rehabilitation programs and day-to-day common-sentence communication scenarios.
Similar Papers
Improved Dysarthric Speech to Text Conversion via TTS Personalization
Sound
Helps people with speech problems talk to computers.
DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model
Sound
Makes speech understandable while keeping the voice.
Facilitating Personalized TTS for Dysarthric Speakers Using Knowledge Anchoring and Curriculum Learning
Sound
Makes computers speak like people with speech problems.