State-of-the-Art Dysarthric Speech Recognition with MetaICL for on-the-fly Personalization
By: Dhruuv Agarwal, Harry Zhang, Yang Yu, and others
Potential Business Impact:
Helps computers understand speech from people with speech impairments.
Personalizing Automatic Speech Recognition (ASR) for dysarthric speech is crucial but challenging, because conventional approaches require training and storing a separate adapter for each user. We propose a hybrid meta-training method that yields a single model excelling at zero-shot and few-shot on-the-fly personalization via in-context learning (ICL). Measured by Word Error Rate (WER) on state-of-the-art benchmark subsets, the model achieves 13.9% WER on Euphonia, surpassing speaker-independent baselines (17.5% WER) and rivaling user-specific personalized models. On SAP Test 1, its 5.3% WER significantly improves on the 8% achieved even by personalized adapters. We also demonstrate the importance of example curation: with an oracle text-similarity method, 5 curated examples match the performance of 19 randomly selected ones, highlighting a key area for future efficiency gains. Finally, we conduct data ablations to measure the data efficiency of this approach. Overall, this work presents a practical, scalable path to personalized dysarthric ASR.
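The curation result above hinges on ranking candidate in-context examples by text similarity to the target utterance. A minimal sketch of that idea, assuming a simple character-level similarity (the function name, example pool, and similarity choice are illustrative, not the paper's actual oracle implementation):

```python
from difflib import SequenceMatcher


def curate_examples(target_text, example_pool, k=5):
    """Rank candidate example transcripts by text similarity to the target
    utterance's reference text and keep the top k for the ICL prompt.

    Hypothetical stand-in for the oracle text-similarity curation step:
    the paper does not specify its similarity metric, so difflib's
    SequenceMatcher ratio is used here purely for illustration.
    """
    scored = sorted(
        example_pool,
        key=lambda ex: SequenceMatcher(None, target_text, ex).ratio(),
        reverse=True,
    )
    return scored[:k]


# Toy usage: pick the 2 pool transcripts closest to the target text.
pool = [
    "open the front door",
    "close the window",
    "open the back door",
    "play some music",
]
top = curate_examples("open the door", pool, k=2)
```

In a real system the oracle variant would require the reference transcript, so a deployable version would instead score candidates against a first-pass ASR hypothesis; the ranking logic stays the same.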
Similar Papers
Improved Dysarthric Speech to Text Conversion via TTS Personalization
Sound
Helps people with speech problems talk to computers.
Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies
Sound
Helps computers understand speech from people with speech problems.
WER is Unaware: Assessing How ASR Errors Distort Clinical Understanding in Patient Facing Dialogue
Computation and Language
Makes doctor talk machines safer for patients.