Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition
By: Niclas Pokel, Pehuén Moure, Roman Boehringer, and more
Potential Business Impact:
Helps computers understand impaired speech.
Speech impairments resulting from congenital disorders, such as cerebral palsy, Down syndrome, or Apert syndrome, as well as acquired brain injuries due to stroke, traumatic accidents, or tumors, present major challenges to automatic speech recognition (ASR) systems. Despite recent advances, state-of-the-art ASR models like Whisper still struggle with non-normative speech because of limited training data and high acoustic variability. Moreover, collecting and annotating non-normative speech is burdensome: speaking is effortful for many affected individuals, and the laborious annotation often requires caregivers familiar with the speaker. This work introduces a novel ASR personalization method based on Bayesian low-rank adaptation for data-efficient fine-tuning. We validate our method on the English UA-Speech dataset and a newly collected German speech dataset, BF-Sprache, from a child with structural speech impairment. The dataset and approach are designed to reflect the challenges of low-resource settings that include individuals with speech impairments. Our method significantly improves ASR accuracy for impaired speech while maintaining data and annotation efficiency, offering a practical path toward inclusive ASR.
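To illustrate the kind of adapter the abstract describes, the sketch below shows a generic variational (Bayesian) LoRA layer in PyTorch: a frozen base linear layer plus a low-rank update B·A whose factors carry Gaussian mean-field posteriors, sampled with the reparameterization trick and regularized by a KL term. This is a minimal illustration under assumed choices (class name `VariationalLoRALinear`, rank `r`, prior scale, KL weight), not the paper's exact parameterization.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalLoRALinear(nn.Module):
    """Frozen linear layer plus a variational low-rank update dW = B @ A.

    A and B have Gaussian mean-field posteriors (mean + log-variance),
    sampled via the reparameterization trick. Names and defaults here
    are illustrative assumptions, not the paper's exact setup.
    """

    def __init__(self, base: nn.Linear, r: int = 8, prior_std: float = 0.1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained weight frozen

        in_f, out_f = base.in_features, base.out_features
        # Variational parameters of the two low-rank factors.
        self.A_mu = nn.Parameter(torch.zeros(r, in_f))
        self.A_logvar = nn.Parameter(torch.full((r, in_f), -6.0))
        self.B_mu = nn.Parameter(torch.zeros(out_f, r))
        self.B_logvar = nn.Parameter(torch.full((out_f, r), -6.0))
        self.prior_std = prior_std

    @staticmethod
    def _sample(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
        # Reparameterization: sample = mu + sigma * eps
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        A = self._sample(self.A_mu, self.A_logvar)
        B = self._sample(self.B_mu, self.B_logvar)
        return self.base(x) + F.linear(x, B @ A)

    def kl(self) -> torch.Tensor:
        # KL(q(.) || N(0, prior_std^2)) summed over both factors.
        def kl_term(mu, logvar):
            var = logvar.exp()
            p_var = self.prior_std ** 2
            return 0.5 * ((var + mu ** 2) / p_var - 1.0
                          - logvar + math.log(p_var)).sum()

        return kl_term(self.A_mu, self.A_logvar) + kl_term(self.B_mu, self.B_logvar)
```

In a personalization setting, such layers would typically wrap the attention projections of a pretrained ASR model, and training would minimize the usual ASR loss plus a small multiple of the summed KL terms, so that only a few low-rank variational parameters are updated per speaker.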
Similar Papers
Data-Efficient ASR Personalization for Non-Normative Speech Using an Uncertainty-Based Phoneme Difficulty Score for Guided Sampling
Audio and Speech Processing
Helps computers understand speech from people with disabilities.
AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition
Audio and Speech Processing
Helps people with speech problems talk to computers.
Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR
Sound
Helps computers understand non-native English speakers better.