Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR
By: Ling Sun, Charlotte Zhu, Shuju Shi
Potential Business Impact:
Helps computers understand non-native English speakers better.
General-purpose ASR underperforms for atypical speakers, such as L2 learners, reinforcing bias and limiting use in education and accessibility. Using the CEFR-graded Speak and Improve corpus, we show that naive fine-tuning of Whisper reduces average WER but simultaneously widens disparities and disproportionately harms lower-level learners. To address this, we propose two strategies: (i) proficiency-aware multitask learning, jointly optimizing ASR with proficiency classification, and (ii) targeted augmentation, applying spectrogram masking to low-proficiency speech to counter data imbalance. These approaches reduce WER by up to 29.4 percent (relative) and insertion/deletion errors by as much as 58.6 percent (relative). Crucially, although the dataset's severe imbalance reflects real-world distributions, both strategies consistently narrow proficiency gaps, advancing equitable ASR for L2 learners.
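The two strategies in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the mask sizes, the weighting factor lam, and the choice of A1/A2 as the "low-proficiency" CEFR levels are all assumptions made here for demonstration, and the masking shown is a generic SpecAugment-style time/frequency zeroing.

```python
import numpy as np

def mask_spectrogram(spec, max_freq_mask=8, max_time_mask=20, rng=None):
    """SpecAugment-style masking: zero one random frequency band
    and one random time span of a (n_mels, n_frames) spectrogram.
    Mask widths here are illustrative, not from the paper."""
    rng = rng or np.random.default_rng(0)
    spec = spec.copy()
    n_mels, n_frames = spec.shape
    f = rng.integers(1, max_freq_mask + 1)       # frequency-mask width
    f0 = rng.integers(0, n_mels - f + 1)
    spec[f0:f0 + f, :] = 0.0
    t = rng.integers(1, max_time_mask + 1)       # time-mask width
    t0 = rng.integers(0, n_frames - t + 1)
    spec[:, t0:t0 + t] = 0.0
    return spec

def augment_batch(batch, cefr_levels, low_levels=("A1", "A2")):
    """Strategy (ii): apply masking only to utterances from
    low-proficiency speakers, leaving other utterances untouched."""
    return [
        mask_spectrogram(spec) if level in low_levels else spec
        for spec, level in zip(batch, cefr_levels)
    ]

def multitask_loss(asr_loss, clf_loss, lam=0.1):
    """Strategy (i): joint objective combining the ASR loss with a
    proficiency-classification loss; lam is an assumed weight."""
    return asr_loss + lam * clf_loss

# Toy batch: two 80-mel spectrograms of 100 frames each.
batch = [np.ones((80, 100)), np.ones((80, 100))]
out = augment_batch(batch, ["A1", "C1"])
# out[0] (A1 speaker) has some entries zeroed; out[1] (C1) is unchanged.
```

The per-utterance gating in augment_batch is the key idea: augmentation capacity is spent where the data is scarcest, rather than uniformly across the corpus.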
Similar Papers
Towards Fair ASR For Second Language Speakers Using Fairness Prompted Finetuning
Computation and Language
Helps voice assistants understand all accents better.
Probing the Hidden Talent of ASR Foundation Models for L2 English Oral Assessment
Computation and Language
Helps computers judge how well people speak English.
Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering
Computation and Language
Makes voice assistants understand tricky words better.