Score: 1

Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR

Published: October 12, 2025 | arXiv ID: 2510.10738v1

By: Ling Sun, Charlotte Zhu, Shuju Shi

Potential Business Impact:

Helps computers understand non-native English speakers better.

Business Areas:
Speech Recognition, Data and Analytics, Software

General-purpose ASR underperforms for atypical speakers, such as L2 learners, reinforcing bias and limiting use in education and accessibility. Using the CEFR-graded Speak and Improve corpus, we show that naive fine-tuning of Whisper reduces average WER but simultaneously widens disparities and disproportionately harms lower-level learners. To address this, we propose two strategies: (i) proficiency-aware multitask learning, jointly optimizing ASR with proficiency classification, and (ii) targeted augmentation, applying spectrogram masking to low-proficiency speech to counter imbalance. These approaches reduce WER by up to 29.4 percent (relative) and insertion/deletion errors by as much as 58.6 percent (relative). Crucially, despite the severe imbalance of the dataset reflecting real-world distributions, both strategies consistently narrow proficiency gaps, advancing equitable ASR for L2 learners.
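As a rough illustration of strategy (ii), the sketch below applies SpecAugment-style frequency and time masking only to utterances from lower-proficiency speakers, keeping every original example and adding masked copies. This is not the authors' implementation: the set of levels treated as low proficiency (`LOW_PROFICIENCY`), the mask widths, the number of copies, and the function names are all assumptions chosen for the example.

```python
import random

# Assumption: which CEFR levels count as "low proficiency" is not stated in
# the abstract; this set is purely illustrative.
LOW_PROFICIENCY = {"A2", "B1"}


def mask_spectrogram(spec, max_freq_width=8, max_time_width=20, rng=None):
    """Return a copy of `spec` with one frequency band and one time span zeroed.

    `spec` is a list of frequency rows, each a list of per-frame values.
    The mask widths are hypothetical defaults, not values from the paper.
    """
    rng = rng or random.Random()
    n_freq = len(spec)
    n_time = len(spec[0])
    out = [row[:] for row in spec]  # copy so the original stays untouched

    # Frequency mask: zero out `f` consecutive frequency bins.
    f = rng.randint(0, min(max_freq_width, n_freq))
    f0 = rng.randint(0, n_freq - f)
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_time

    # Time mask: zero out `t` consecutive frames in every bin.
    t = rng.randint(0, min(max_time_width, n_time))
    t0 = rng.randint(0, n_time - t)
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = 0.0
    return out


def augment_dataset(examples, copies=1, seed=0):
    """Add masked copies for low-proficiency examples only.

    `examples` is a list of (spectrogram, transcript, cefr_level) tuples.
    All originals are kept; higher-proficiency examples are not augmented.
    """
    rng = random.Random(seed)
    out = list(examples)
    for spec, text, level in examples:
        if level in LOW_PROFICIENCY:
            for _ in range(copies):
                out.append((mask_spectrogram(spec, rng=rng), text, level))
    return out
```

Restricting the masked copies to the under-represented levels raises their share of the training data without altering higher-proficiency examples, which is the imbalance-countering idea the abstract describes.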

Country of Origin
🇺🇸 United States

Page Count
5 pages

Category
Computer Science: Sound