Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches

Published: June 1, 2025 | arXiv ID: 2506.00853v2

By: Dena Mujtaba, Nihar Mahapatra

Potential Business Impact:

Helps voice assistants understand people who stutter.

Business Areas:
Speech Recognition Data and Analytics, Software

Stuttering -- characterized by involuntary disfluencies such as blocks, prolongations, and repetitions -- is often misinterpreted by automatic speech recognition (ASR) systems, resulting in elevated word error rates and making voice-driven technologies inaccessible to people who stutter. The variability of disfluencies across speakers and contexts further complicates ASR training, compounded by limited annotated stuttered speech data. In this paper, we investigate fine-tuning ASRs for stuttered speech, comparing generalized models (trained across multiple speakers) to personalized models tailored to individual speech characteristics. Using a diverse range of voice-AI scenarios, including virtual assistants and video interviews, we evaluate how personalization affects transcription accuracy. Our findings show that personalized ASRs significantly reduce word error rates, especially in spontaneous speech, highlighting the potential of tailored models for more inclusive voice technologies.
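The paper's central metric is word error rate (WER), which penalizes ASR output that renders disfluencies verbatim against a fluent reference transcript. As a minimal sketch (not the authors' code), WER can be computed as a word-level Levenshtein edit distance normalized by the reference length:

```python
# Minimal word error rate (WER) computation via word-level Levenshtein
# distance. Illustrative sketch only -- production evaluations typically
# use a dedicated library and text normalization.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# A sound repetition ("i i want") transcribed verbatim counts as an
# insertion against the fluent reference, inflating WER for speakers
# who stutter even when every intended word is recognized.
print(wer("i want to go home", "i i want to go home"))  # 0.2
```

This illustrates why verbatim transcription of repetitions and prolongations elevates WER for stuttered speech, and why fine-tuning on disfluent audio can close the gap.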

Country of Origin
🇺🇸 United States

Page Count
5 pages

Category
Computer Science:
Sound