Exploring Generative Error Correction for Dysarthric Speech Recognition
By: Moreno La Quatra, Alkis Koudounas, Valerio Mario Salerno, and more
Potential Business Impact:
Helps computers understand speech from people with speech problems.
Despite the remarkable progress in end-to-end Automatic Speech Recognition (ASR) engines, accurately transcribing dysarthric speech remains a major challenge. In this work, we propose a two-stage framework for the Speech Accessibility Project Challenge at INTERSPEECH 2025, which combines cutting-edge speech recognition models with LLM-based generative error correction (GER). We assess different configurations of model scales and training strategies, incorporating hypothesis selection to improve transcription accuracy. Experiments on the Speech Accessibility Project dataset demonstrate the strength of our approach on structured and spontaneous speech, while highlighting challenges in single-word recognition. Through comprehensive analysis, we provide insights into the complementary roles of acoustic and linguistic modeling in dysarthric speech recognition.
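To make the two-stage idea concrete, here is a minimal sketch of an ASR-plus-GER pipeline: the first stage keeps an N-best list of candidate transcripts from a speech recognizer, and the second stage asks an instruction-tuned LLM to rewrite them into a single corrected transcript. The specific checkpoints (`openai/whisper-small`, `meta-llama/Llama-3.2-1B-Instruct`), the prompt wording, and the beam settings are illustrative assumptions, not the configuration used in the paper.

```python
import torch
from transformers import (
    WhisperProcessor, WhisperForConditionalGeneration,
    AutoTokenizer, AutoModelForCausalLM,
)

# Stage 1: ASR model produces an N-best list of candidate transcripts.
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
asr = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

def nbest_hypotheses(waveform, sampling_rate=16000, n=5):
    """Return the top-n beam-search transcripts for one utterance."""
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    out = asr.generate(
        inputs.input_features,
        num_beams=n,
        num_return_sequences=n,  # keep all n beams as hypotheses
    )
    return processor.batch_decode(out, skip_special_tokens=True)

# Stage 2: an LLM performs generative error correction (GER) over the hypotheses.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
llm = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

def generative_error_correction(hypotheses):
    """Prompt the LLM with the hypothesis list and decode one corrected transcript."""
    prompt = (
        "Candidate ASR transcripts of the same utterance:\n"
        + "\n".join(f"{i + 1}. {h}" for i, h in enumerate(hypotheses))
        + "\nWrite the single most likely correct transcript:"
    )
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        gen = llm.generate(ids, max_new_tokens=64, do_sample=False)
    return tok.decode(gen[0][ids.shape[-1]:], skip_special_tokens=True).strip()
```

In this sketch the hypothesis selection step is folded into the prompt (the LLM picks and corrects among the candidates); a fine-tuned or differently prompted GER model, as explored in the paper, would slot into `generative_error_correction` without changing the overall pipeline.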
Similar Papers
Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models
Audio and Speech Processing
Helps people with speech problems talk to computers.
Towards Temporally Explainable Dysarthric Speech Clarity Assessment
Audio and Speech Processing
Helps people with speech problems practice speaking better.
Robust Cross-Etiology and Speaker-Independent Dysarthric Speech Recognition
Sound
Helps computers understand speech from people with different speech conditions.