Score: 3

Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER

Published: January 29, 2026 | arXiv ID: 2601.21347v1

By: Xiuwen Zheng, Sixun Dong, Bornali Phukon and more

Potential Business Impact:

Fixes computer speech errors for clearer understanding.

Business Areas:

Speech Recognition, Data and Analytics, Software

While Automatic Speech Recognition (ASR) is typically benchmarked by word error rate (WER), real-world applications ultimately hinge on semantic fidelity. This mismatch is particularly problematic for dysarthric speech, where articulatory imprecision and disfluencies can cause severe semantic distortions. To bridge this gap, we introduce a Large Language Model (LLM)-based agent for post-ASR correction: a Judge-Editor over the top-k ASR hypotheses that keeps high-confidence spans, rewrites uncertain segments, and operates in both zero-shot and fine-tuned modes. In parallel, we release SAP-Hypo5, the largest benchmark for dysarthric speech correction, to enable reproducibility and future exploration. Under multi-perspective evaluation, our agent achieves a 14.51% WER reduction alongside substantial semantic gains, including a +7.59 pp improvement in MENLI and +7.66 pp in Slot Micro F1 on challenging samples. Our analysis further reveals that WER is highly sensitive to domain shift, whereas semantic metrics correlate more closely with downstream task performance.
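To make the abstract's point about WER versus semantic fidelity concrete, here is a minimal sketch of the standard WER computation (word-level edit distance divided by reference length). It is not the authors' evaluation code, and the function name `wer` and the example sentences are hypothetical, chosen only for illustration.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences (not taken from SAP-Hypo5).
reference = "turn off the kitchen lights now"
asr_output = "turn on the kitchen lights now"   # one substitution, meaning reversed
corrected = "turn off the kitchen lights"       # one deletion, meaning preserved

print(wer(reference, asr_output), wer(reference, corrected))
```

Both hypotheses contain exactly one word error against a six-word reference, so they receive the same WER even though only the first one changes what the speaker asked for. This is the kind of gap that semantic metrics such as MENLI and Slot Micro F1 are meant to expose.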

Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models

Audio and Speech Processing

Helps people with speech problems talk to computers.

19 Dec 2025

92%

WER is Unaware: Assessing How ASR Errors Distort Clinical Understanding in Patient Facing Dialogue

Computation and Language

Makes clinical speech-recognition tools safer for patients.

20 Nov 2025

91%



Country of Origin

🇰🇷 Republic of Korea, 🇺🇸 United States

Repos / Data Links

github.com github.com github.com huggingface.co

Page Count

5 pages
