Leveraging LLM for Stuttering Speech: A Unified Architecture Bridging Recognition and Event Detection
By: Shangkun Huang, Jing Deng, Jintao Kang, and more
Potential Business Impact:
Helps computers better understand people who stutter.
The performance bottleneck of Automatic Speech Recognition (ASR) in stuttering speech scenarios has limited its applicability in domains such as speech rehabilitation. This paper proposes an LLM-driven ASR-SED multi-task learning framework that jointly optimizes the ASR and Stuttering Event Detection (SED) tasks. We introduce a dynamic interaction mechanism in which the ASR branch leverages CTC-generated soft prompts to assist LLM context modeling, while the SED branch outputs stutter embeddings to enhance LLM comprehension of stuttered speech. We incorporate contrastive learning to strengthen the discriminative power of stuttering acoustic features and apply Focal Loss to mitigate the long-tailed distribution of stuttering event categories. Evaluations on the AS-70 Mandarin stuttering dataset demonstrate that our framework reduces the ASR character error rate (CER) to 5.45% (a 37.71% relative reduction) and achieves an average SED F1-score of 73.63% (a 46.58% relative improvement).
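The abstract names four training ingredients: an ASR objective guided by CTC soft prompts, an SED branch, a contrastive term on stuttering acoustic features, and Focal Loss for the long-tailed event classes. As a rough illustration of how such objectives are commonly combined, here is a minimal PyTorch sketch; the loss weights, the InfoNCE-style contrastive formulation, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Down-weights easy examples so rare stuttering-event classes
    contribute more to the gradient (standard focal loss)."""
    def __init__(self, gamma: float = 2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        ce = F.cross_entropy(logits, targets, reduction="none")
        pt = torch.exp(-ce)  # model's probability for the true class
        return ((1.0 - pt) ** self.gamma * ce).mean()

def joint_loss(ctc_loss, llm_loss, sed_logits, sed_targets,
               stutter_emb, paired_emb,
               w_sed: float = 1.0, w_con: float = 0.1,
               temperature: float = 0.07) -> torch.Tensor:
    """Hypothetical weighted sum of the ASR (CTC + LLM), SED (focal),
    and contrastive objectives; weights are placeholders."""
    sed_loss = FocalLoss()(sed_logits, sed_targets)

    # InfoNCE-style contrastive term: pull each stutter embedding toward
    # its paired representation, push it away from others in the batch.
    z = F.normalize(stutter_emb, dim=-1)
    p = F.normalize(paired_emb, dim=-1)
    sim = z @ p.t() / temperature
    labels = torch.arange(z.size(0), device=z.device)
    con_loss = F.cross_entropy(sim, labels)

    return ctc_loss + llm_loss + w_sed * sed_loss + w_con * con_loss
```

In the framework described above, stutter_emb would come from the SED branch and the CTC soft prompts would condition the LLM's context modeling; since those wiring details are not given in the abstract, the pairing and weighting here are placeholders only.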
Similar Papers
A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations
Computation and Language
Helps computers understand many languages spoken together.
SpeechLLM: Unified Speech and Language Model for Enhanced Multi-Task Understanding in Low Resource Settings
Computation and Language
Lets computers understand spoken words across many tasks.
Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches
Sound
Helps computers understand speech with unclear pronunciation.