Cross-Learning Fine-Tuning Strategy for Dysarthric Speech Recognition via the CDSD Database
By: Qing Xiao, Yingshan Peng, PeiPei Zhang
Potential Business Impact:
Helps computers understand speech from people with speech impairments.
Dysarthric speech recognition faces challenges from variation in severity across patients and from disparities relative to normal speech. Conventional approaches fine-tune ASR models pre-trained on normal speech separately for each patient, to avoid conflicts between speakers' features. Counter-intuitively, the experiments reveal that multi-speaker fine-tuning (training simultaneously on data from multiple dysarthric speakers) improves recognition of individual speech patterns. This strategy enhances generalization by exposing the model to a broader range of pathological features, mitigates overfitting to speaker-specific traits, reduces the amount of data required per patient, and improves target-speaker accuracy, achieving up to a 13.15% lower WER than single-speaker fine-tuning.
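To make the contrast concrete, the following Python sketch shows the two fine-tuning regimes side by side. It is a sketch under stated assumptions, not the paper's implementation: the base model (facebook/wav2vec2-base-960h), hyperparameters, and speaker IDs are placeholders, `load_cdsd_speaker` is a hypothetical CDSD loader, and audio preprocessing plus the CTC data collator are omitted for brevity.

```python
# Sketch contrasting single-speaker and multi-speaker (cross-learning)
# fine-tuning. Everything here is illustrative: the base model,
# hyperparameters, and speaker IDs are assumptions, and
# `load_cdsd_speaker` is a hypothetical loader for the CDSD corpus.
from datasets import Dataset, concatenate_datasets
from transformers import Trainer, TrainingArguments, Wav2Vec2ForCTC


def load_cdsd_speaker(speaker_id: str) -> Dataset:
    """Hypothetical loader: return {"audio", "text"} rows for one speaker."""
    raise NotImplementedError("replace with an actual CDSD reader")


def fine_tune(train_set: Dataset, output_dir: str) -> Wav2Vec2ForCTC:
    # Start from a model pre-trained on normal (typical) speech.
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
    args = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=8,
        learning_rate=1e-5,
        num_train_epochs=10,
    )
    # Preprocessing and the CTC data collator are elided for brevity.
    Trainer(model=model, args=args, train_dataset=train_set).train()
    return model


# Conventional approach: fine-tune one model per patient.
single_model = fine_tune(load_cdsd_speaker("S01"), "ckpt/single_S01")

# Cross-learning strategy: pool several dysarthric speakers so the model
# sees a broader range of pathological features, then evaluate on the
# target speaker (here S01).
pooled = concatenate_datasets(
    [load_cdsd_speaker(s) for s in ("S01", "S02", "S03", "S04")]
)
multi_model = fine_tune(pooled, "ckpt/multi_speaker")
```

The only difference between the two regimes is the training set passed to `fine_tune`: pooling several speakers' data is what exposes the model to broader pathological variation while still being evaluated on the target speaker.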
Similar Papers
Improved Dysarthric Speech to Text Conversion via TTS Personalization
Sound
Helps people with speech problems talk to computers.
Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition
Sound
Helps computers understand people with speech problems.
Robust Cross-Etiology and Speaker-Independent Dysarthric Speech Recognition
Sound
Helps computers understand dysarthric speech across different medical causes and speakers.