Variational Autoencoder for Personalized Pathological Speech Enhancement
By: Mingchi Hou, Ina Kodrasi
Potential Business Impact:
Helps computers clean up and understand the voices of people with speech impairments.
The generalizability of speech enhancement (SE) models across speaker conditions remains largely unexplored, despite its critical importance for broader applicability. This paper investigates the performance of the hybrid variational autoencoder (VAE)-non-negative matrix factorization (NMF) model for SE, focusing primarily on its generalizability to pathological speakers with Parkinson's disease. We show that VAE models trained on large neurotypical datasets perform poorly on pathological speech. While fine-tuning these pre-trained models with pathological speech improves performance, a performance gap remains between neurotypical and pathological speakers. To address this gap, we propose using personalized SE models derived from fine-tuning pre-trained models with only a few seconds of clean data from each speaker. Our results demonstrate that personalized models considerably enhance performance for all speakers, achieving comparable results for both neurotypical and pathological speakers.
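In the hybrid VAE-NMF framework referenced above, the VAE models clean speech while an NMF factorization models the noise spectrogram. As a minimal sketch of the NMF half only, the snippet below runs Lee-Seung multiplicative updates under the Euclidean cost; the rank, shapes, and random spectrogram are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, K = 16, 32, 3            # frequency bins, time frames, NMF rank (all assumed)

V = rng.random((F, T)) + 1e-3  # stand-in noise power spectrogram
W = rng.random((F, K)) + 1e-3  # spectral basis vectors
H = rng.random((K, T)) + 1e-3  # time-varying activations

def euclidean_error(V, W, H):
    """Frobenius-norm reconstruction error of the factorization W @ H."""
    return np.linalg.norm(V - W @ H)

err_before = euclidean_error(V, W, H)
for _ in range(200):
    # Lee-Seung multiplicative updates; small epsilon guards the division
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
err_after = euclidean_error(V, W, H)
assert err_after < err_before  # the cost is non-increasing under these updates
```

In the full model, these NMF noise parameters would be estimated jointly with the VAE's latent speech representation; personalization then amounts to fine-tuning the pre-trained VAE weights on a few seconds of a target speaker's clean speech.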
Similar Papers
Generating Novel and Realistic Speakers for Voice Conversion
Sound
Creates new voices for talking robots.
Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders
Audio and Speech Processing
Cleans up noisy audio to make voices clear.
Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition
Audio and Speech Processing
Helps computers recognize impaired speech.