No Audiogram: Leveraging Existing Scores for Personalized Speech Intelligibility Prediction
By: Haoshuai Zhou, Changgeng Mo, Boxuan Cao and more
Potential Business Impact:
Helps computers guess how well you hear speech.
Personalized speech intelligibility prediction is challenging. Previous approaches have mainly relied on audiograms, which are inherently limited in accuracy as they only capture a listener's hearing threshold for pure tones. Rather than incorporating additional listener features, we propose a novel approach that leverages an individual's existing intelligibility data to predict their performance on new audio. We introduce the Support Sample-Based Intelligibility Prediction Network (SSIPNet), a deep learning model that leverages speech foundation models to build a high-dimensional representation of a listener's speech recognition ability from multiple support (audio, score) pairs, enabling accurate predictions for unseen audio. Results on the Clarity Prediction Challenge dataset show that, even with a small number of support (audio, score) pairs, our method outperforms audiogram-based predictions. Our work presents a new paradigm for personalized speech intelligibility prediction.
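The idea of predicting a listener's score on new audio from their existing (audio, score) pairs can be illustrated with a toy sketch. This is not SSIPNet: it swaps the learned deep model for a plain similarity-weighted average, and it assumes each audio clip has already been turned into a fixed-length embedding (for example, by a speech foundation model). The function names, the `temperature` parameter, the toy vectors, and the 0-100 score scale are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence, Tuple

Embedding = Sequence[float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_from_support(
    query: Embedding,
    support: Sequence[Tuple[Embedding, float]],
    temperature: float = 0.1,  # hypothetical sharpness parameter
) -> float:
    """Predict a listener's score on `query` from their support pairs.

    Each support item is (embedding, score). The prediction is a softmax-weighted
    average of the support scores, weighted by similarity to the query.
    This is a simple stand-in, not the network described in the abstract.
    """
    if not support:
        raise ValueError("at least one support (audio, score) pair is required")
    logits = [cosine(query, emb) / temperature for emb, _ in support]
    max_logit = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - max_logit) for l in logits]
    total = sum(weights)
    return sum(w * score for w, (_, score) in zip(weights, support)) / total


# Toy support set for one listener: two clips with assumed scores on a 0-100 scale.
support_pairs = [([1.0, 0.0], 90.0), ([0.0, 1.0], 20.0)]
near_first = predict_from_support([1.0, 0.0], support_pairs)
in_between = predict_from_support([1.0, 1.0], support_pairs)
```

A query that resembles the first support clip gets a prediction close to that clip's score, while a query equally similar to both clips lands at their average. A learned model would replace this fixed weighting with a representation trained across many listeners.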
Similar Papers
Leveraging Multiple Speech Enhancers for Non-Intrusive Intelligibility Prediction for Hearing-Impaired Listeners
Sound
Helps hearing aids understand speech better anywhere.
Non-Intrusive Intelligibility Prediction for Hearing Aids: Recent Advances, Trends, and Challenges
Audio and Speech Processing
Helps hearing aids understand speech better.
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Sound
Makes computer voices speak clearly, even tricky words.