Reference-aware SFM layers for intrusive intelligibility prediction
By: Hanlin Yu, Haoshuai Zhou, Boxuan Cao and more
Potential Business Impact:
Helps computers predict how well listeners will understand speech.
Intrusive speech-intelligibility predictors that exploit explicit reference signals are now widespread, yet they have not consistently surpassed non-intrusive systems. We argue that a primary cause is the limited exploitation of speech foundation models (SFMs). This work revisits intrusive prediction by combining reference conditioning with multi-layer SFM representations. Our final system achieves an RMSE of 22.36 on the development set and 24.98 on the evaluation set, ranking 1st on CPC3. These findings provide practical guidance for constructing SFM-based intrusive intelligibility predictors.
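To make the two ideas in the abstract concrete, here is a minimal sketch of one way multi-layer SFM features could be fused and paired with a reference signal, plus the RMSE metric the results are reported in. It is not the authors' architecture. The softmax-weighted layer sum, the concatenation of processed, reference, and absolute-difference features, and all function names (`fuse_layers`, `reference_aware_features`, `rmse`) are assumptions chosen for illustration. Each layer is assumed to be already pooled over time into one fixed-length vector per utterance.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layer_vectors, layer_logits):
    """Weighted sum of per-layer utterance vectors.

    layer_vectors: list of L vectors, each of length D (one per SFM layer).
    layer_logits:  list of L scores (learnable in a real model) that are
                   turned into layer weights by softmax.
    """
    weights = softmax(layer_logits)
    dim = len(layer_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, layer_vectors))
        for d in range(dim)
    ]


def reference_aware_features(processed_layers, reference_layers, layer_logits):
    """Hypothetical reference conditioning: fuse both signals with the same
    layer weights, then concatenate processed, reference, and |difference|."""
    proc = fuse_layers(processed_layers, layer_logits)
    ref = fuse_layers(reference_layers, layer_logits)
    diff = [abs(p - r) for p, r in zip(proc, ref)]
    return proc + ref + diff


def rmse(predictions, targets):
    """Root-mean-square error between predicted and true scores."""
    n = len(predictions)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n)
```

In a full system, the fused feature vector would feed a small regression head that outputs an intelligibility score, and `rmse` would compare those scores against listener results. When the processed and reference signals yield identical layer vectors, the difference part of the feature vector is all zeros, which gives a downstream head a direct cue about how far the processed signal has drifted from the reference.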
Similar Papers
Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People
Artificial Intelligence
Improves hearing aids by predicting speech clarity.
What do Speech Foundation Models Learn? Analysis and Applications
Computation and Language
Helps computers understand spoken words better.
Leveraging Multiple Speech Enhancers for Non-Intrusive Intelligibility Prediction for Hearing-Impaired Listeners
Sound
Helps hearing aids understand speech better anywhere.