Reference-aware SFM layers for intrusive intelligibility prediction

Published: September 21, 2025 | arXiv ID: 2509.17270v1

By: Hanlin Yu, Haoshuai Zhou, Boxuan Cao, and more

BigTech Affiliations: Stanford University

Potential Business Impact:

Predicts how well listeners will understand a processed speech signal, using the clean reference speech as a guide.

Business Areas:
Semantic Search, Internet Services

Intrusive speech-intelligibility predictors that exploit explicit reference signals are now widespread, yet they have not consistently surpassed non-intrusive systems. We argue that a primary cause is the limited exploitation of speech foundation models (SFMs). This work revisits intrusive prediction by combining reference conditioning with multi-layer SFM representations. Our final system achieves an RMSE of 22.36 on the development set and 24.98 on the evaluation set, ranking first on CPC3. These findings provide practical guidance for constructing SFM-based intrusive intelligibility predictors.
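The abstract does not specify how reference conditioning and multi-layer SFM features are combined. As a rough illustration only, the sketch below assumes a common pattern: a softmax-weighted sum over the hidden states of every SFM layer (computed separately for the processed and the reference signal), followed by concatenating the time-pooled features of both signals and their difference. The names `fuse_layers`, `reference_conditioned_features`, and `layer_logits` are hypothetical, the conditioning scheme is a stand-in rather than the authors' method, and the `rmse` helper only shows how an RMSE figure like those reported above is computed.

```python
import math
from typing import List

Vector = List[float]
Frames = List[Vector]  # one feature vector per time frame


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over per-layer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_layers(layers: List[Frames], layer_logits: List[float]) -> Frames:
    """Softmax-weighted sum of hidden states from all SFM layers.

    layers[l][t][d] indexes layer l, frame t, feature dim d; all layers are
    assumed to share the same number of frames and feature size.
    In a trained model, layer_logits would be learned (assumption).
    """
    w = softmax(layer_logits)
    num_frames, dim = len(layers[0]), len(layers[0][0])
    return [
        [sum(w[l] * layers[l][t][d] for l in range(len(layers))) for d in range(dim)]
        for t in range(num_frames)
    ]


def mean_pool(frames: Frames) -> Vector:
    """Average over time to obtain one utterance-level vector."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def reference_conditioned_features(processed: Frames, reference: Frames) -> Vector:
    """Hypothetical conditioning: [pooled processed | pooled reference | difference]."""
    p, r = mean_pool(processed), mean_pool(reference)
    return p + r + [pi - ri for pi, ri in zip(p, r)]


def rmse(predictions: List[float], targets: List[float]) -> float:
    """Root-mean-square error between predicted and measured intelligibility scores."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("predictions and targets must be non-empty and equal length")
    sq = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(sq / len(predictions))


if __name__ == "__main__":
    # Two toy "layers", each with one frame of 2-dim features; equal logits.
    fused = fuse_layers([[[1.0, 2.0]], [[3.0, 4.0]]], [0.0, 0.0])
    print(fused)
    print(reference_conditioned_features([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]))
    print(rmse([50.0, 70.0], [40.0, 70.0]))
```

The learned layer weights let a model decide which SFM layers carry the most intelligibility-relevant information instead of fixing on the last layer; in a real predictor the fused, reference-conditioned features would feed a trained regression head rather than being used directly.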

Country of Origin
🇺🇸 United States

Page Count
5 pages

Category
Electrical Engineering and Systems Science:
Audio and Speech Processing