A Study on Zero-Shot Non-Intrusive Speech Intelligibility for Hearing Aids Using Large Language Models
By: Ryandhimas E. Zezario, Dyah A. M. G. Wisnu, Hsin-Min Wang, and more
Potential Business Impact:
Helps predict how well hearing-aid users will understand speech.
This work focuses on zero-shot non-intrusive speech assessment for hearing aids (HA) using large language models (LLMs). Specifically, we introduce GPT-Whisper-HA, an extension of GPT-Whisper, a zero-shot non-intrusive speech assessment model based on LLMs. GPT-Whisper-HA is designed for HA speech assessment: it incorporates MSBG hearing loss and NAL-R simulations to process the audio input according to each individual's audiogram, two automatic speech recognition (ASR) modules for audio-to-text representation, and GPT-4o to predict two corresponding scores, which are then averaged to produce the final estimate. Experimental results indicate that GPT-Whisper-HA achieves a 2.59% relative root mean square error (RMSE) improvement over GPT-Whisper, confirming the potential of LLMs for zero-shot speech assessment in predicting subjective intelligibility for HA users.
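To make the described pipeline concrete, here is a minimal sketch (not the authors' code), assuming a stub in place of the MSBG hearing-loss and NAL-R processing, two Whisper model sizes standing in for the two unspecified ASR modules, and a hypothetical GPT-4o prompt that returns a 0-100 intelligibility score; the two scores are then averaged as the abstract describes.

```python
# Minimal sketch of the GPT-Whisper-HA pipeline described above.
# Assumptions: apply_ha_processing is a placeholder for the MSBG + NAL-R
# audiogram-based simulation; the two Whisper model sizes and the GPT-4o
# prompt are illustrative stand-ins, not the paper's actual choices.
import re
import whisper                # openai-whisper, for audio-to-text
from openai import OpenAI     # OpenAI SDK, for GPT-4o scoring

client = OpenAI()

def apply_ha_processing(audio_path: str, audiogram: list[float]) -> str:
    """Placeholder for MSBG hearing-loss and NAL-R amplification simulation.

    In the described system this step renders the audio as the individual
    listener would perceive it, based on their audiogram; here it is a stub.
    """
    return audio_path  # assumption: processed audio written elsewhere

def transcribe(audio_path: str, model_name: str) -> str:
    """Audio-to-text representation via one ASR module."""
    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"]

def gpt4o_intelligibility_score(transcript: str) -> float:
    """Ask GPT-4o to rate the transcript's intelligibility (hypothetical prompt)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Rate the intelligibility of the following transcript for a "
                "hearing-aid listener on a scale of 0-100. "
                "Reply with a number only.\n\n" + transcript
            ),
        }],
    )
    match = re.search(r"\d+(\.\d+)?", response.choices[0].message.content)
    return float(match.group()) if match else 0.0

def predict_intelligibility(audio_path: str, audiogram: list[float]) -> float:
    """Two ASR transcripts -> two GPT-4o scores -> averaged final estimate."""
    processed = apply_ha_processing(audio_path, audiogram)
    scores = [
        gpt4o_intelligibility_score(transcribe(processed, name))
        for name in ("base", "small")  # assumption: stand-ins for the two ASR modules
    ]
    return sum(scores) / len(scores)
```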
Similar Papers
Probing the Hidden Talent of ASR Foundation Models for L2 English Oral Assessment
Computation and Language
Helps computers judge how well people speak English.
Speech Intelligibility Assessment with Uncertainty-Aware Whisper Embeddings and sLSTM
Audio and Speech Processing
Makes computers understand noisy speech better.