Listen through the Sound: Generative Speech Restoration Leveraging Acoustic Context Representation
By: Soo-Whan Chung, Min-Seok Choi
Potential Business Impact:
Fixes noisy speech by understanding the sound around it.
This paper introduces a novel approach to speech restoration built on a context-related conditioning strategy. Specifically, we employ the diffusion-based generative restoration model UNIVERSE++ as a backbone to evaluate the effectiveness of contextual representations. We incorporate acoustic context embeddings extracted from the CLAP model, which capture the environmental attributes of the input audio. Additionally, we propose an Acoustic Context (ACX) representation that refines CLAP embeddings to better handle the various distortion factors present in speech signals and their intensities. Unlike content-based approaches that rely on linguistic and speaker attributes, ACX provides contextual information that enables the restoration model to better distinguish and mitigate distortions. Experimental results indicate that context-aware conditioning improves both restoration performance and stability across diverse distortion conditions, reducing variability compared to content-based methods.
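To make the conditioning idea concrete, here is a minimal sketch of how a CLAP-style audio embedding might be refined into a context vector and injected into a diffusion denoiser. This is an illustration, not the authors' published implementation: the names `ACXHead` and `FiLM`, the layer sizes, and the FiLM-style injection are all assumptions, and the CLAP embedding is stubbed with a random tensor.

```python
# Hypothetical sketch of context-conditioned restoration (the paper does not
# publish code). Assumes a precomputed CLAP audio embedding; the ACXHead and
# FiLM wiring below are illustrative choices, not the authors' exact design.
import torch
import torch.nn as nn

class ACXHead(nn.Module):
    """Refines a CLAP embedding into an acoustic-context (ACX) vector."""
    def __init__(self, clap_dim=512, acx_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(clap_dim, acx_dim),
            nn.SiLU(),
            nn.Linear(acx_dim, acx_dim),
        )

    def forward(self, clap_emb):           # clap_emb: (B, clap_dim)
        return self.net(clap_emb)          # -> (B, acx_dim)

class FiLM(nn.Module):
    """Feature-wise modulation of denoiser activations by the ACX vector."""
    def __init__(self, acx_dim, channels):
        super().__init__()
        self.to_scale_shift = nn.Linear(acx_dim, 2 * channels)

    def forward(self, h, acx):             # h: (B, C, T), acx: (B, acx_dim)
        scale, shift = self.to_scale_shift(acx).chunk(2, dim=-1)
        return h * (1 + scale.unsqueeze(-1)) + shift.unsqueeze(-1)

# Toy usage: condition one intermediate denoiser layer on the context vector.
B, C, T = 4, 64, 16000
clap_emb = torch.randn(B, 512)             # stand-in for a real CLAP embedding
acx = ACXHead()(clap_emb)
h = torch.randn(B, C, T)                   # intermediate denoiser features
h = FiLM(acx_dim=256, channels=C)(h, acx)
print(h.shape)                             # torch.Size([4, 64, 16000])
```

The key property this sketch captures is that the conditioning signal describes the acoustic environment (distortion type and severity) rather than linguistic or speaker content, so the same modulation mechanism can steer the denoiser across diverse distortion conditions.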
Similar Papers
A Neural Model for Contextual Biasing Score Learning and Filtering
Audio and Speech Processing
Helps voice assistants understand you better.
Speech-Aware Long Context Pruning and Integration for Contextualized Automatic Speech Recognition
Computation and Language
Listens better to long talks, even with noise.
CLARITY: Contextual Linguistic Adaptation and Accent Retrieval for Dual-Bias Mitigation in Text-to-Speech Generation
Sound
Makes voices sound like anyone, anywhere, naturally.