Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement
By: Seungu Han, Sungho Lee, Juheon Lee and more
Potential Business Impact:
Cleans up noisy sounds with fewer steps.
Deep generative models have recently been employed for speech enhancement to generate perceptually valid clean speech on large-scale datasets. Several diffusion models have been proposed, and more recently, a tractable Schrödinger Bridge has been introduced to transport between the clean and noisy speech distributions. However, these models often suffer from an iterative reverse process and require a large number of sampling steps (more than 50). Our investigation reveals that the performance of baseline models significantly degrades when the number of sampling steps is reduced, particularly under low-SNR conditions. We propose integrating Schrödinger Bridge with GANs to effectively mitigate this issue, achieving high-quality outputs on full-band datasets while substantially reducing the required sampling steps. Experimental results demonstrate that our proposed model outperforms existing baselines, even with a single inference step, in both denoising and dereverberation tasks.
Similar Papers
Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement
Audio and Speech Processing
Cleans up noisy speech for better computer understanding.
Robust time series generation via Schrödinger Bridge: a comprehensive evaluation
Machine Learning (CS)
Creates realistic future data from past patterns.
Regularized Schrödinger Bridge: Alleviating Distortion and Exposure Bias in Solving Inverse Problems
Machine Learning (CS)
Fixes blurry sounds and makes them clear.