Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement

Published: June 2, 2025 | arXiv ID: 2506.01460v1

By: Seungu Han, Sungho Lee, Juheon Lee and more

Potential Business Impact:

Cleans up noisy sounds with fewer steps.

Business Areas:

A/B Testing, Data and Analytics

Deep generative models have recently been employed for speech enhancement to generate perceptually valid clean speech on large-scale datasets. Several diffusion models have been proposed, and more recently, a tractable Schrödinger Bridge has been introduced to transport between the clean and noisy speech distributions. However, these models often suffer from an iterative reverse process and require a large number of sampling steps (more than 50). Our investigation reveals that the performance of baseline models significantly degrades when the number of sampling steps is reduced, particularly under low-SNR conditions. We propose integrating Schrödinger Bridge with GANs to effectively mitigate this issue, achieving high-quality outputs on full-band datasets while substantially reducing the required sampling steps. Experimental results demonstrate that our proposed model outperforms existing baselines, even with a single inference step, in both denoising and dereverberation tasks.