Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement

Published: May 7, 2025 | arXiv ID: 2505.04237v1

By: Rauf Nasretdinov, Roman Korostik, Ante Jukić

BigTech Affiliations: NVIDIA

Potential Business Impact:

Cleans up noisy speech for better computer understanding.

Business Areas:
Speech Recognition, Data and Analytics, Software

In this work, we investigate the application of generative speech enhancement to improve the robustness of ASR models in noisy and reverberant conditions. We employ a recently proposed speech enhancement model based on the Schrödinger bridge, which has been shown to perform well compared to diffusion-based approaches. We analyze the impact of model scaling and different sampling methods on ASR performance. Furthermore, we compare the considered model with predictive and diffusion-based baselines and analyze the speech recognition performance when using different pre-trained ASR models. The proposed approach significantly reduces the word error rate: by approximately 40% relative to the unprocessed speech signals and by approximately 8% relative to a similarly sized predictive approach.
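
The abstract reports its gains as relative word error rate (WER) reductions. As a rough illustration of what such a figure means, the sketch below computes WER with a standard word-level edit distance and then the relative reduction between two WER values. The function names and the example numbers (10% and 6%) are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's own scoring pipeline may differ (for example, in text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn ref[:i] into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, enhanced_wer: float) -> float:
    """Fraction by which enhanced_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - enhanced_wer) / baseline_wer


# Illustrative values only (assumed, not reported in the paper):
# a drop from 10% WER to 6% WER is a 40% relative reduction.
example = relative_wer_reduction(0.10, 0.06)
print(f"relative reduction: {example:.0%}")
```

Note that a relative reduction depends on the baseline: the same 40% relative figure corresponds to a 4-point absolute drop from a 10% baseline but only a 2-point drop from a 5% baseline.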

Country of Origin
🇺🇸 United States

Page Count
5 pages

Category
Electrical Engineering and Systems Science:
Audio and Speech Processing