Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement
By: Rauf Nasretdinov, Roman Korostik, Ante Jukić
Potential Business Impact:
Cleans up noisy speech for better computer understanding.
In this work, we investigate the application of generative speech enhancement to improve the robustness of ASR models in noisy and reverberant conditions. We employ a recently proposed speech enhancement model based on the Schrödinger bridge, which has been shown to perform well compared to diffusion-based approaches. We analyze the impact of model scaling and different sampling methods on ASR performance. Furthermore, we compare the considered model with predictive and diffusion-based baselines and analyze speech recognition performance when using different pre-trained ASR models. The proposed approach significantly reduces the word error rate: by approximately 40% relative to the unprocessed speech signals and by approximately 8% relative to a similarly sized predictive approach.
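The reported improvements are relative word error rate (WER) reductions, which are easy to confuse with absolute percentage-point differences. The following is a minimal sketch of how such figures are typically computed, assuming the standard word-level edit-distance definition of WER; the function names and the numbers in the usage note are illustrative assumptions and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which improved_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer
```

For example, with hypothetical values, a drop from a WER of 0.20 on unprocessed speech to 0.12 after enhancement gives `relative_wer_reduction(0.20, 0.12)` of about 0.40, a 40% relative reduction but only 8 percentage points in absolute terms.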
Similar Papers
Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement
Sound
Cleans up noisy sounds with fewer steps.
Robust time series generation via Schrödinger Bridge: a comprehensive evaluation
Machine Learning (CS)
Creates realistic future data from past patterns.
Regularized Schrödinger Bridge: Alleviating Distortion and Exposure Bias in Solving Inverse Problems
Machine Learning (CS)
Fixes blurry sounds and makes them clear.