Reverse Attention for Lightweight Speech Enhancement on Edge Devices
By: Shuubham Ojha, Felix Gervits, Carol Espy-Wilson
Potential Business Impact:
Cleans up noise in voice recordings for clearer speech.
This paper introduces a lightweight deep learning model for real-time speech enhancement, designed to operate efficiently on resource-constrained devices. The proposed model leverages a compact architecture that enables rapid inference without compromising performance. The key contribution is infusing soft attention gates into the U-Net architecture, which is known to perform well on segmentation tasks and is well suited to GPU execution. Experimental evaluations demonstrate that the model achieves competitive speech quality and intelligibility metrics, such as PESQ and Word Error Rate (WER), improving on similarly sized baseline models. The model achieves a 6.24% WER improvement and a 0.64 PESQ score improvement over unenhanced waveforms.
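To make the architectural idea concrete, below is a minimal PyTorch sketch of a soft attention gate applied to a U-Net skip connection, in the spirit of the standard Attention U-Net gating mechanism. The channel sizes, module names, and exact gating formulation here are illustrative assumptions, not the authors' specific design.

```python
# Hypothetical sketch (not the paper's exact architecture): a soft attention
# gate that reweights encoder (skip) features using coarser decoder features
# before they are concatenated in a U-Net decoder stage.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionGate(nn.Module):
    def __init__(self, skip_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        # 1x1 convolutions project skip and gating features to a shared space
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.w_gate = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        # psi collapses the fused features to a single soft attention map
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, skip: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser decoder (gating) features to the skip resolution
        gate_up = F.interpolate(gate, size=skip.shape[2:],
                                mode="bilinear", align_corners=False)
        # Additive attention: fuse, apply nonlinearity, squash to [0, 1]
        attn = torch.relu(self.w_skip(skip) + self.w_gate(gate_up))
        attn = torch.sigmoid(self.psi(attn))
        # Suppress irrelevant skip-connection features, keep useful ones
        return skip * attn


# Usage example with assumed sizes: a 64-channel encoder feature map gated
# by a 128-channel decoder feature map (e.g. a time-frequency representation).
gate_block = AttentionGate(skip_ch=64, gate_ch=128, inter_ch=32)
skip_feat = torch.randn(1, 64, 128, 128)
dec_feat = torch.randn(1, 128, 64, 64)
out = gate_block(skip_feat, dec_feat)  # shape: (1, 64, 128, 128)
```

The gate adds only a few 1x1 convolutions per skip connection, which is why this style of attention is attractive for lightweight, edge-oriented models.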
Similar Papers
A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments
Sound
Cleans up noisy voices for clearer talking.
Optimizing Neural Architectures for Hindi Speech Separation and Enhancement in Noisy Environments
Sound
Cleans up noisy Hindi speech for better listening.
Online Audio-Visual Autoregressive Speaker Extraction
Audio and Speech Processing
Helps computers hear one voice in noisy rooms.