Dereverberation Using Binary Residual Masking with Time-Domain Consistency
By: Daniel G. Williams
Potential Business Impact:
Cleans up echo in voices for clearer sound.
Vocal dereverberation remains a challenging task in audio processing, particularly for real-time applications where both accuracy and efficiency are crucial. Traditional deep learning approaches often struggle to suppress reverberation without degrading vocal clarity, while recent methods that jointly predict magnitude and phase incur significant computational cost. We propose a real-time dereverberation framework based on residual mask prediction in the short-time Fourier transform (STFT) domain. A U-Net architecture is trained to estimate a residual reverberation mask that suppresses late reflections while preserving direct speech components. A hybrid objective combining binary cross-entropy, residual magnitude reconstruction, and time-domain consistency encourages both accurate suppression and perceptual quality. Together, these components enable low-latency dereverberation suitable for real-world speech and singing applications.
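The abstract names the three terms of the hybrid objective but does not give their exact form. The NumPy sketch below shows one plausible way to combine them and should not be read as the author's implementation. Everything specific in it is an assumption made for illustration: the function names (`stft`, `istft`, `hybrid_loss`), the frame and hop sizes, the unit loss weights, the definition of the residual as the non-negative excess of reverberant over dry magnitude, the reuse of the reverberant phase for resynthesis, and the use of an L1 waveform distance as the time-domain consistency term. The U-Net itself is omitted; `mask_prob` stands in for its output.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Hann-windowed STFT; returns shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Weighted overlap-add inverse of the stft defined above."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (spec.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def hybrid_loss(mask_prob, target_mask, reverb_spec, dry_wave,
                w_bce=1.0, w_mag=1.0, w_time=1.0, n_fft=256, hop=64):
    """Assumed hybrid objective: BCE + residual magnitude L1 + waveform L1.

    mask_prob   : predicted residual-mask probabilities in [0, 1]
    target_mask : binary target, 1 where a bin is treated as reverberation
    reverb_spec : complex STFT of the reverberant input
    dry_wave    : time-domain dry (target) signal
    """
    eps = 1e-7
    p = np.clip(mask_prob, eps, 1.0 - eps)

    # 1) Binary cross-entropy between predicted and target residual masks.
    bce = -np.mean(target_mask * np.log(p)
                   + (1.0 - target_mask) * np.log(1.0 - p))

    # 2) Residual magnitude reconstruction: the masked reverberant magnitude
    #    should match the (assumed) true residual, max(|Y| - |S|, 0).
    reverb_mag = np.abs(reverb_spec)
    dry_mag = np.abs(stft(dry_wave, n_fft, hop))
    residual_true = np.maximum(reverb_mag - dry_mag, 0.0)
    residual_pred = mask_prob * reverb_mag
    mag = np.mean(np.abs(residual_pred - residual_true))

    # 3) Time-domain consistency: remove the predicted residual, resynthesize
    #    with the reverberant phase, and compare against the dry waveform.
    enhanced_wave = istft((1.0 - mask_prob) * reverb_spec, n_fft, hop)
    time_l1 = np.mean(np.abs(enhanced_wave - dry_wave[: len(enhanced_wave)]))

    total = w_bce * bce + w_mag * mag + w_time * time_l1
    return total, (bce, mag, time_l1)

# Usage with synthetic data: a noise "dry" signal and a toy decaying
# impulse response standing in for a room (both hypothetical).
rng = np.random.default_rng(0)
dry = rng.standard_normal(4096)
impulse = np.exp(-np.arange(512) / 100.0) * rng.standard_normal(512)
impulse[0] = 1.0  # direct path
reverb = np.convolve(dry, impulse)[: len(dry)]

reverb_spec = stft(reverb)
dry_mag = np.abs(stft(dry))
# Assumed binary target: a bin counts as reverberant when the residual
# magnitude exceeds the dry magnitude.
target = (np.maximum(np.abs(reverb_spec) - dry_mag, 0.0) > dry_mag).astype(float)

total, parts = hybrid_loss(target, target, reverb_spec, dry)
```

Because the mask only scales magnitudes and the phase is taken from the reverberant input, this formulation avoids a separate phase-prediction branch, which fits the abstract's emphasis on low computational cost. In a training setup the same terms would be written in an autodiff framework so that gradients flow back to the network producing `mask_prob`.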
Similar Papers
U-DREAM: Unsupervised Dereverberation guided by a Reverberation Model
Sound
Cleans up echoey sounds without needing perfect recordings.
Real-Time Speech Enhancement via a Hybrid ViT: A Dual-Input Acoustic-Image Feature Fusion
Sound
Cleans up noisy sounds so you can hear speech better.
SONAR: Spectral-Contrastive Audio Residuals for Generalizable Deepfake Detection
Sound
Finds fake voices by listening to tiny sound details.