Test-Time Training for Speech Enhancement
By: Avishkar Behera, Riya Ann Easow, Venkatesh Parvathala and more
Potential Business Impact:
Cleans up noisy speech on the fly.
This paper introduces a novel application of Test-Time Training (TTT) to speech enhancement, addressing the challenges posed by unpredictable noise conditions and domain shifts. The method combines a main speech enhancement task with a self-supervised auxiliary task in a Y-shaped architecture. At inference time, the model adapts to each new domain by optimizing the self-supervised task, such as noise-augmented signal reconstruction or masked spectrogram prediction, so no labeled data is needed. We further introduce several TTT strategies that trade off adaptation against efficiency. Evaluations on synthetic and real-world datasets show consistent improvements over the baseline model across speech quality metrics. This work highlights the effectiveness of TTT in speech enhancement and provides insights for future research in adaptive and robust speech processing.
Similar Papers
Instance-Specific Test-Time Training for Speech Editing in the Wild
Audio and Speech Processing
Makes voice editing sound natural in any place.
Exploring Audio Cues for Enhanced Test-Time Video Model Adaptation
CV and Pattern Recognition
Makes videos understandable using sound.
Adaptive Test-Time Training for Predicting Need for Invasive Mechanical Ventilation in Multi-Center Cohorts
Machine Learning (CS)
Helps doctors know who needs breathing machines sooner.