SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR
By: Wei-Ping Huang, Guan-Ting Lin, Hung-yi Lee
Potential Business Impact:
Makes voice assistants understand messy speech better.
Despite progress in end-to-end ASR, real-world domain mismatches still cause performance drops, which Test-Time Adaptation (TTA) aims to mitigate by adjusting models during inference. Recent work explores combining TTA with external language models, using techniques like beam search rescoring or generative error correction. In this work, we identify a previously overlooked challenge: TTA can interfere with language model rescoring, revealing the nontrivial nature of effectively combining the two methods. Based on this insight, we propose SUTA-LM, a simple yet effective method that extends SUTA, an entropy-minimization-based TTA approach, with language model rescoring. SUTA-LM first applies a controlled adaptation process guided by an auto-step selection mechanism that leverages both acoustic and linguistic information, then refines the outputs with language model rescoring. Experiments on 18 diverse ASR datasets show that SUTA-LM achieves robust results across a wide range of domains.
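To make the pipeline concrete, here is a minimal PyTorch sketch of the three stages the abstract describes: entropy-minimization adaptation (SUTA), auto-step selection that scores each adaptation step with combined acoustic and linguistic evidence, and a final language model rescoring pass. The helpers `decode_nbest` and `lm_score`, the interpolation weight `alpha`, and all hyperparameter values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a SUTA-LM-style pipeline (illustrative, not the authors' code).
# Assumed interfaces:
#   decode_nbest(model, speech) -> list of (hypothesis_text, acoustic_log_score)
#   lm_score(text)              -> language-model log-probability of text
import copy
import torch
import torch.nn.functional as F


def entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    """Mean per-frame entropy of the output distribution (SUTA's adaptation objective)."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1).mean()


def suta_lm(asr_model, speech, decode_nbest, lm_score,
            max_steps=10, lr=1e-4, alpha=0.3):
    """Adapt on one utterance, auto-select the best step, then LM-rescore."""
    model = copy.deepcopy(asr_model)  # adapt a throwaway copy per utterance
    # For simplicity this adapts all parameters; SUTA itself updates only a
    # subset (e.g., layer-norm and feature-extractor parameters).
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    best_score, best_nbest = float("-inf"), None
    for step in range(max_steps + 1):  # step 0 scores the unadapted model
        # Auto-step selection: score the current top hypothesis with both
        # acoustic and linguistic information, and remember the best step.
        with torch.no_grad():
            nbest = decode_nbest(model, speech)
        top_text, top_acoustic = nbest[0]
        combined = top_acoustic + alpha * lm_score(top_text)
        if combined > best_score:
            best_score, best_nbest = combined, nbest
        if step == max_steps:
            break

        # One entropy-minimization update on this utterance (the SUTA step).
        logits = model(speech)
        loss = entropy_loss(logits)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Final language model rescoring over the selected step's N-best list.
    return max(best_nbest, key=lambda h: h[1] + alpha * lm_score(h[0]))[0]
```

The key design point the paper motivates is the ordering: adaptation is stopped at the step the combined acoustic-plus-linguistic score selects, and only then is the N-best list rescored, so the LM refines rather than fights the adapted model.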
Similar Papers
You only need 4 extra tokens: Synergistic Test-time Adaptation for LLMs
Computation and Language
Helps AI learn new jobs without new lessons.
Ultra-Light Test-Time Adaptation for Vision-Language Models
Computer Vision and Pattern Recognition
Makes AI better at seeing new things.
Realistic Test-Time Adaptation of Vision-Language Models
Computer Vision and Pattern Recognition
Helps AI understand new things without extra training.