SLM-TTA: A Framework for Test-Time Adaptation of Generative Spoken Language Models
By: Yuan-Kuei Wu, Yang Liu, Yiteng Huang, and more
Spoken Language Models (SLMs) are increasingly central to modern speech-driven applications, but their performance degrades under acoustic shift: real-world noise, reverberation, and microphone variation. Prior solutions rely on offline domain adaptation, which is post-hoc, data-intensive, and slow. We introduce the first test-time adaptation (TTA) framework for generative SLMs that process interleaved audio-text prompts. Our method updates a small, targeted subset of parameters during inference using only the incoming utterance, requiring no source data or labels. This stabilizes token distributions and improves robustness to acoustic variability without degrading core task accuracy. Evaluated on automatic speech recognition, speech translation, and 19 audio understanding tasks from AIR-Bench, our approach yields consistent gains under diverse corruptions. Because adaptation touches only a small fraction of weights, it is both compute- and memory-efficient, supporting deployment on resource-constrained platforms. This work enhances the robustness and adaptability of generative SLMs for real-world speech-driven applications.
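The abstract does not specify the adaptation objective or which parameters are updated, but the described setup (unsupervised updates on the incoming utterance alone that stabilize the model's token distribution) is commonly realized as entropy minimization over a small adaptable parameter subset. Below is a minimal, self-contained sketch of that general idea on a toy model; the single `scale` parameter, the toy features, and the finite-difference gradient are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    # Shannon entropy of a token distribution (nats).
    return -sum(p * math.log(p) for p in probs if p > 0)

def model_logits(features, scale):
    # Toy "model": logits are the input features times one adaptable
    # parameter -- a stand-in for a small subset of weights (e.g. a
    # normalization gain) that TTA methods typically update.
    return [scale * f for f in features]

def tta_step(features, scale, lr=0.05, eps=1e-4):
    # One unsupervised adaptation step on a single utterance:
    # descend the entropy of the output token distribution.
    # The gradient is estimated by central finite differences
    # purely for readability; a real system would use autograd.
    def loss(s):
        return entropy(softmax(model_logits(features, s)))
    grad = (loss(scale + eps) - loss(scale - eps)) / (2 * eps)
    return scale - lr * grad

features = [1.2, 0.3, -0.5, 0.8]   # stand-in acoustic features
scale = 1.0
before = entropy(softmax(model_logits(features, scale)))
for _ in range(50):                # adapt on this utterance only
    scale = tta_step(features, scale)
after = entropy(softmax(model_logits(features, scale)))
assert after < before  # the token distribution has sharpened
```

Because only one parameter is touched and no labels or source data are used, the sketch mirrors the compute- and memory-efficiency argument: the adaptation cost scales with the adapted subset, not with the full model.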