SSVD: Structured SVD for Parameter-Efficient Fine-Tuning and Benchmarking under Domain Shift in ASR
By: Pu Wang, Shinji Watanabe, Hugo Van hamme
Potential Business Impact:
Improves voice recognition for children and speakers of different dialects.
Parameter-efficient fine-tuning (PEFT) has emerged as a scalable solution for adapting large foundation models. While low-rank adaptation (LoRA) is widely used in speech applications, its state-of-the-art variants, e.g., VeRA, DoRA, PiSSA, and SVFT, were developed mainly for language and vision tasks and have seen limited validation in speech. This work presents the first comprehensive integration and benchmarking of these PEFT methods within ESPnet. We further introduce structured SVD-guided (SSVD) fine-tuning, which selectively rotates the input-associated right singular vectors of each weight matrix while keeping the output-associated left singular vectors fixed to preserve learned semantic mappings. This design enables robust domain adaptation with minimal trainable parameters and improved efficiency. We evaluate all methods on domain-shifted speech recognition tasks, including child speech and dialectal variation, across model scales from 0.1B to 2B parameters. All implementations are released in ESPnet to support reproducibility and future work.
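The core idea in the abstract — rotating a weight matrix's input-associated right singular vectors while freezing the output-associated left singular vectors and the spectrum — can be sketched in a few lines of numpy. This is only an illustrative sketch based on the abstract's description, not the released ESPnet implementation: the rotation here is parameterized by a small skew-symmetric matrix `A` (hypothetical; the actual SSVD parameterization and any rank restriction may differ) via a Cayley transform, which guarantees an orthogonal rotation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretrained weight of one (toy, 8x8) layer and its SVD: W = U @ diag(S) @ Vt.
W = rng.standard_normal((8, 8))
U, S, Vt = np.linalg.svd(W)

# Trainable skew-symmetric generator A (hypothetical parameterization);
# in a real fine-tuning run, only A would receive gradients.
A = rng.standard_normal((8, 8)) * 0.01
A = A - A.T  # skew-symmetric, so the Cayley transform below is orthogonal

I = np.eye(8)
# Cayley transform: R = (I + A)^{-1} (I - A) is an orthogonal rotation.
R = np.linalg.solve(I + A, I - A)

# Rotate only the input-associated right singular vectors (rows of Vt);
# the output-associated left singular vectors U and spectrum S stay fixed,
# preserving the pretrained output/semantic mapping.
Vt_adapted = R @ Vt
W_adapted = U @ np.diag(S) @ Vt_adapted
```

Because `R` is orthogonal, `W_adapted` keeps exactly the singular values `S` of the pretrained weight; only the input subspace alignment changes, which matches the abstract's goal of adapting to domain shift with few trainable parameters.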
Similar Papers
Spectral-Aware Low-Rank Adaptation for Speaker Verification
Audio and Speech Processing
Improves AI learning by focusing on important data.
SALT: Parameter-Efficient Fine-Tuning via Singular Value Adaptation with Low-Rank Transformation
Image and Video Processing
Helps doctors spot disease in medical scans more accurately.
Singular Value Decomposition on Kronecker Adaptation for Large Language Model
Machine Learning (CS)
Makes large language models cheaper and faster to fine-tune.