Continual Speech Learning with Fused Speech Features
By: Guitao Wang, Jinming Zhao, Hao Yang, and more
Potential Business Impact:
Lets computers learn new speech tasks faster.
Rapid growth in speech data demands adaptive models, as traditional static methods fail to keep pace with dynamic and diverse speech information. We introduce continual speech learning, a new setup aimed at bridging the adaptation gap in current speech models. We use the encoder-decoder Whisper model to standardize speech tasks into a generative format, and we integrate a learnable gated-fusion layer on top of the encoder to dynamically select task-specific features for downstream tasks. Our approach significantly improves accuracy over traditional methods on six speech processing tasks, demonstrating that it can adapt to new speech tasks without full retraining.
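To make the gated-fusion idea concrete, here is a minimal PyTorch sketch of a learnable gate that blends a shared encoder stream with a task-specific stream. The class name `GatedFusionLayer`, the sigmoid gating formulation, and the adapter-style second stream are our assumptions for illustration; the paper's actual layer may differ.

```python
import torch
import torch.nn as nn

class GatedFusionLayer(nn.Module):
    """Hypothetical gated-fusion layer placed on top of a speech encoder.

    Learns per-dimension gates that blend shared encoder features with
    task-specific features, so each downstream task can emphasize the
    representation mix that suits it. The exact formulation here is an
    assumption, not the paper's published code.
    """

    def __init__(self, d_model: int):
        super().__init__()
        # The gate is computed from both feature streams concatenated.
        self.gate = nn.Sequential(
            nn.Linear(2 * d_model, d_model),
            nn.Sigmoid(),
        )

    def forward(self, shared: torch.Tensor, task_specific: torch.Tensor) -> torch.Tensor:
        # shared, task_specific: (batch, time, d_model)
        g = self.gate(torch.cat([shared, task_specific], dim=-1))
        # Convex combination: g near 1 keeps the shared encoder features,
        # g near 0 favors the task-specific stream.
        return g * shared + (1.0 - g) * task_specific


# Usage sketch: fuse frozen encoder hidden states with a lightweight
# task adapter's output before feeding the decoder.
if __name__ == "__main__":
    d_model = 512
    fusion = GatedFusionLayer(d_model)
    shared = torch.randn(2, 100, d_model)         # encoder hidden states
    task_specific = torch.randn(2, 100, d_model)  # adapter features
    fused = fusion(shared, task_specific)
    print(fused.shape)  # torch.Size([2, 100, 512])
```

Because only the gate (and any adapter) is trained while the encoder stays fixed, a design like this can add new tasks without full retraining, which matches the adaptation goal described above.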
Similar Papers
VOX-KRIKRI: Unifying Speech and Language through Continuous Fusion
Computation and Language
Lets computers understand and talk like humans.
FUSE: Universal Speech Enhancement using Multi-Stage Fusion of Sparse Compression and Token Generation Models for the URGENT 2025 Challenge
Sound
Cleans up messy sounds to make voices clear.
MFLA: Monotonic Finite Look-ahead Attention for Streaming Speech Recognition
Computation and Language
Lets computers understand speech as it's spoken.