Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition
By: Yi-Cheng Lin, Yu-Hsuan Li Liang, Hsuan Su and more
Potential Business Impact:
Fixes speech recognition for new accents.
Robust ASR under domain shift is crucial because real-world systems encounter unseen accents and domains with limited labeled data. Although pseudo-labeling offers a practical workaround, it often introduces systematic, accent-specific errors that filtering fails to fix. We ask: How can we correct these recurring biases without target ground truth? We propose a simple parameter-space correction: in a source domain containing both real and pseudo-labeled data, two ASR models are fine-tuned from the same initialization, one on ground-truth labels and the other on pseudo-labels, and their weight difference forms a correction vector that captures pseudo-label biases. When applied to a pseudo-labeled target model, this vector enhances recognition, achieving up to a 35% relative Word Error Rate (WER) reduction on AfriSpeech-200 across ten African accents with the Whisper tiny model.
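The parameter-space correction described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that model weights are represented as plain dictionaries of float lists, that the correction vector is the ground-truth-trained source weights minus the pseudo-label-trained source weights, and that it is added to the target model with a scaling factor. The names `correction_vector`, `apply_correction`, and `scale` are hypothetical, and the abstract does not say whether any scaling is used.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def correction_vector(real_src: Weights, pseudo_src: Weights) -> Weights:
    """Weight difference between two source models fine-tuned from the same init.

    Assumed direction: (trained on ground truth) minus (trained on pseudo-labels),
    so that adding the result moves a model away from pseudo-label bias.
    """
    return {
        name: [r - p for r, p in zip(real_src[name], pseudo_src[name])]
        for name in real_src
    }


def apply_correction(pseudo_tgt: Weights, correction: Weights, scale: float = 1.0) -> Weights:
    """Add the (optionally scaled) correction to a pseudo-labeled target model.

    `scale` is a hypothetical knob; scale=1.0 applies the raw difference.
    """
    return {
        name: [w + scale * c for w, c in zip(pseudo_tgt[name], correction[name])]
        for name in pseudo_tgt
    }


# Made-up numbers purely for illustration; these are not real model weights.
real_src = {"layer.weight": [1.0, 2.0]}    # source model, ground-truth labels
pseudo_src = {"layer.weight": [0.5, 2.5]}  # source model, pseudo-labels
pseudo_tgt = {"layer.weight": [0.0, 3.0]}  # target model, pseudo-labels only

correction = correction_vector(real_src, pseudo_src)
corrected_tgt = apply_correction(pseudo_tgt, correction)
print(corrected_tgt)
```

With a real ASR model, the same element-wise arithmetic would run over each tensor in the checkpoint. All three models must share one architecture and one initialization so that their parameters line up.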
Similar Papers
Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering
Computation and Language
Makes voice assistants understand tricky words better.
Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR
Sound
Helps computers understand non-native English speakers better.
Self-Improvement for Audio Large Language Model using Unlabeled Speech
Sound
Improves voice AI without needing new recordings.