Pitch Accent Detection improves Pretrained Automatic Speech Recognition
By: David Sasu, Natalie Schluter
Potential Business Impact:
Helps computers understand spoken words better.
We show that the performance of Automatic Speech Recognition (ASR) systems that use semi-supervised speech representations can be boosted by a complementary pitch accent detection module, by introducing a joint ASR and pitch accent detection model. The pitch accent detection component of our model achieves a significant improvement over the state of the art for the task, closing the gap in F1-score by 41%. Additionally, joint training decreases the ASR word error rate (WER) by 28.3% on LibriSpeech under limited-resource fine-tuning. With these results, we show the importance of extending pretrained speech models to retain or re-learn important prosodic cues such as pitch accent.
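The abstract reports its gains as relative figures without defining them. The following is a minimal sketch of how such figures are commonly computed. It assumes that "decreases WER by 28.3%" means a relative reduction from the baseline WER and that "closing the gap in F1-score by 41%" means the fraction of the remaining distance to a perfect score (100) that the new model recovers. Both readings are assumptions, and the function names `relative_reduction` and `gap_closed` are hypothetical.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Relative reduction of a lower-is-better metric (e.g. WER), as a fraction of the baseline.

    Assumed definition: (baseline - new) / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


def gap_closed(baseline: float, new: float, ceiling: float = 100.0) -> float:
    """Fraction of the gap between `baseline` and `ceiling` closed by reaching `new`.

    Assumes a higher-is-better score such as F1 reported in percent, so the
    remaining gap is ceiling - baseline.
    """
    gap = ceiling - baseline
    if gap <= 0:
        raise ValueError("baseline must be below the ceiling")
    return (new - baseline) / gap
```

For instance, with hypothetical inputs, a WER drop from 10.0 to 7.17 gives a relative reduction of about 0.283 (28.3%), and an F1 rise from 80.0 to 88.2 closes about 0.41 (41%) of the gap to 100. Under these definitions the same relative figure can correspond to very different absolute changes, so the underlying baseline scores matter when comparing results.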
Similar Papers
Accent-Invariant Automatic Speech Recognition via Saliency-Driven Spectrogram Masking
Computation and Language
Makes voice assistants understand all accents better.
Prominence-aware automatic speech recognition for conversational speech
Computation and Language
Helps computers understand what's important in talking.
Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR
Sound
Helps computers understand non-native English speakers better.