Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models
By: Parismita Gogoi, Sishir Kalita, Wendy Lalhminghlui and more
Potential Business Impact:
Helps computers understand spoken words in rare languages.
This study explores the use of self-supervised learning (SSL) models for tone recognition in three low-resource languages of North-East India: Angami, Ao, and Mizo. We evaluate four wav2vec 2.0 base models pre-trained on tonal and non-tonal languages. For all three languages, we analyze tone-wise performance across the model layers and compare the models. Our results show that tone recognition works best for Mizo and worst for Angami. The middle layers of the SSL models are the most important for tone recognition, whether the pre-training language is tonal or non-tonal. We also find that the tone inventory, tone types, and dialectal variations affect tone recognition. These findings provide useful insights into the strengths and weaknesses of SSL-based embeddings for tonal languages and highlight the potential for improving tone recognition in low-resource settings. The source code is available on GitHub.
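The layer-wise comparison described in the abstract can be illustrated with a small probing sketch. This is not the authors' code: it assumes that one embedding per syllable has already been extracted from each layer of an SSL model, and it uses a nearest-centroid classifier as a stand-in for whatever tone classifier the study actually trained. All function names are hypothetical, and the data is synthetic toy data in which tone information is deliberately planted in a single layer so that the probe has something to find.

```python
import random


def nearest_centroid_accuracy(train, test):
    """Fit one centroid per tone label on `train`, return accuracy on `test`.

    Both arguments are lists of (vector, label) pairs.
    """
    sums, counts = {}, {}
    for vec, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

    def predict(vec):
        # Pick the label whose centroid has the smallest squared distance.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])),
        )

    correct = sum(predict(vec) == label for vec, label in test)
    return correct / len(test)


def probe_layers(layer_features, labels, train_fraction=0.8):
    """Score every layer with the same probe and return (best_layer, scores).

    `layer_features` maps a layer index to a list of per-syllable vectors,
    aligned with `labels`.
    """
    split = int(len(labels) * train_fraction)
    scores = {}
    for layer, vectors in layer_features.items():
        pairs = list(zip(vectors, labels))
        scores[layer] = nearest_centroid_accuracy(pairs[:split], pairs[split:])
    return max(scores, key=scores.get), scores


def make_toy_layers(n_items=300, n_layers=12, dim=4, informative_layer=6, seed=0):
    """Synthetic stand-in for real embeddings (assumption, not real data).

    Only `informative_layer` carries tone information; every other layer is
    pure Gaussian noise.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(3) for _ in range(n_items)]  # three toy tone classes
    layers = {}
    for layer in range(1, n_layers + 1):
        shift = 10.0 if layer == informative_layer else 0.0
        layers[layer] = [
            [rng.gauss(0, 1) + (shift * lab if d == 0 else 0.0) for d in range(dim)]
            for lab in labels
        ]
    return layers, labels


layers, labels = make_toy_layers()
best, scores = probe_layers(layers, labels)
print(best, round(scores[best], 2))
```

With real data, `layer_features` would hold pooled hidden states from each transformer layer, and the per-layer scores could additionally be broken down by tone to mirror the tone-wise analysis.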
Similar Papers
How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-resource Transfer
Audio and Speech Processing
Helps computers understand talking with different tones.
Layer-wise Analysis for Quality of Multilingual Synthesized Speech
Audio and Speech Processing
Makes computer voices sound more human-like.
Analyzing the relationships between pretraining language, phonetic, tonal, and speaker information in self-supervised speech models
Computation and Language
Computer models learn languages better, even new ones.