Munsit at NADI 2025 Shared Task 2: Pushing the Boundaries of Multidialectal Arabic ASR with Weakly Supervised Pretraining and Continual Supervised Fine-tuning
By: Mahmoud Salhab, Shameed Sait, Mohammad Abusheikh, and others
Potential Business Impact:
Helps computers understand many Arabic accents.
Automatic speech recognition (ASR) plays a vital role in enabling natural human-machine interaction across applications such as virtual assistants, industrial automation, customer support, and real-time transcription. However, developing accurate ASR systems for low-resource languages like Arabic remains a significant challenge due to limited labeled data and the linguistic complexity introduced by diverse dialects. In this work, we present a scalable training pipeline that combines weakly supervised learning with supervised fine-tuning to develop a robust Arabic ASR model. In the first stage, we pretrain the model on 15,000 hours of weakly labeled speech covering both Modern Standard Arabic (MSA) and various Dialectal Arabic (DA) variants. In the subsequent stage, we perform continual supervised fine-tuning using a mixture of filtered weakly labeled data and a small, high-quality annotated dataset. Our approach achieves state-of-the-art results, ranking first in the multi-dialectal Arabic ASR challenge. These findings highlight the effectiveness of weak supervision paired with fine-tuning in overcoming data scarcity and delivering high-quality ASR for low-resource, dialect-rich languages.
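The second stage above mixes filtered weakly labeled data with a small, high-quality annotated set. A minimal sketch of that data-preparation step is shown below; the confidence-threshold filter and the `weak_ratio` mixing parameter are illustrative assumptions, since the abstract does not specify the filtering criterion or mixture proportions the authors used.

```python
import random

def filter_weak_samples(samples, min_confidence=0.9):
    """Keep weakly labeled samples whose pseudo-label confidence clears a
    threshold. (A confidence cutoff is an assumed filtering criterion;
    the paper only states that weak labels are filtered.)"""
    return [s for s in samples if s["confidence"] >= min_confidence]

def build_finetune_mixture(weak_samples, gold_samples, weak_ratio=0.5, seed=0):
    """Combine filtered weak data with the small gold set for stage-2
    fine-tuning. `weak_ratio` is the share of weak samples in the final
    mixture (a hypothetical knob; the actual ratio is not stated)."""
    rng = random.Random(seed)
    # Number of weak samples needed so they make up `weak_ratio` of the mix.
    n_weak = int(len(gold_samples) * weak_ratio / (1 - weak_ratio))
    n_weak = min(n_weak, len(weak_samples))
    mixture = gold_samples + rng.sample(weak_samples, n_weak)
    rng.shuffle(mixture)
    return mixture

# Toy usage: 4 weak utterances, 2 gold utterances.
weak = [{"text": t, "confidence": c}
        for t, c in [("a", 0.95), ("b", 0.50), ("c", 0.99), ("d", 0.80)]]
gold = [{"text": t, "confidence": 1.0} for t in ("e", "f")]

filtered = filter_weak_samples(weak)          # keeps "a" and "c"
mixture = build_finetune_mixture(filtered, gold, weak_ratio=0.5)
```

With a 0.5 ratio the mixture holds equal counts of gold and filtered weak samples; in practice the threshold and ratio would be tuned on a held-out dialect-balanced development set.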
Similar Papers
Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning
Artificial Intelligence
Lets computers understand Arabic speech without human help.
Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning
Computation and Language
Helps computers understand all Arabic accents.
Efficient ASR for Low-Resource Languages: Leveraging Cross-Lingual Unlabeled Data
Computation and Language
Lets computers understand rare languages better.