Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning
By: Ömer Tarik Özyilmaz, Matt Coler, Matias Valdenegro-Toro
Potential Business Impact:
Helps computers understand many Arabic accents, not just the standard form.
Although commercial Arabic automatic speech recognition (ASR) systems support Modern Standard Arabic (MSA), they struggle with dialectal speech. We investigate the effect of fine-tuning OpenAI's Whisper on five major Arabic dialects (Gulf, Levantine, Iraqi, Egyptian, Maghrebi), using Mozilla Common Voice for MSA and the MASC dataset for dialectal speech. We evaluate the effect of MSA training-data size, the benefit of pre-training on MSA before dialectal fine-tuning, and dialect-specific versus dialect-pooled models. We find that small amounts of MSA fine-tuning data yield substantial improvements for smaller models, matching the performance of larger non-fine-tuned models. MSA pre-training shows minimal benefit, suggesting limited shared features between MSA and dialectal speech, while dialect-pooled models perform comparably to dialect-specific ones. This indicates that pooling dialectal data, when properly balanced, can help address data scarcity in low-resource ASR without significant performance loss.
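The core recipe described in the abstract is supervised fine-tuning of Whisper checkpoints on MSA and dialectal Arabic speech. The sketch below shows what that looks like with the Hugging Face Transformers and Datasets libraries; the checkpoint, dataset identifier, split size, column names, and hyperparameters are illustrative assumptions, not the authors' exact configuration, and a MASC-style dialectal corpus would be loaded and pooled alongside Common Voice in the same way.

```python
# Minimal Whisper fine-tuning sketch for Arabic speech, assuming the
# Hugging Face Transformers / Datasets stack. Hyperparameters and the
# dataset slice are illustrative, not the paper's exact setup.
import torch
from datasets import load_dataset, Audio
from transformers import (
    WhisperProcessor,
    WhisperForConditionalGeneration,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

MODEL_NAME = "openai/whisper-small"  # smaller checkpoints benefit most from fine-tuning

processor = WhisperProcessor.from_pretrained(MODEL_NAME, language="arabic", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME)
model.generation_config.language = "arabic"
model.generation_config.task = "transcribe"

# MSA speech from Common Voice (may require dataset access approval).
# Dialectal MASC data would be loaded the same way and concatenated here.
dataset = load_dataset("mozilla-foundation/common_voice_11_0", "ar", split="train[:1%]")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(batch):
    # Convert raw audio to log-mel input features and transcripts to token ids.
    audio = batch["audio"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

dataset = dataset.map(prepare, remove_columns=dataset.column_names)

def collate(features):
    # Pad audio features and label ids separately; mask label padding with -100.
    input_features = [{"input_features": f["input_features"]} for f in features]
    batch = processor.feature_extractor.pad(input_features, return_tensors="pt")
    label_features = [{"input_ids": f["labels"]} for f in features]
    labels_batch = processor.tokenizer.pad(label_features, return_tensors="pt")
    batch["labels"] = labels_batch["input_ids"].masked_fill(
        labels_batch["attention_mask"].ne(1), -100
    )
    return batch

args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-arabic",
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    num_train_epochs=3,
    fp16=torch.cuda.is_available(),
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=collate,
)
trainer.train()
```

For a dialect-pooled model of the kind the abstract describes, the same pipeline would simply be fed a balanced concatenation of the per-dialect training sets; for dialect-specific models, it would be run once per dialect.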
Similar Papers
Whispering in Amharic: Fine-tuning Whisper for Low-resource Language
Computation and Language
Helps computers understand Amharic speech better.
Munsit at NADI 2025 Shared Task 2: Pushing the Boundaries of Multidialectal Arabic ASR with Weakly Supervised Pretraining and Continual Supervised Fine-tuning
Computation and Language
Helps computers understand many Arabic accents.
Context-Aware Whisper for Arabic ASR Under Linguistic Varieties
Computation and Language
Helps computers understand different Arabic accents better.