Arabic ASR on the SADA Large-Scale Arabic Speech Corpus with Transformer-Based Models
By: Branislav Gerazov, Marcello Politi, Sébastien Bratières
Potential Business Impact:
Helps computers understand different Arabic accents better.
We evaluate several state-of-the-art automatic speech recognition (ASR) models on SADA (Saudi Audio Dataset for Arabic), a large-scale Arabic speech dataset containing 668 hours of high-quality audio from Saudi television shows. The dataset covers multiple dialects and recording environments, including a noisy subset that makes it particularly challenging for ASR. We measure performance on the SADA test set and examine how fine-tuning, language models, noise, and denoising affect it. The best-performing model is the MMS 1B model fine-tuned on SADA and combined with a 4-gram language model, which achieves a WER of 40.9% and a CER of 17.6% on the SADA clean test set.
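The WER and CER figures above are edit-distance error rates: the number of word-level (or character-level) substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the reference length. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function names are illustrative, and the choices to split words on whitespace and to count spaces as characters in CER are assumptions; the text normalization applied to Arabic transcripts before scoring is not described here and is not modeled.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp is i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: char-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # Assumption: spaces are counted as characters.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat"
    hyp = "the cat sit"
    print(f"WER = {wer(ref, hyp):.3f}")  # one wrong word out of three
    print(f"CER = {cer(ref, hyp):.3f}")  # one wrong character out of eleven
```

Because a single wrong character makes the whole word count as an error, WER is usually well above CER on the same output, which is consistent with the gap between the 40.9% WER and 17.6% CER reported above.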