Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
By: Md Sazzadul Islam Ridoy, Sumi Akter, Md. Aminur Rahman
Potential Business Impact:
Helps computers understand Bangla speech better.
In recent years, neural models trained on large multilingual text and speech datasets have shown great potential for supporting low-resource languages. This study investigates the performance of two state-of-the-art Automatic Speech Recognition (ASR) models, OpenAI's Whisper (Small & Large-V2) and Facebook's Wav2Vec-BERT, on Bangla, a low-resource language. We conducted experiments on two publicly available datasets, Mozilla Common Voice-17 and OpenSLR, to evaluate model performance. Through systematic fine-tuning and hyperparameter optimization, including learning rate, epochs, and model checkpoint selection, we compared the models on Word Error Rate (WER), Character Error Rate (CER), training time, and computational efficiency. The Wav2Vec-BERT model outperformed Whisper across all key evaluation metrics while requiring fewer computational resources, offering valuable insights for developing robust speech recognition systems in low-resource linguistic settings.
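For readers unfamiliar with the evaluation metrics, WER and CER are both normalized Levenshtein edit distances: the minimum number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length, computed over words (WER) or characters (CER). A minimal illustrative sketch (not the authors' evaluation code, which is not published in this abstract):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    dp = list(range(len(hyp) + 1))  # row for the empty reference prefix
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i      # prev holds dp[i-1][j-1]
        for j, h in enumerate(hyp, 1):
            cur = dp[j]             # dp[i-1][j], saved before overwrite
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution (free on match)
            prev = cur
    return dp[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character Error Rate: char-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("the cat sat", "the bat sat")` is 1/3 (one substitution out of three reference words). In practice, libraries such as `jiwer` or Hugging Face `evaluate` are commonly used for these metrics.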
Similar Papers
Benchmarking Automatic Speech Recognition Models for African Languages
Computation and Language
Helps computers understand many African languages.
Assessing the Feasibility of Lightweight Whisper Models for Low-Resource Urdu Transcription
Computation and Language
Helps computers understand Urdu speech better.
Benchmarking Akan ASR Models Across Domain-Specific Datasets: A Comparative Evaluation of Performance, Scalability, and Adaptability
Computation and Language
Helps computers understand different ways people speak.