
Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning

Published: April 23, 2025 | arXiv ID: 2504.18582v1

By: Abdulhady Abas Abdullah, Sarkhel H. Taher Karim, Sara Azad Ahmed, and more

Potential Business Impact:

Helps computers tell who is talking in any language.

Business Areas:

Speech Recognition Data and Analytics, Software

Speaker diarization is a fundamental task in speech processing that involves dividing an audio stream by speaker. Although state-of-the-art models have advanced performance in high-resource languages, low-resource languages such as Kurdish pose unique challenges due to limited annotated data, multiple dialects and frequent code-switching. In this study, we address these issues by training the Wav2Vec 2.0 self-supervised learning model on a dedicated Kurdish corpus. By leveraging transfer learning, we adapted multilingual representations learned from other languages to capture the phonetic and acoustic characteristics of Kurdish speech. Relative to a baseline method, our approach reduced the diarization error rate by 7.2% and improved cluster purity by 13%. These findings demonstrate that enhancements to existing models can significantly improve diarization performance for under-resourced languages. Our work has practical implications for developing transcription services for Kurdish-language media and for speaker segmentation in multilingual call centers, teleconferencing and video-conferencing systems. The results establish a foundation for building effective diarization systems in other understudied languages, contributing to greater equity in speech technology.
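
The abstract reports cluster purity without saying how it is computed. As a rough illustration only, the sketch below uses one common frame-level definition: each predicted cluster is credited with its most frequent true speaker, and purity is the fraction of frames covered by those majority speakers. The function name `cluster_purity` and the toy label sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def cluster_purity(reference, hypothesis):
    """Frame-level cluster purity (assumed definition, not the paper's).

    reference:  true speaker label for each frame
    hypothesis: predicted cluster id for each frame
    """
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("label sequences must be non-empty")

    # For every predicted cluster, count how often each true speaker occurs.
    clusters = defaultdict(Counter)
    for ref, hyp in zip(reference, hypothesis):
        clusters[hyp][ref] += 1

    # Sum the size of the majority speaker in each cluster.
    majority_frames = sum(
        counts.most_common(1)[0][1] for counts in clusters.values()
    )
    return majority_frames / len(reference)


# Toy example: 8 frames, two true speakers, two predicted clusters.
reference = ["A", "A", "A", "B", "B", "B", "B", "A"]
hypothesis = [0, 0, 1, 1, 1, 1, 0, 0]
purity = cluster_purity(reference, hypothesis)
print(f"cluster purity: {purity:.2f}")
```

In the toy example each cluster holds three frames of its majority speaker and one frame of the other, so six of the eight frames count as pure. A perfect clustering gives a purity of 1.0.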

Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla

Computation and Language

Helps computers understand Bangla speech better.

2 Jul 2025 1

88%

Towards stable AI systems for Evaluating Arabic Pronunciations

Computation and Language

Teaches computers to understand Arabic letter sounds.

27 Aug 2025 1

88%

From Dialect Gaps to Identity Maps: Tackling Variability in Speaker Verification

Audio and Speech Processing

Helps computers tell Kurdish speakers apart.

21 Apr 2025 0


Page Count

19 pages
