Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering
By: Pradeep Rangappa, Andres Carofilis, Jeena Prakash, and more
Potential Business Impact:
Lets companies adapt voice recognition to their own domain using a tiny fraction of the training data.
Fine-tuning pretrained automatic speech recognition (ASR) models for specific domains is challenging for small organizations with limited labeled data and computational resources. Here, we explore different data selection pipelines and propose a robust approach that improves ASR adaptation by filtering pseudo-labels generated using Whisper (encoder-decoder) and Zipformer (transducer) models. Our approach integrates multiple selection strategies -- including word error rate (WER) prediction, named entity recognition (NER), and character error rate (CER) analysis -- to extract high-quality training segments. We evaluate our method on Whisper and Zipformer against a 7500-hour baseline, comparing it to a CER-based approach that relies on hypotheses from three ASR systems. Fine-tuning on the full 7500 hours of pseudo-labeled call-center data achieves 12.3% WER, while our filtering reduces the training set to just 100 hours (1.4%) with similar performance; the same trend holds on Fisher English.
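To make the multi-stage filtering idea concrete, here is a minimal sketch of how such a pipeline could combine the three signals the abstract names: a predicted WER score, an NER consistency check, and cross-system CER agreement. All names, thresholds, and the two-system setup are illustrative assumptions, not the paper's actual values; the paper's CER-based comparison uses hypotheses from three ASR systems, and the WER predictor and NER check are stubbed out here as precomputed fields.

```python
# A hedged sketch of multi-stage pseudo-label filtering. Each segment is
# assumed to carry hypotheses from two ASR systems (e.g. Whisper and
# Zipformer), a score from a WER-prediction model, and the result of an
# NER consistency check. Thresholds are placeholders, not the paper's.
from dataclasses import dataclass


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via standard dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def cer(hyp: str, ref: str) -> float:
    """Character error rate of hyp against ref."""
    return edit_distance(hyp, ref) / max(len(ref), 1)


@dataclass
class Segment:
    audio_id: str
    whisper_hyp: str      # hypothesis from the encoder-decoder model
    zipformer_hyp: str    # hypothesis from the transducer model
    predicted_wer: float  # output of an external WER-prediction model (assumed)
    entities_agree: bool  # NER check: named entities match across hypotheses


CER_MAX = 0.10       # illustrative threshold for cross-system agreement
PRED_WER_MAX = 0.15  # illustrative threshold for predicted WER


def keep(seg: Segment) -> bool:
    """Keep a segment only if every filtering stage passes."""
    if seg.predicted_wer > PRED_WER_MAX:  # stage 1: WER prediction
        return False
    if not seg.entities_agree:            # stage 2: NER consistency
        return False
    # stage 3: CER agreement between the two systems' hypotheses
    return cer(seg.whisper_hyp, seg.zipformer_hyp) <= CER_MAX


segments = [
    Segment("utt1", "refund the order", "refund the order", 0.05, True),
    Segment("utt2", "cancel my account", "counsel my count", 0.30, False),
]
selected = [s for s in segments if keep(s)]
print([s.audio_id for s in selected])  # -> ['utt1']
```

The design intuition matches the abstract: each stage is a cheap reject test, so only segments where independent models agree and a learned estimator predicts low error survive into the reduced (e.g. 100-hour) training set.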
Similar Papers
Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering
Computation and Language
Teaches computers to hear better with less data.
Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR
Sound
Helps computers understand non-native English speakers better.
Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition
Audio and Speech Processing
Fixes speech recognition for new accents.