Score: 2

Frustratingly Easy Data Augmentation for Low-Resource ASR

Published: September 18, 2025 | arXiv ID: 2509.15373v2

By: Katsumi Ibaraki, David Chiang

Potential Business Impact:

Helps speech recognition systems understand rare, low-resource languages better.

Business Areas:
Speech Recognition Data and Analytics, Software

This paper introduces three self-contained data augmentation methods for low-resource Automatic Speech Recognition (ASR). Our techniques first generate novel text (using gloss-based replacement, random replacement, or an LLM-based approach) and then apply Text-to-Speech (TTS) to produce synthetic audio. We apply these methods, which leverage only the original annotated data, to four languages with extremely limited resources (Vatlongos, Nashta, Shinekhen Buryat, and Kakabe). Fine-tuning a pretrained Wav2Vec2-XLSR-53 model on a combination of the original audio and generated synthetic data yields significant performance gains, including a 14.3% absolute WER reduction for Nashta. The methods prove effective across all four low-resource languages and also show utility for high-resource languages like English, demonstrating their broad applicability.
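The pipeline described in the abstract (generate new text from the original transcripts, synthesize audio for it, then fine-tune on the pooled data) is simple enough to sketch. Below is a minimal, hypothetical Python illustration of the "random replacement" text-generation step only; the function names, parameters, and sampling scheme are assumptions for illustration rather than the authors' actual code, and the TTS and fine-tuning stages are indicated only in comments.

```python
import random

def random_replacement(transcripts, num_new=100, replace_frac=0.3, seed=0):
    """Sketch of a 'random replacement' augmenter: build new sentences by
    randomly swapping words in existing transcripts with other words drawn
    from the same annotated corpus (details here are assumed, not taken
    from the paper)."""
    rng = random.Random(seed)
    vocab = [w for line in transcripts for w in line.split()]
    new_sentences = []
    for _ in range(num_new):
        base = rng.choice(transcripts).split()
        out = [
            rng.choice(vocab) if rng.random() < replace_frac else w
            for w in base
        ]
        new_sentences.append(" ".join(out))
    return new_sentences

# The generated text would then be fed to a TTS system (left abstract here)
# to produce synthetic audio, and the synthetic audio/text pairs would be
# pooled with the original data before fine-tuning Wav2Vec2-XLSR-53.
if __name__ == "__main__":
    corpus = ["the dog runs fast", "a cat sleeps on the mat"]
    for sentence in random_replacement(corpus, num_new=3):
        print(sentence)
```

The gloss-based and LLM-based variants mentioned in the abstract would replace this sampling step with glossary-guided substitution or prompted text generation, respectively, while keeping the same TTS and fine-tuning stages.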

Country of Origin
🇺🇸 United States

Repos / Data Links

Page Count
5 pages

Category
Computer Science:
Computation and Language