Building Robust and Scalable Multilingual ASR for Indian Languages
By: Arjun Gangwar, Kaousheik Jayakumar, S. Umesh
Potential Business Impact:
Helps computers understand different languages and accents.
This paper describes the systems developed by SPRING Lab, Indian Institute of Technology Madras, for the ASRU MADASR 2.0 challenge. The systems focus on adapting ASR models to better predict the language and dialect of an utterance, covering 33 dialects across 8 languages. We participated in Track 1 and Track 2, which restrict the use of additional data and require multilingual systems to be developed from scratch. We present a novel training approach using a Multi-Decoder architecture with a phonemic Common Label Set (CLS) as an intermediate representation. It improves performance over the baseline in the CLS space. We also discuss various methods used to retain the gain obtained in the phonemic space when converting the output back to the corresponding grapheme representations. Our systems beat the baseline in 3 languages (Track 2) in terms of WER/CER and achieved the highest language ID and dialect ID accuracy among all participating teams (Track 2).
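For readers unfamiliar with the WER/CER metrics mentioned above, the following is a minimal sketch of how they are commonly computed: edit distance between reference and hypothesis, divided by the reference length. It is not the challenge's official scoring script. The function names are hypothetical, and two choices are assumptions: characters are counted as Unicode code points (not grapheme clusters, which may matter for Indic scripts), and spaces are dropped before computing CER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate over code points, with spaces removed (an assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    print(wer("the cat sat", "the cat sit"))
    print(cer("the cat sat", "the cat sit"))
```

Because both rates are normalized by the reference length, a hypothesis with many insertions can score above 1.0.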
Similar Papers
The Eloquence team submission for task 1 of MLC-SLM challenge
Sound
Helps computers understand many spoken languages.
Transsion Multilingual Speech Recognition System for MLC-SLM 2025 Challenge
Audio and Speech Processing
Lets computers understand many spoken languages.
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Computation and Language
Lets computers understand over 1,600 languages.