Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
By: Omnilingual ASR team, Gil Keren, Artyom Kozhevnikov, and more
Potential Business Impact:
Lets computers understand over 1,600 languages.
Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most, all while entangled with ethical concerns when pursued without community collaboration. To transcend these limitations, we introduce Omnilingual ASR, the first large-scale ASR system designed for extensibility. Omnilingual ASR enables communities to introduce unserved languages with only a handful of data samples. It scales self-supervised pre-training to 7B parameters to learn robust speech representations and introduces an encoder-decoder architecture designed for zero-shot generalization, leveraging an LLM-inspired decoder. This capability is grounded in a massive and diverse training corpus; by combining breadth of coverage with linguistic variety, the model learns representations robust enough to adapt to unseen languages. Incorporating public resources with community-sourced recordings gathered through compensated local partnerships, Omnilingual ASR expands coverage to over 1,600 languages, the largest such effort to date, including over 500 never before served by ASR. Automatic evaluations show substantial gains over prior systems, especially in low-resource conditions, and strong generalization. We release Omnilingual ASR as a family of models, from 300M variants for low-power devices to 7B for maximum accuracy. We reflect on the ethical considerations shaping this design and conclude by discussing its societal impact. In particular, we highlight how open-sourcing models and tools can lower barriers for researchers and communities, inviting new forms of participation. Open-source artifacts are available at https://github.com/facebookresearch/omnilingual-asr.
Similar Papers
Efficient ASR for Low-Resource Languages: Leveraging Cross-Lingual Unlabeled Data
Computation and Language
Lets computers understand rare languages better.
Building Robust and Scalable Multilingual ASR for Indian Languages
Computation and Language
Helps computers understand different languages and accents.
How I Built ASR for Endangered Languages with a Spoken Dictionary
Computation and Language
Helps save dying languages with less speech data.