Score: 3

Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition

Published: January 27, 2026 | arXiv ID: 2601.19451v1

By: Isha Pandey, Ashish Mittal, Vartul Bahuguna, and more

BigTech Affiliations: IBM

Potential Business Impact:

Helps computers understand speech in many languages.

Business Areas:
Semantic Search, Internet Services

Recent advances in LLM-based ASR connect frozen speech encoders with Large Language Models (LLMs) via lightweight projectors. While effective in monolingual settings, a single projector struggles to capture the diverse acoustic-to-semantic mappings required for multilingual ASR. To address this, we propose SMEAR-MoE, a stabilized Mixture-of-Experts projector that ensures dense gradient flow to all experts, preventing expert collapse while enabling cross-lingual sharing. We systematically compare monolithic, static multi-projector, and dynamic MoE designs across four Indic languages (Hindi, Marathi, Tamil, Telugu). Our SMEAR-MoE achieves strong performance, delivering up to a 7.6% relative WER reduction over the single-projector baseline, while maintaining comparable runtime efficiency. Analysis of expert routing further shows linguistically meaningful specialization, with related languages sharing experts. These results demonstrate that stable multi-expert projectors are key to scalable and robust multilingual ASR.
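The sketch below illustrates the core idea the abstract describes, a SMEAR-style MoE projector: rather than hard-routing each utterance to one expert, the router's soft weights merge all expert parameters into a single linear map, so every expert receives dense gradients on every step and no expert collapses. This is a minimal illustration under assumed details, not the paper's implementation; the class name `SMEARProjector`, the mean-pooled routing input, and the dimensions are all hypothetical.

```python
# Minimal sketch (assumed, not the authors' code) of a SMEAR-style
# MoE projector between a frozen speech encoder and an LLM.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SMEARProjector(nn.Module):
    def __init__(self, enc_dim: int, llm_dim: int, num_experts: int = 4):
        super().__init__()
        # One weight/bias tensor per expert projector.
        self.weights = nn.Parameter(torch.randn(num_experts, llm_dim, enc_dim) * 0.02)
        self.biases = nn.Parameter(torch.zeros(num_experts, llm_dim))
        # Lightweight router over pooled encoder features (assumed design).
        self.router = nn.Linear(enc_dim, num_experts)

    def forward(self, speech_feats: torch.Tensor) -> torch.Tensor:
        # speech_feats: (batch, time, enc_dim) from the frozen speech encoder.
        pooled = speech_feats.mean(dim=1)               # (batch, enc_dim)
        gates = F.softmax(self.router(pooled), dim=-1)  # (batch, num_experts)
        # Soft-merge expert parameters per utterance: all experts get dense
        # gradients, avoiding the collapse seen with hard top-k routing.
        W = torch.einsum("be,eoi->boi", gates, self.weights)  # (batch, llm_dim, enc_dim)
        b = gates @ self.biases                               # (batch, llm_dim)
        return torch.einsum("boi,bti->bto", W, speech_feats) + b.unsqueeze(1)

# Example: project 80 frames of 1024-dim encoder output into a 4096-dim LLM space.
proj = SMEARProjector(enc_dim=1024, llm_dim=4096)
out = proj(torch.randn(2, 80, 1024))
print(out.shape)  # torch.Size([2, 80, 4096])
```

Because merging happens in parameter space, inference runs one projection per utterance regardless of expert count, which is consistent with the abstract's claim of runtime efficiency comparable to a single projector.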

Country of Origin
🇮🇳 🇺🇸 India, United States

Repos / Data Links

Page Count
5 pages

Category
Computer Science: Computation and Language