Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages
By: Seraphina Fong, Marco Matassoni, Alessio Brutti
Potential Business Impact:
Helps computers understand speech in rare, under-resourced languages.
Large language models (LLMs) have demonstrated potential in handling spoken inputs for high-resource languages, reaching state-of-the-art performance in various tasks. However, their applicability remains less explored in low-resource settings. This work investigates the use of Speech LLMs for low-resource Automatic Speech Recognition using the SLAM-ASR framework, where a trainable lightweight projector connects a speech encoder and an LLM. First, we assess how much training data is needed to match Whisper-only performance, re-emphasizing the challenges of limited data. Second, we show that leveraging mono- or multilingual projectors pretrained on high-resource languages reduces the impact of data scarcity, especially with small training sets. Using multilingual LLMs (EuroLLM, Salamandra) with whisper-large-v3-turbo, we evaluate performance on several public benchmarks, providing insights for future research on optimizing Speech LLMs for low-resource languages and multilinguality.
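To make the SLAM-ASR-style architecture concrete, below is a minimal sketch (not the authors' code) of the trainable lightweight projector that bridges a frozen speech encoder and a frozen LLM. The encoder dimension matches Whisper's large encoder, but the LLM hidden size, downsampling factor, and MLP width are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a SLAM-ASR-style bridge: a frozen speech encoder and a frozen LLM
# connected by a small trainable projector. Dimensions and the downsampling factor
# below are illustrative assumptions, not the paper's exact settings.
import torch
import torch.nn as nn

class SpeechProjector(nn.Module):
    """Maps speech-encoder frames into the LLM embedding space."""
    def __init__(self, enc_dim=1280, llm_dim=4096, downsample=5, hidden=2048):
        super().__init__()
        self.downsample = downsample
        # Stack k consecutive frames, then project with a small MLP.
        self.net = nn.Sequential(
            nn.Linear(enc_dim * downsample, hidden),
            nn.ReLU(),
            nn.Linear(hidden, llm_dim),
        )

    def forward(self, enc_out):           # enc_out: (batch, frames, enc_dim)
        b, t, d = enc_out.shape
        t = t - (t % self.downsample)      # drop trailing frames that don't fit
        x = enc_out[:, :t, :].reshape(b, t // self.downsample, d * self.downsample)
        return self.net(x)                 # (batch, frames/downsample, llm_dim)

# Usage: only the projector's parameters are trained; encoder and LLM stay frozen.
projector = SpeechProjector()
speech_feats = torch.randn(2, 1500, 1280)  # e.g. Whisper encoder output (hypothetical batch)
soft_prompt = projector(speech_feats)      # passed to the LLM as input embeddings
print(soft_prompt.shape)                   # torch.Size([2, 300, 4096])
```

The projected frames act as a soft prompt prepended to the LLM's text embeddings, so only the projector's parameters need to be updated during training, which is what makes the approach attractive when labeled speech data is scarce.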
Similar Papers
SpeechLLM: Unified Speech and Language Model for Enhanced Multi-Task Understanding in Low Resource Settings
Computation and Language
Lets computers understand spoken words across multiple tasks.
Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages
Sound
Helps computers understand Thai speech better.
Efficient Scaling for LLM-based ASR
Sound
Boosts speech-to-text accuracy with half the power.