SpeechLLM: Unified Speech and Language Model for Enhanced Multi-Task Understanding in Low Resource Settings
By: Jaekwon Yoo, Kunal Chandiramani, Divya Tadimeti, and more
Potential Business Impact:
Lets computers understand spoken words for tasks such as transcription, name recognition, and sentiment detection.
Integrating a speech encoder with a large language model (LLM) requires substantial data and compute, and many use cases are limited because neither is available in sufficient quantity. To address this, we propose a parameter-efficient adapter that converts speech embeddings into LLM-compatible tokens, focusing on end-to-end automatic speech recognition (ASR), named entity recognition (NER), and sentiment analysis (SA). To reduce labeling costs, we employ an LLM-based synthetic dataset annotation technique. The proposed adapter, using 7x fewer trainable parameters, achieves significant performance gains: a 26% relative Word Error Rate (WER) improvement on the LibriSpeech ASR task, a 6.3% relative F1 score increase on the NER task, and a 32% relative F1 score boost on the SA task. Moreover, advanced techniques such as adding a classifier regularizer and optimizing the LLM with Low-Rank Adaptation (LoRA) yield notable performance gains, with Spoken Language Understanding Evaluation (SLUE) score improvements of 6.6% and 9.5%, respectively.
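The abstract describes an adapter that projects speech-encoder embeddings into the LLM's token-embedding space so they can be consumed like text tokens. Below is a minimal, hypothetical PyTorch sketch of such a projection adapter; the module name SpeechToLLMAdapter, the embedding dimensions, the frame-stacking downsampling factor, and the two-layer MLP design are illustrative assumptions, not the authors' released architecture.

```python
# Minimal sketch (assumed design, not the paper's code): map frame-level
# speech-encoder embeddings to "audio tokens" in the LLM's embedding space.
import torch
import torch.nn as nn


class SpeechToLLMAdapter(nn.Module):
    """Projects speech encoder outputs to LLM-compatible token embeddings."""

    def __init__(self, speech_dim=1024, llm_dim=4096, downsample=4):
        super().__init__()
        self.downsample = downsample
        # Stack adjacent frames to shorten the sequence, then project.
        self.proj = nn.Sequential(
            nn.Linear(speech_dim * downsample, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, speech_emb: torch.Tensor) -> torch.Tensor:
        # speech_emb: (batch, frames, speech_dim)
        b, t, d = speech_emb.shape
        t = (t // self.downsample) * self.downsample  # drop ragged tail frames
        x = speech_emb[:, :t, :].reshape(
            b, t // self.downsample, d * self.downsample
        )
        return self.proj(x)  # (batch, frames/downsample, llm_dim)


if __name__ == "__main__":
    adapter = SpeechToLLMAdapter()
    frames = torch.randn(2, 100, 1024)   # dummy speech-encoder output
    audio_tokens = adapter(frames)
    print(audio_tokens.shape)            # torch.Size([2, 25, 4096])
```

In a full pipeline, these projected embeddings would be concatenated with text-token embeddings before being fed to the LLM, which could optionally be fine-tuned with LoRA (for example via the Hugging Face peft library), as the abstract mentions; those integration details are assumptions here.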
Similar Papers
Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages
Audio and Speech Processing
Helps computers understand speech in languages with little available data.
Customizing Speech Recognition Model with Large Language Model Feedback
Computation and Language
Helps computers understand rare words in speech.
Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages
Sound
Helps computers understand Thai speech better.