Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models

Published: August 10, 2025 | arXiv ID: 2508.07273v1

By: Qiongqiong Wang, Hardik B. Sailor, Jeremy H. M. Wong, et al.

Potential Business Impact:

Teaches computers to understand the feelings expressed in people's voices.

Current large speech-language models (Speech-LLMs) often exhibit limitations in empathetic reasoning, primarily due to the absence of training datasets that integrate both contextual content and paralinguistic cues. In this work, we propose two approaches to incorporate contextual paralinguistic information into model training: (1) an explicit method that provides paralinguistic metadata (e.g., emotion annotations) directly to the LLM, and (2) an implicit method that automatically generates novel training question-answer (QA) pairs using both categorical and dimensional emotion annotations alongside speech transcriptions. Our implicit method improves LLM-judged performance by 38.41% on a human-annotated QA benchmark, rising to 46.02% when combined with the explicit approach, demonstrating its effectiveness in contextual paralinguistic understanding. We also validate the LLM judge by demonstrating its correlation with classification metrics, supporting its reliability.
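The two training-data strategies lend themselves to a simple sketch. Below is a minimal, hypothetical Python illustration of both: the explicit method injects emotion metadata directly into the prompt text, while the implicit method templates QA pairs from categorical and dimensional (valence/arousal) emotion labels plus the transcript. All names (EmotionAnnotation, build_explicit_prompt, generate_qa_pairs), scales, and templates are assumptions for illustration, not the paper's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class EmotionAnnotation:
    # Hypothetical record: one utterance with a categorical label
    # and dimensional labels (assumed here to be on a 1-5 scale).
    transcript: str
    category: str   # e.g. "angry", "happy", "neutral"
    valence: float  # low = negative, high = positive
    arousal: float  # low = calm, high = excited

def build_explicit_prompt(ann: EmotionAnnotation, question: str) -> str:
    """Explicit method (sketch): pass paralinguistic metadata
    directly to the LLM alongside the transcript and question."""
    return (
        f"[emotion: {ann.category}; valence: {ann.valence:.1f}; "
        f"arousal: {ann.arousal:.1f}]\n"
        f"Transcript: {ann.transcript}\n"
        f"Question: {question}"
    )

# Hypothetical templates that tie paralinguistic cues to content.
QA_TEMPLATES = [
    ("How does the speaker feel while saying this?",
     "The speaker sounds {category}."),
    ("Is the speaker's tone positive or negative?",
     "The tone is {polarity}, with {energy} energy."),
]

def generate_qa_pairs(ann: EmotionAnnotation) -> list[tuple[str, str]]:
    """Implicit method (sketch): auto-generate contextual
    paralinguistic QA pairs from annotations plus transcript."""
    polarity = "positive" if ann.valence >= 3.0 else "negative"
    energy = "high" if ann.arousal >= 3.0 else "low"
    pairs = []
    for q_tmpl, a_tmpl in QA_TEMPLATES:
        q = f'Given the utterance "{ann.transcript}", {q_tmpl}'
        a = a_tmpl.format(category=ann.category,
                          polarity=polarity, energy=energy)
        pairs.append((q, a))
    return pairs

if __name__ == "__main__":
    ann = EmotionAnnotation("I can't believe you did that!",
                            "angry", valence=1.5, arousal=4.2)
    print(build_explicit_prompt(ann, "Why is the speaker upset?"))
    for q, a in generate_qa_pairs(ann):
        print(q, "->", a)
```

In this sketch, the explicit prompt exposes the labels at inference time, whereas the templated QA pairs bake the same supervision into training data so the model learns to infer the cues from speech itself.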

Page Count
8 pages

Category
Computer Science: Computation and Language