Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models
By: Qiongqiong Wang, Hardik B. Sailor, Jeremy H. M. Wong, and more
Potential Business Impact:
Teaches computers to understand feelings in voices.
Current large speech-language models (Speech-LLMs) often exhibit limitations in empathetic reasoning, primarily due to the absence of training datasets that integrate both contextual content and paralinguistic cues. In this work, we propose two approaches to incorporating contextual paralinguistic information into model training: (1) an explicit method that provides paralinguistic metadata (e.g., emotion annotations) directly to the LLM, and (2) an implicit method that automatically generates novel training question-answer (QA) pairs using both categorical and dimensional emotion annotations alongside speech transcriptions. Our implicit method improves LLM-judged performance by 38.41% on a human-annotated QA benchmark, reaching 46.02% when combined with the explicit approach, demonstrating its effectiveness in contextual paralinguistic understanding. We also validate the LLM judge by demonstrating its correlation with classification metrics, supporting its reliability.
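To make the two data strategies concrete, here is a minimal Python sketch of how such training examples might be built. The AnnotatedUtterance fields, the prompt template, and the QA wordings are illustrative assumptions, not the paper's actual pipeline.

```python
# Sketch of the two strategies: explicit metadata in the prompt vs.
# implicit QA pairs generated from emotion annotations plus transcript.
# All field names, templates, and thresholds here are illustrative.

from dataclasses import dataclass

@dataclass
class AnnotatedUtterance:
    transcript: str   # speech transcription
    emotion: str      # categorical label, e.g. "angry"
    valence: float    # dimensional annotation, assumed in [-1, 1]
    arousal: float    # dimensional annotation, assumed in [-1, 1]

def explicit_prompt(utt: AnnotatedUtterance, question: str) -> str:
    """Explicit method: hand the paralinguistic metadata directly to
    the LLM alongside the content, so it can condition on both."""
    return (
        f"Transcript: {utt.transcript}\n"
        f"Emotion: {utt.emotion} (valence={utt.valence:+.2f}, "
        f"arousal={utt.arousal:+.2f})\n"
        f"Question: {question}"
    )

def implicit_qa_pairs(utt: AnnotatedUtterance) -> list[tuple[str, str]]:
    """Implicit method: turn the categorical and dimensional annotations
    into training QA pairs, so paralinguistic understanding is learned
    without requiring metadata at inference time."""
    level = "high" if utt.arousal > 0.5 else "moderate" if utt.arousal > 0 else "low"
    return [
        ("What emotion does the speaker convey?",
         f"The speaker sounds {utt.emotion}."),
        ("How intense is the speaker's delivery?",
         f'The arousal is {level}, given how they say: "{utt.transcript}"'),
    ]

if __name__ == "__main__":
    utt = AnnotatedUtterance("I can't believe you did that!", "angry", -0.7, 0.8)
    print(explicit_prompt(utt, "Why might the speaker be upset?"))
    for q, a in implicit_qa_pairs(utt):
        print(f"Q: {q}\nA: {a}")
```

One apparent advantage of the implicit route, consistent with the abstract's framing, is that the trained model needs no emotion metadata at inference time, since the cues were baked into the QA supervision.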
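And a small sketch of the kind of sanity check the last sentence describes: correlating per-item LLM-judge scores with an objective classification outcome. The toy data and the choice of scipy.stats.pearsonr are illustrative assumptions.

```python
# Validating an LLM judge by correlating its per-item scores with a
# classification metric. Data below is hypothetical.

from scipy.stats import pearsonr

judge_scores = [0.9, 0.2, 0.7, 0.4, 0.95, 0.1]   # judge score per answer
emotion_correct = [1, 0, 1, 0, 1, 0]              # 1 = classifier hit

r, p = pearsonr(judge_scores, emotion_correct)
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
# A strong positive correlation suggests the judge's scores track the
# objective classification outcome, supporting its reliability.
```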
Similar Papers
Dual Information Speech Language Models for Emotional Conversations
Computation and Language
Lets computers understand feelings in spoken words.
Benchmarking Contextual and Paralinguistic Reasoning in Speech-LLMs: A Case Study with In-the-Wild Data
Computation and Language
Helps computers understand feelings in voices.
Leveraging LLMs for Context-Aware Implicit Textual and Multimodal Hate Speech Detection
Computation and Language
Helps computers spot hateful messages better.