Benchmarking Contextual and Paralinguistic Reasoning in Speech-LLMs: A Case Study with In-the-Wild Data
By: Qiongqiong Wang, Hardik Bhupendra Sailor, Tianchi Liu, and more
Potential Business Impact:
Helps computers understand feelings in voices.
Recent speech-LLMs have shown impressive performance on tasks such as transcription and translation, yet they remain limited in understanding the paralinguistic aspects of speech that are crucial for social and emotional intelligence. We propose CP-Bench, a benchmark for evaluating speech-LLMs on contextual paralinguistic reasoning: the integration of verbal content with non-verbal cues such as emotion and prosody. The benchmark includes two curated question-answering (QA) datasets requiring both linguistic and empathetic understanding. We evaluate state-of-the-art open-source and closed-source speech-LLMs and perform a comprehensive analysis across different question types. We further analyze the top two models under varying decoding temperatures to understand the effect of temperature on this task. Our benchmark reveals a key gap in existing evaluations and offers insights into building more context-aware and emotionally intelligent speech-capable LLMs.
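To make the temperature analysis concrete, here is a minimal sketch of scoring a speech-LLM on a QA benchmark across several decoding temperatures. The model interface (`model.generate(audio_path, question, temperature=...)`), the `QAItem` structure, and the exact-match accuracy metric are all hypothetical assumptions for illustration; CP-Bench's actual data format and scoring protocol may differ.

```python
from dataclasses import dataclass

@dataclass
class QAItem:
    """One benchmark example: an audio clip, a question about it, and a reference answer."""
    audio_path: str
    question: str
    answer: str

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference (case/whitespace-insensitive)."""
    matches = sum(p.strip().lower() == r.strip().lower()
                  for p, r in zip(predictions, references))
    return matches / len(references)

def sweep_temperatures(model, items, temperatures=(0.0, 0.3, 0.7, 1.0)):
    """Score the model on each QA item at every decoding temperature.

    `model` is assumed to expose generate(audio_path, question, temperature=...)
    returning an answer string; this is a placeholder interface, not a real API.
    """
    results = {}
    for t in temperatures:
        preds = [model.generate(item.audio_path, item.question, temperature=t)
                 for item in items]
        results[t] = exact_match_accuracy(preds, [item.answer for item in items])
    return results
```

A sweep like this makes it easy to see whether a model's contextual-paralinguistic QA accuracy is stable under sampling or degrades as the temperature rises.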
Similar Papers
Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models
Computation and Language
Teaches computers to understand feelings in voices.
ParliaBench: An Evaluation and Benchmarking Framework for LLM-Generated Parliamentary Speech
Computation and Language
Helps computers write speeches that sound like real politicians.
Benchmarking Contextual Understanding for In-Car Conversational Systems
Computation and Language
Tests car voice assistants for better answers.