Speech Emotion Recognition via Entropy-Aware Score Selection
By: ChenYi Chua, JunKai Wong, Chengxin Chen, and more
Potential Business Impact:
Helps computers understand how people feel from the way they speak.
In this paper, we propose a multimodal framework for speech emotion recognition that leverages entropy-aware score selection to combine speech and text predictions. The method integrates a primary pipeline, an acoustic model based on wav2vec2.0, with a secondary pipeline, a sentiment analysis model using RoBERTa-XLM applied to transcriptions generated by Whisper-large-v3. We propose a late score fusion approach based on entropy and varentropy thresholds that compensates for low-confidence predictions from the primary pipeline. A sentiment mapping strategy translates the three sentiment categories into the four target emotion classes, enabling coherent integration of the multimodal predictions. Results on the IEMOCAP and MSP-IMPROV datasets show that the proposed method offers a practical and reliable improvement over traditional single-modality systems.
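To make the selection rule concrete, below is a minimal Python sketch of entropy-aware late fusion under assumptions: the emotion class order, the sentiment-to-emotion mapping, and the threshold values are illustrative placeholders, not the paper's exact settings.

```python
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # assumed 4-class order

# Assumed mapping from 3 sentiment categories to the 4 emotion classes.
SENTIMENT_TO_EMOTION = {
    "negative": ["angry", "sad"],
    "neutral":  ["neutral"],
    "positive": ["happy"],
}

def entropy_varentropy(probs):
    """Shannon entropy and varentropy (variance of surprisal) of a distribution."""
    probs = np.asarray(probs, dtype=float)
    surprisal = -np.log(np.clip(probs, 1e-12, 1.0))
    h = float(np.sum(probs * surprisal))            # entropy H = E[-log p]
    v = float(np.sum(probs * (surprisal - h) ** 2)) # varentropy = Var[-log p]
    return h, v

def fuse(acoustic_probs, sentiment_label, h_thresh=1.0, v_thresh=0.5):
    """Entropy-aware score selection: keep the acoustic prediction when it is
    confident; otherwise fall back to the sentiment-mapped candidates.
    Thresholds here are placeholder values, not tuned ones."""
    h, v = entropy_varentropy(acoustic_probs)
    if h <= h_thresh and v <= v_thresh:
        return EMOTIONS[int(np.argmax(acoustic_probs))]
    # Low-confidence acoustic prediction: among the emotions mapped from the
    # text sentiment, pick the one with the highest acoustic score.
    candidates = SENTIMENT_TO_EMOTION[sentiment_label]
    return max(candidates, key=lambda e: acoustic_probs[EMOTIONS.index(e)])

# Example: a near-flat acoustic distribution defers to the text sentiment.
print(fuse([0.3, 0.25, 0.25, 0.2], "positive"))  # -> "happy"
```

Varentropy complements entropy here by flagging distributions whose surprisal is uneven across classes, so the acoustic prediction is trusted only when both uncertainty measures are low.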
Similar Papers
Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion
Sound
Helps computers understand emotions in group conversations.
Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts
CV and Pattern Recognition
Lets computers understand feelings from speech, faces, and video.
Enhancing Speech Emotion Recognition with Graph-Based Multimodal Fusion and Prosodic Features for the Speech Emotion Recognition in Naturalistic Conditions Challenge at Interspeech 2025
Sound
Helps computers understand emotions in spoken words.