Multimodal Video Emotion Recognition with Reliable Reasoning Priors
By: Zhepeng Wang, Yingjian Zhu, Guanghao Dong, and more
Potential Business Impact:
Helps computers better understand feelings in videos.
This study investigates the integration of trustworthy prior reasoning knowledge from multimodal large language models (MLLMs) into multimodal emotion recognition. We employ Gemini to generate fine-grained, modality-separable reasoning traces, which are injected as priors during the fusion stage to enrich cross-modal interactions. To mitigate the pronounced class imbalance in multimodal emotion recognition, we introduce Balanced Dual-Contrastive Learning, a loss formulation that jointly balances inter-class and intra-class distributions. Applied to the MER2024 benchmark, our prior-enhanced framework yields substantial performance gains, demonstrating that the reliability of MLLM-derived reasoning can be synergistically combined with the domain adaptability of lightweight fusion networks for robust, scalable emotion recognition.
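The digest gives no implementation details, so the following is a minimal PyTorch sketch of the two ideas it names: injecting MLLM-derived reasoning traces as priors at the fusion stage, and a class-balanced supervised contrastive loss. Everything here is an assumption for illustration: the names PriorFusion and balanced_dual_contrastive_loss, the tensor shapes, and the temperature tau are hypothetical, and the paper's actual architecture and loss formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PriorFusion(nn.Module):
    """Hypothetical fusion block: modality features cross-attend over
    MLLM-derived reasoning-prior tokens (e.g., embedded Gemini traces)."""
    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, modal_feats: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # modal_feats: (B, M, D) stacked modality tokens; prior: (B, P, D) reasoning tokens.
        fused, _ = self.attn(modal_feats, prior, prior)    # queries attend to the prior
        return self.norm(modal_feats + fused).mean(dim=1)  # pooled fused embedding (B, D)

def balanced_dual_contrastive_loss(z, labels, n_classes, tau=0.1):
    """Supervised contrastive loss with per-anchor class reweighting, a sketch
    of jointly balancing intra-class (positive-pair) and inter-class
    (class-frequency) terms; not the paper's exact formulation."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                                  # (B, B) similarity logits
    B = z.size(0)
    eye = torch.eye(B, dtype=torch.bool, device=z.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    # Exclude self-similarity from the softmax denominator.
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float('-inf')), dim=1, keepdim=True)
    # Intra-class term: mean log-likelihood of each anchor's positives.
    pos_count = pos.sum(1).clamp(min=1)
    per_anchor = (log_prob * pos).sum(1) / pos_count
    # Inter-class balancing: weight anchors inversely to their class frequency.
    class_freq = torch.bincount(labels, minlength=n_classes).float()
    weights = (1.0 / class_freq.clamp(min=1))[labels]
    weights = weights / weights.sum()
    return -(weights * per_anchor).sum()

# Example: fuse 3 modality tokens with 8 prior tokens, then apply the loss.
fusion = PriorFusion(dim=256)
feats = torch.randn(32, 3, 256)   # batch of stacked modality embeddings
prior = torch.randn(32, 8, 256)   # reasoning-trace embeddings
emb = fusion(feats, prior)
loss = balanced_dual_contrastive_loss(emb, torch.randint(0, 6, (32,)), n_classes=6)
```

The inverse-frequency anchor weights are one plausible way to keep minority emotion classes from being drowned out by majority ones; a real implementation would tune this scheme against the MER2024 label distribution.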
Similar Papers
Beyond Emotion Recognition: A Multi-Turn Multimodal Emotion Understanding and Reasoning Benchmark
CV and Pattern Recognition
Helps computers understand feelings and the reasons behind them.
ECMF: Enhanced Cross-Modal Fusion for Multimodal Emotion Recognition in MER-SEMI Challenge
CV and Pattern Recognition
Helps computers understand your feelings from faces, voices, and words.
Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier
Artificial Intelligence
Makes computers understand feelings and explain them.