Multimodal Video Emotion Recognition with Reliable Reasoning Priors

Published: July 29, 2025 | arXiv ID: 2508.03722v1

By: Zhepeng Wang, Yingjian Zhu, Guanghao Dong, and more

Potential Business Impact:

Helps computers recognize human emotions from video more accurately.

This study investigates the integration of trustworthy prior reasoning knowledge from MLLMs into multimodal emotion recognition. We employ Gemini to generate fine-grained, modality-separable reasoning traces, which are injected as priors during the fusion stage to enrich cross-modal interactions. To mitigate the pronounced class imbalance in multimodal emotion recognition, we introduce Balanced Dual-Contrastive Learning, a loss formulation that jointly balances inter-class and intra-class distributions. Applied to the MER2024 benchmark, our prior-enhanced framework yields substantial performance gains, demonstrating that the reliability of MLLM-derived reasoning can be synergistically combined with the domain adaptability of lightweight fusion networks for robust, scalable emotion recognition.
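The abstract does not spell out the Balanced Dual-Contrastive Learning objective, so the following is only a minimal illustrative sketch of one plausible reading: a supervised contrastive loss over fused embeddings, re-weighted by inverse class frequency to counter the class imbalance the paper mentions. The function name, the temperature `tau`, and the weighting scheme are all assumptions, not the authors' formulation.

```python
import numpy as np

def balanced_supcon_loss(z, labels, tau=0.1):
    """Class-balanced supervised contrastive loss (illustrative sketch only).

    z: (N, D) array of fused multimodal embeddings.
    labels: (N,) integer emotion labels.
    """
    # L2-normalize embeddings so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n = len(labels)
    # Exclude self-similarity from the softmax denominator.
    logits = sim - 1e9 * np.eye(n)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Inverse-frequency weights: rare emotion classes count more,
    # a common (assumed) way to balance the inter-class distribution.
    classes, counts = np.unique(labels, return_counts=True)
    weight = {c: n / (len(classes) * k) for c, k in zip(classes, counts)}
    loss = 0.0
    for i in range(n):
        positives = (labels == labels[i]) & (np.arange(n) != i)
        if not positives.any():
            continue  # anchor has no same-class partner in the batch
        loss += -weight[labels[i]] * log_prob[i, positives].mean()
    return loss / n
```

In this sketch the "balancing" acts between classes via the inverse-frequency weights; the paper's intra-class balancing term is not reconstructed here, since the abstract gives no detail on it.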

Country of Origin
🇲🇴 Macao

Page Count
12 pages

Category
Computer Science:
Computer Vision and Pattern Recognition