More Is Better: A MoE-Based Emotion Recognition Framework with Human Preference Alignment
By: Jun Xie, Yingjian Zhu, Feng Chen, and more
Potential Business Impact:
Helps computers recognize human emotions from faces, voices, and words.
In this paper, we present our solution for the semi-supervised learning track (MER-SEMI) in MER2025. We propose a comprehensive framework, grounded in the principle that "more is better," to construct a robust Mixture of Experts (MoE) emotion recognition system. Our approach integrates a diverse range of input modalities as independent experts, including novel signals such as knowledge from large Vision-Language Models (VLMs) and temporal Action Unit (AU) information. To effectively utilize unlabeled data, we introduce a consensus-based pseudo-labeling strategy, generating high-quality labels from the agreement between a baseline model and Gemini, which are then used in a two-stage training paradigm. Finally, we employ a multi-expert voting ensemble combined with a rule-based re-ranking process to correct prediction bias and better align the outputs with human preferences. Evaluated on the MER2025-SEMI challenge dataset, our method achieves an F1-score of 0.8772 on the test set, ranking 2nd in the track. Our code is available at https://github.com/zhuyjan/MER2025-MRAC25.
Similar Papers
ECMF: Enhanced Cross-Modal Fusion for Multimodal Emotion Recognition in MER-SEMI Challenge
CV and Pattern Recognition
Helps computers understand your feelings from faces, voices, and words.
Hierarchical MoE: Continuous Multimodal Emotion Recognition with Incomplete and Asynchronous Inputs
Human-Computer Interaction
Lets computers understand feelings even when information is missing.