
ECMF: Enhanced Cross-Modal Fusion for Multimodal Emotion Recognition in MER-SEMI Challenge

Published: August 8, 2025 | arXiv ID: 2508.05991v1

By: Juewen Hu, Yexin Li, Jiulin Li, and more

Potential Business Impact:

Helps computers understand your feelings from faces, voices, words.

Emotion recognition plays a vital role in enhancing human-computer interaction. In this study, we tackle the MER-SEMI challenge of the MER2025 competition by proposing a novel multimodal emotion recognition framework. To address data scarcity, we leverage large-scale pre-trained models to extract informative features from the visual, audio, and textual modalities. For the visual modality, we design a dual-branch visual encoder that captures both global frame-level features and localized facial representations. For the textual modality, we introduce a context-enriched method that uses large language models to enrich the emotional cues in the input text. To integrate these multimodal features effectively, we propose a fusion strategy with two key components: self-attention mechanisms for dynamic modality weighting, and residual connections to preserve the original representations. Beyond architectural design, we refine noisy labels in the training set through a multi-source labeling strategy. Our approach substantially outperforms the official baseline on the MER2025-SEMI dataset, attaining a weighted F-score of 87.49% compared with the baseline's 78.63%, thereby validating the effectiveness of the proposed framework.
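The abstract describes the fusion step only at a high level: self-attention assigns input-dependent weights to the modalities, and residual connections preserve the original representations. The sketch below is a minimal, hypothetical NumPy illustration of that general idea, not the authors' implementation. It assumes one pre-extracted feature vector per modality already projected to a shared dimension `d`, a single attention head, random matrices standing in for learned projections, and mean pooling at the end; the names `fuse_modalities`, `softmax`, `w_q`, `w_k`, and `w_v` are illustrative.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fuse_modalities(features, w_q, w_k, w_v):
    """Hypothetical single-head self-attention fusion with a residual connection.

    features: (M, d) array, one row per modality (e.g. visual, audio, text),
              assumed to be already projected to a shared dimension d.
    w_q, w_k, w_v: (d, d) projection matrices (learned in practice; random here).
    Returns the fused (d,) vector and the (M, M) attention weights.
    """
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v
    d = features.shape[1]
    # Each modality attends to every modality; each softmax row acts as a set
    # of input-dependent modality weights that sums to 1.
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    attended = weights @ v
    # Residual connection keeps the original per-modality representation.
    refined = features + attended
    # Mean-pool over modalities to get one utterance-level vector for a classifier.
    fused = refined.mean(axis=0)
    return fused, weights


rng = np.random.default_rng(0)
d = 8
visual, audio, text = (rng.standard_normal(d) for _ in range(3))
features = np.stack([visual, audio, text])  # shape (3, d)
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
fused, weights = fuse_modalities(features, w_q, w_k, w_v)
print(fused.shape, weights.shape)  # prints: (8,) (3, 3)
```

Because of the residual term, setting the value projection to zero makes the fused vector equal to the plain average of the input features, which is one way to see that the original representations survive the attention step.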

A Unified Framework for Emotion Recognition and Sentiment Analysis via Expert-Guided Multimodal Fusion with Large Language Models

Computation and Language

**Computers understand feelings from talking, seeing, and writing.**

12 Jan 2026

A Novel Approach to for Multimodal Emotion Recognition : Multimodal semantic information fusion

CV and Pattern Recognition

Helps computers understand feelings from faces and voices.

12 Feb 2025

Calibrating Multimodal Consensus for Emotion Recognition

CV and Pattern Recognition

Helps computers understand feelings from words and faces.

23 Oct 2025


Page Count: 5 pages
