Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition
By: Wen-Jue He, Xiaofeng Zhu, Zheng Zhang
Incomplete multi-modal emotion recognition (IMER) aims to understand human intentions and sentiments by comprehensively exploring partially observed multi-source data. Although multi-modal data are expected to provide richer information, the cross-modal performance gap and the modality under-optimization problem hinder effective multi-modal learning in practice, and both are exacerbated when data are missing. To address this issue, we devise a novel Cross-modal Prompting (ComP) method, which emphasizes coherent information by enhancing modality-specific features and improves overall recognition accuracy by boosting each modality's individual performance. Specifically, a progressive prompt generation module with a dynamic gradient modulator produces concise and consistent modality semantic cues. Meanwhile, cross-modal knowledge propagation uses the delivered prompts to selectively amplify the consistent information in modality features, enhancing the discrimination of each modality-specific output. Additionally, a coordinator dynamically re-weights the modality outputs, complementing the balance strategy and improving the model's efficacy. Extensive experiments on 4 datasets against 7 state-of-the-art (SOTA) methods under different missing rates validate the effectiveness of the proposed method.
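To make the abstract's pipeline concrete, below is a minimal PyTorch sketch (not the authors' code) of two of the ideas it names: prompts distilled from each modality that gate the other modalities' features (cross-modal knowledge propagation), and a coordinator that dynamically re-weights the per-modality outputs. All module names, dimensions, the sigmoid gating form, and the softmax re-weighting are illustrative assumptions; the paper's progressive prompt generation and gradient modulation are not reproduced here.

```python
# Illustrative sketch only: assumed shapes, gating, and re-weighting forms.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalPromptingSketch(nn.Module):
    def __init__(self, dim: int, num_modalities: int, num_classes: int):
        super().__init__()
        # One prompt generator per modality: compresses that modality's
        # features into a cue other modalities can consume (assumption).
        self.prompt_gen = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
            for _ in range(num_modalities)
        )
        # Per-modality classifier heads.
        self.heads = nn.ModuleList(
            nn.Linear(dim, num_classes) for _ in range(num_modalities)
        )
        # Coordinator: scores each modality so its output can be re-weighted.
        self.coordinator = nn.Linear(dim, 1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: one (batch, dim) tensor per (possibly imputed) modality.
        prompts = [gen(x) for gen, x in zip(self.prompt_gen, feats)]
        logits, scores = [], []
        for i, x in enumerate(feats):
            # Cross-modal propagation: amplify the parts of x that agree
            # with the other modalities' prompts (sigmoid gate, assumption).
            others = torch.stack(
                [p for j, p in enumerate(prompts) if j != i]
            ).mean(dim=0)
            gated = x * torch.sigmoid(others)
            logits.append(self.heads[i](gated))
            scores.append(self.coordinator(gated))
        # Coordinator: dynamic re-weighting of the modality outputs.
        weights = F.softmax(torch.stack(scores, dim=1), dim=1)  # (batch, M, 1)
        return (torch.stack(logits, dim=1) * weights).sum(dim=1)


if __name__ == "__main__":
    model = CrossModalPromptingSketch(dim=64, num_modalities=3, num_classes=6)
    feats = [torch.randn(8, 64) for _ in range(3)]  # e.g. text/audio/video
    print(model(feats).shape)  # torch.Size([8, 6])
```

Because every modality keeps its own head and the fusion happens only through the gates and the coordinator weights, a missing modality can, under these assumptions, simply be dropped from `feats` (with the weights renormalized over the observed modalities) rather than zero-filled.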
Similar Papers
ECMF: Enhanced Cross-Modal Fusion for Multimodal Emotion Recognition in MER-SEMI Challenge
CV and Pattern Recognition
Helps computers understand your feelings from faces, voices, and words.
Calibrating Multimodal Consensus for Emotion Recognition
CV and Pattern Recognition
Helps computers understand feelings from words and faces.
Hierarchical MoE: Continuous Multimodal Emotion Recognition with Incomplete and Asynchronous Inputs
Human-Computer Interaction
Lets computers understand feelings even when information is missing.