Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition
By: Ran Liu, Fengyu Zhang, Cong Yu and more
Potential Business Impact:
Helps computers understand emotions from faces and voices.
This article presents our results for the eighth Affective Behavior Analysis in-the-wild (ABAW) competition. Multimodal emotion recognition (ER) has important applications in affective computing and human-computer interaction. However, in the real world, compound emotion recognition faces greater uncertainty and more frequent conflicts between modalities. For the Compound Expression (CE) Recognition Challenge, this paper proposes a multimodal emotion recognition method that fuses the features of a Vision Transformer (ViT) and a Residual Network (ResNet). We conducted experiments on the C-EXPR-DB and MELD datasets. The results show that in scenarios with complex visual and audio cues (such as C-EXPR-DB), the model that fuses the features of ViT and ResNet exhibits superior performance. Our code is available at https://github.com/MyGitHub-ax/8th_ABAW
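The abstract does not say how the ViT and ResNet features are fused, so the following is only a minimal sketch of one common option: concatenating the two feature vectors and passing them to a linear classifier. All names (`fuse_features`, `classify`), the feature sizes, the class count, and the random weights are assumptions made for illustration; none of them come from the paper or its repository.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length so neither branch dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(vit_feat, resnet_feat):
    """Assumed fusion rule: normalize each branch, then concatenate."""
    return l2_normalize(vit_feat) + l2_normalize(resnet_feat)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, bias):
    """Single linear layer followed by softmax (hypothetical classifier head)."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Stand-in features: real ViT/ResNet embeddings would be far larger.
rng = random.Random(0)
vit_feat = [rng.gauss(0, 1) for _ in range(8)]      # placeholder ViT embedding
resnet_feat = [rng.gauss(0, 1) for _ in range(16)]  # placeholder ResNet embedding

NUM_CLASSES = 7  # placeholder class count, chosen only for this sketch
fused = fuse_features(vit_feat, resnet_feat)
weights = [[rng.gauss(0, 0.1) for _ in fused] for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

probs = classify(fused, weights, bias)
predicted = max(range(NUM_CLASSES), key=lambda i: probs[i])
print(len(fused), predicted)
```

Concatenation keeps both representations intact and leaves the classifier to weigh them. Other fusion schemes, such as attention-based weighting, are equally plausible readings of the abstract.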
Similar Papers
Interactive Multimodal Fusion with Temporal Modeling
CV and Pattern Recognition
Lets computers guess your feelings from faces and voices.
HSEmotion Team at ABAW-8 Competition: Audiovisual Ambivalence/Hesitancy, Emotional Mimicry Intensity and Facial Expression Recognition
CV and Pattern Recognition
Helps computers understand emotions from faces, voices, and words.
MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network
Machine Learning (CS)
Helps computers understand emotions from faces, voices, and words.