MMED: A Multimodal Micro-Expression Dataset based on Audio-Visual Fusion
By: Junbo Wang, Yan Zhao, Shuo Li, and more
Potential Business Impact:
Lets computers "hear" hidden feelings in voices.
Micro-expressions (MEs) are crucial leakages of concealed emotion, yet their study has been constrained by a reliance on silent, visual-only data. To address this limitation, we make two principal contributions. First, we present MMED, to our knowledge the first dataset to capture the spontaneous vocal cues that co-occur with MEs in ecologically valid, high-stakes interactions. Second, we propose the Asymmetric Multimodal Fusion Network (AMF-Net), a novel method that fuses a global visual summary with a dynamic audio sequence through an asymmetric cross-attention framework. Rigorous Leave-One-Subject-Out Cross-Validation (LOSO-CV) experiments validate our approach, providing conclusive evidence that audio offers critical, disambiguating information for ME analysis. Collectively, the MMED dataset and the AMF-Net method provide valuable resources and a validated analytical approach for micro-expression recognition.
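To make the asymmetric fusion idea concrete, the sketch below illustrates one plausible reading of the abstract: a single pooled visual embedding acts as the attention query, while the frame-level audio sequence supplies keys and values. This is a minimal, hypothetical illustration, not the authors' released AMF-Net code; the module name, feature dimensions, class count, and residual/pooling choices are all assumptions.

```python
# Minimal sketch of an asymmetric cross-attention fusion block,
# assuming a pooled visual summary queries a temporal audio sequence.
# NOT the authors' AMF-Net implementation; names and sizes are illustrative.
import torch
import torch.nn as nn


class AsymmetricFusionSketch(nn.Module):
    def __init__(self, dim=256, num_heads=4, num_classes=3):
        super().__init__()
        # Query comes from the global visual summary; keys/values from audio frames.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, num_classes)  # assumed emotion classes

    def forward(self, visual_summary, audio_seq):
        # visual_summary: (B, dim)    -- one pooled embedding per clip
        # audio_seq:      (B, T, dim) -- per-frame audio features
        query = visual_summary.unsqueeze(1)                    # (B, 1, dim)
        fused, _ = self.cross_attn(query, audio_seq, audio_seq)
        fused = self.norm(fused.squeeze(1) + visual_summary)   # residual over the visual summary
        return self.classifier(fused)


# Usage with random tensors standing in for real visual/audio features.
model = AsymmetricFusionSketch()
logits = model(torch.randn(8, 256), torch.randn(8, 50, 256))
print(logits.shape)  # torch.Size([8, 3])
```

The asymmetry in this sketch is simply that the visual stream contributes a single global query while the audio stream keeps its temporal resolution; the actual AMF-Net design may differ in depth, pooling, and attention configuration.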
Similar Papers
MMME: A Spontaneous Multi-Modal Micro-Expression Dataset Enabling Visual-Physiological Fusion
CV and Pattern Recognition
Reads hidden emotions using faces and body signals.
Improving Micro-Expression Recognition with Phase-Aware Temporal Augmentation
CV and Pattern Recognition
Helps computers spot hidden emotions on faces.
Enhancing Meme Emotion Understanding with Multi-Level Modality Enhancement and Dual-Stage Modal Fusion
Computation and Language
Helps computers understand emotions in internet pictures.