MER-CLIP: AU-Guided Vision-Language Alignment for Micro-Expression Recognition
By: Shifeng Liu, Xinglong Mao, Sirui Zhao and more
Potential Business Impact:
Helps computers spot hidden emotions on faces.
As a critical psychological stress response, micro-expressions (MEs) are fleeting and subtle facial movements revealing genuine emotions. Automatic ME recognition (MER) holds valuable applications in fields such as criminal investigation and psychological diagnosis. The Facial Action Coding System (FACS) encodes expressions by identifying activations of specific facial action units (AUs), serving as a key reference for ME analysis. However, current MER methods typically limit AU utilization to defining regions of interest (ROIs) or relying on specific prior knowledge, often resulting in limited performance and poor generalization. To address this, we integrate the CLIP model's powerful cross-modal semantic alignment capability into MER and propose a novel approach, MER-CLIP. Specifically, we convert AU labels into detailed textual descriptions of facial muscle movements, guiding fine-grained spatiotemporal ME learning by aligning visual dynamics with textual AU-based representations. Additionally, we introduce an Emotion Inference Module to capture the nuanced relationships between ME patterns and emotions with higher-level semantic understanding. To mitigate overfitting caused by the scarcity of ME data, we propose LocalStaticFaceMix, an effective data augmentation strategy that blends facial images to enhance facial diversity while preserving critical ME features. Finally, comprehensive experiments on four benchmark ME datasets confirm the superiority of MER-CLIP. Notably, UF1 scores on CAS(ME)³ reach 0.7832, 0.6544, and 0.4997 for the 3-, 4-, and 7-class classification tasks, respectively, significantly outperforming previous methods.
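For readers unfamiliar with the UF1 metric quoted above: UF1 (unweighted F1) is commonly computed as the macro-average of per-class F1 scores, so every emotion class counts equally regardless of how many samples it has. The following is a minimal sketch of that definition, not the authors' evaluation code. The function name, the class labels, and the toy predictions are illustrative assumptions, and the abstract does not state how predictions are pooled across folds or subjects.

```python
from collections import Counter


def unweighted_f1(y_true, y_pred):
    """Macro-averaged F1: mean of per-class F1, each class weighted equally.

    Assumption: classes are taken from the ground-truth labels, and all
    predictions are pooled into a single set before scoring.
    """
    classes = sorted(set(y_true))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    per_class = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical 3-class toy example (labels are illustrative only).
y_true = ["negative", "negative", "positive", "surprise"]
y_pred = ["negative", "positive", "positive", "surprise"]
print(round(unweighted_f1(y_true, y_pred), 4))
```

Because each class contributes equally, a model that ignores a rare class is penalized heavily, which is one reason scores tend to drop as the number of classes grows from 3 to 7.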