Compound Expression Recognition via Large Vision-Language Models
By: Jun Yu, Xilong Lu
Potential Business Impact:
Helps computers understand emotions from faces.
Compound Expression Recognition (CER) is crucial for understanding human emotions and improving human-computer interaction. However, CER is challenging because facial expressions are complex and subtle emotional cues are difficult to capture. To address these issues, we propose a novel approach leveraging Large Vision-Language Models (LVLMs). Our method employs a two-stage fine-tuning process: first, pre-trained LVLMs are fine-tuned on basic facial expressions to establish foundational patterns; second, the model is further optimized on a compound-expression dataset to refine visual-language feature interactions. Our approach achieves high accuracy on the RAF-DB dataset and demonstrates strong zero-shot generalization on the C-EXPR-DB dataset, showcasing its potential for real-world applications in emotion analysis and human-computer interaction.
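The two-stage schedule described in the abstract can be sketched roughly as follows. This is only a minimal illustration of the stage ordering, not the authors' implementation: the label lists, the prompt wording, and the names `Sample`, `make_sample`, `two_stage_fine_tune`, and `train_step` are all assumptions introduced here, and `train_step` stands in for whatever LVLM update (full fine-tuning or a parameter-efficient variant) is actually used.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Illustrative label sets only; the datasets' actual label lists may differ.
BASIC_LABELS = ["happiness", "sadness", "anger", "fear", "disgust", "surprise"]
COMPOUND_LABELS = ["happily surprised", "sadly angry", "fearfully surprised"]


@dataclass
class Sample:
    image_path: str  # path to a face image (hypothetical)
    prompt: str      # text instruction shown to the model
    answer: str      # target text the model should generate


def make_sample(image_path: str, label: str, label_set: Sequence[str]) -> Sample:
    """Turn an (image, label) pair into an instruction-style training sample."""
    if label not in label_set:
        raise ValueError(f"unknown label: {label!r}")
    options = ", ".join(label_set)
    # Assumed prompt wording; the paper's actual prompts are not given here.
    prompt = f"Which expression does the face show? Options: {options}."
    return Sample(image_path, prompt, label)


def two_stage_fine_tune(
    train_step: Callable[[Sample], float],
    basic_data: List[Sample],
    compound_data: List[Sample],
    epochs: Tuple[int, int] = (1, 1),
) -> List[Tuple[str, int, float]]:
    """Run stage 1 (basic) and then stage 2 (compound) with the same train_step.

    Returns a log of (stage_name, epoch, mean_loss) entries.
    """
    log = []
    stages = [
        ("basic", basic_data, epochs[0]),
        ("compound", compound_data, epochs[1]),
    ]
    for name, data, n_epochs in stages:
        for epoch in range(n_epochs):
            # train_step is assumed to update the model and return a loss.
            losses = [train_step(sample) for sample in data]
            mean_loss = sum(losses) / len(losses) if losses else 0.0
            log.append((name, epoch, mean_loss))
    return log
```

Passing the same `train_step` to both stages reflects the idea that stage 2 continues from the weights learned in stage 1 rather than restarting from the pre-trained checkpoint.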
Similar Papers
Self-Supervised Multi-View Representation Learning using Vision-Language Model for 3D/4D Facial Expression Recognition
CV and Pattern Recognition
Helps computers better recognize emotions from faces.
An Evaluation of a Visual Question Answering Strategy for Zero-shot Facial Expression Recognition in Still Images
CV and Pattern Recognition
Lets computers understand faces without prior training.
Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition
CV and Pattern Recognition
Helps computers understand emotions from faces and voices.