Multi-modal Transfer Learning for Dynamic Facial Emotion Recognition in the Wild
By: Ezra Engel, Lishan Li, Chris Hudy, and more
Potential Business Impact:
Helps computers better understand emotions from faces.
Facial expression recognition (FER) is a subset of computer vision with important applications for human-computer interaction, healthcare, and customer service. FER represents a challenging problem space because accurate classification requires a model to differentiate between subtle changes in facial features. In this paper, we examine the use of multi-modal transfer learning to improve performance on a challenging video-based FER dataset, Dynamic Facial Expression in-the-Wild (DFEW). Using a combination of pretrained ResNets, OpenPose, and OmniVec networks, we explore the impact of cross-temporal, multi-modal features on classification accuracy. Ultimately, we find that these fine-tuned multi-modal feature generators modestly improve the accuracy of our transformer-based classification model.
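To make the described pipeline concrete, below is a minimal PyTorch sketch of this kind of architecture, not the authors' code: per-frame features from a pretrained ResNet are concatenated with precomputed pose and video embeddings (stand-ins for OpenPose and OmniVec outputs, whose dimensions here are assumptions) and passed through a transformer encoder for cross-temporal classification over DFEW's seven emotion classes.

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiModalFER(nn.Module):
    """Hypothetical multi-modal FER classifier: frame-level ResNet features
    fused with precomputed pose/video embeddings, classified by a
    transformer encoder that attends across time."""
    def __init__(self, pose_dim=128, omni_dim=256, d_model=512, num_classes=7):
        super().__init__()
        # Pretrained ResNet-18 backbone, with the final FC layer removed,
        # used as a frame-level feature extractor.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])
        # Project concatenated multi-modal features to the model dimension.
        self.proj = nn.Linear(512 + pose_dim + omni_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.cls = nn.Linear(d_model, num_classes)

    def forward(self, frames, pose_feats, omni_feats):
        # frames: (B, T, 3, 224, 224); pose_feats: (B, T, pose_dim);
        # omni_feats: (B, T, omni_dim)
        B, T = frames.shape[:2]
        x = self.cnn(frames.flatten(0, 1)).flatten(1)  # (B*T, 512)
        x = x.view(B, T, -1)
        x = torch.cat([x, pose_feats, omni_feats], dim=-1)
        x = self.proj(x)                 # (B, T, d_model)
        x = self.encoder(x)              # cross-temporal self-attention
        return self.cls(x.mean(dim=1))   # pool over time, then classify

# Toy usage with random tensors standing in for real video and features.
model = MultiModalFER()
logits = model(torch.randn(2, 16, 3, 224, 224),
               torch.randn(2, 16, 128),
               torch.randn(2, 16, 256))
print(logits.shape)  # torch.Size([2, 7])
```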
Similar Papers
InsideOut: An EfficientNetV2-S Based Deep Learning Framework for Robust Multi-Class Facial Emotion Recognition
CV and Pattern Recognition
Helps computers understand emotions better, even when faces are hidden.
Feature Aggregation for Efficient Continual Learning of Complex Facial Expressions
CV and Pattern Recognition
AI learns to read emotions without forgetting.
ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition
CV and Pattern Recognition
Helps computers understand your feelings better.