MotivNet: Evolving Meta-Sapiens into an Emotionally Intelligent Foundation Model
By: Rahul Medicharla, Alper Yilmaz
In this paper, we introduce MotivNet, a generalizable facial emotion recognition (FER) model for robust real-world application. Current state-of-the-art (SOTA) FER models tend to generalize poorly when tested on diverse data, which degrades their real-world performance and hinders FER as a research domain. Although researchers have proposed complex architectures to address this generalization issue, those architectures require cross-domain training to obtain generalizable results, which is inherently contradictory for real-world application. Our model, MotivNet, achieves competitive performance across datasets without cross-domain training by using Meta-Sapiens as a backbone. Sapiens is a human vision foundation model that achieves state-of-the-art real-world generalization through large-scale pretraining of a Masked Autoencoder. We propose MotivNet as an additional downstream task for Sapiens and define three criteria to evaluate its viability as a Sapiens task: benchmark performance, model similarity, and data similarity. Throughout this paper, we describe the components of MotivNet, our training approach, and our results showing that MotivNet generalizes across domains. We demonstrate that MotivNet can be benchmarked against existing SOTA models and meets the listed criteria, which validates MotivNet as a Sapiens downstream task and makes FER more attractive for in-the-wild application. The code is available at https://github.com/OSUPCVLab/EmotionFromFaceImages.
Similar Papers
ExpressNet-MoE: A Hybrid Deep Neural Network for Emotion Recognition
CV and Pattern Recognition
Helps computers understand your feelings better.
EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition
CV and Pattern Recognition
Teaches computers to understand many more feelings.
Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding
CV and Pattern Recognition
Reads your thoughts to make a face move.