Analyzing Image Beyond Visual Aspect: Image Emotion Classification via Multiple-Affective Captioning
By: Zibo Zhou, Zhengjun Zhai, Huimin Chen and more
Potential Business Impact:
Lets computers understand feelings in pictures.
Image emotion classification (IEC) is a longstanding research field that has received increasing attention with the rapid progress of deep learning. Although recent advances have leveraged the knowledge encoded in pre-trained visual models, their effectiveness is constrained by the "affective gap", which limits the applicability of pre-training knowledge to IEC tasks. Research in psychology has demonstrated that language exhibits high variability, encompasses diverse and abundant information, and can effectively eliminate the "affective gap". Inspired by this, we propose a novel method, Affective Captioning for Image Emotion Classification (ACIEC), which classifies image emotion based purely on texts that effectively capture the affective information in the image. In our method, a hierarchical multi-level contrastive loss is designed for detecting emotional concepts from images, while an emotional attribute chain-of-thought reasoning scheme is proposed to generate affective sentences. A pre-trained language model is then leveraged to synthesize the emotional concepts and affective sentences and conduct IEC. Additionally, a contrastive loss based on semantic similarity sampling is designed to address the large intra-class differences and small inter-class differences found in affective datasets. We also take images with embedded text into consideration, which previous studies ignored. Extensive experiments illustrate that our method can effectively bridge the affective gap and achieve superior results on multiple benchmarks.
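The abstract does not give the form of its contrastive losses, so the following is only a rough illustration of the general idea behind them: pulling same-emotion samples together and pushing different-emotion samples apart. It is a minimal sketch of a generic supervised contrastive loss, not the authors' hierarchical or semantic-similarity-sampling loss. The function name, the `temperature` parameter, and the toy embeddings are all assumptions made for illustration.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative; NOT the ACIEC loss).

    embeddings: list of equal-length, non-zero float vectors (e.g. text features).
    labels: list of emotion labels, one per embedding.
    """
    # L2-normalise so that dot products are cosine similarities.
    unit = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        unit.append([x / norm for x in vec])

    n = len(unit)
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(unit[i], unit[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Positives are the other samples that share the anchor's label.
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Numerically stable log-sum-exp over all non-anchor samples.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        # Mean negative log-probability of picking a positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy usage: two hypothetical emotion classes with two samples each.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clustered = supervised_contrastive_loss(features, ["joy", "joy", "fear", "fear"])
scattered = supervised_contrastive_loss(features, ["joy", "fear", "joy", "fear"])
print(clustered < scattered)
```

The loss is lower when samples with the same label lie close together in embedding space, which is the property a contrastive objective relies on when classes show large intra-class and small inter-class differences.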
Similar Papers
CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation
CV and Pattern Recognition
Creates pictures that show feelings, not just objects.
EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model
CV and Pattern Recognition
Creates pictures showing exact feelings from words.
Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs
CV and Pattern Recognition
Helps computers understand feelings from pictures better.