Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity
By: Hauke Licht
Potential Business Impact:
AI can't reliably read emotions in real-world speeches.
Emotions are central to politics, and analyzing their role in political communication has a long tradition. As research increasingly leverages audio-visual materials to study the display of emotions, the emergence of multimodal generative AI promises great advances. However, we lack evidence about the effectiveness of multimodal AI in emotion analysis. This paper addresses this gap by evaluating current multimodal large language models (mLLMs) on video-based analysis of emotional arousal in two complementary data sets of human-labeled video recordings. I find that, under ideal circumstances, mLLMs' emotional arousal ratings are highly reliable and show little to no indication of demographic bias. However, in recordings of speakers in real-world parliamentary debates, mLLMs' arousal ratings fail to deliver on this promise, with potentially negative consequences for downstream statistical inferences. This study therefore underscores the need for continued, thorough evaluation of emerging generative AI methods in political analysis and contributes a suitable, replicable framework.
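To make the kind of evaluation described in the abstract concrete, the sketch below compares model-generated arousal ratings against human labels for the same clips using a correlation and a calibration check. This is a minimal illustration, not the paper's actual framework; the rating scale, variable names, and data values are hypothetical.

```python
# Minimal sketch: comparing mLLM arousal ratings with human-coded labels
# for the same video clips. All values here are hypothetical.
import numpy as np
from scipy import stats

# Hypothetical arousal ratings on a 1-7 scale for ten clips.
human_ratings = np.array([2, 5, 3, 6, 4, 1, 7, 5, 3, 2])
mllm_ratings = np.array([2, 4, 3, 6, 5, 1, 6, 5, 2, 2])

# Pearson correlation as a simple convergent-validity check.
r, p = stats.pearsonr(human_ratings, mllm_ratings)

# Mean absolute difference as a rough calibration check.
mad = np.mean(np.abs(human_ratings - mllm_ratings))

print(f"Pearson r = {r:.2f} (p = {p:.3f}); mean absolute difference = {mad:.2f}")
```

In practice, one would also report chance-corrected reliability statistics and test whether rating errors vary systematically with speaker demographics, as the abstract's bias analysis implies.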
Similar Papers
Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting
Artificial Intelligence
Helps computers understand feelings from voices, faces, words.
Large Language Models are Highly Aligned with Human Ratings of Emotional Stimuli
Artificial Intelligence
AI understands feelings like people do.
AI shares emotion with humans across languages and cultures
Computation and Language
AI understands and shows feelings like people.