Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting
By: Miaosen Luo, Jiesen Long, Zequn Li, and more
Potential Business Impact:
Helps computers understand feelings from voices, faces, and words.
Multimodal Affective Computing (MAC) aims to recognize and interpret human emotions by integrating information from diverse modalities such as text, video, and audio. Recent advancements in Multimodal Large Language Models (MLLMs) have significantly reshaped the landscape of MAC by offering a unified framework for processing and aligning cross-modal information. However, practical challenges remain, including performance variability across complex MAC tasks and insufficient understanding of how architectural designs and data characteristics impact affective analysis. To address these gaps, we conduct a systematic benchmark evaluation of state-of-the-art open-source MLLMs capable of concurrently processing audio, visual, and textual modalities across multiple established MAC datasets. Our evaluation not only compares the performance of these MLLMs but also provides actionable insights into model optimization by analyzing the influence of model architectures and dataset properties. Furthermore, we propose a novel hybrid strategy that combines generative knowledge prompting with supervised fine-tuning to enhance MLLMs' affective computing capabilities. Experimental results demonstrate that this integrated approach significantly improves performance across various MAC tasks, offering a promising avenue for future research and development in this field. Our code is released at https://github.com/LuoMSen/MLLM-MAC.
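To make the hybrid strategy concrete, the sketch below shows one plausible way to combine generative knowledge prompting with supervised fine-tuning: the MLLM first verbalizes affective cues from each modality, and that generated knowledge is folded into the prompt of each fine-tuning example. All names here (`Sample`, `query_mllm`, `KNOWLEDGE_PROMPT`) are illustrative assumptions, not the authors' actual interface; the real implementation is in the linked repository.

```python
# Minimal sketch, assuming a two-stage "generate knowledge, then fine-tune"
# pipeline. Every identifier below is a hypothetical placeholder for
# illustration, not the paper's actual API.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Sample:
    text: str          # transcript of the utterance
    audio_path: str    # path to the audio clip
    video_path: str    # path to the video clip
    label: str         # gold emotion label, e.g. "happy"

# Stage 1: ask the MLLM to verbalize affective cues from each modality.
KNOWLEDGE_PROMPT = (
    "Describe the emotional cues in this clip: the speaker's tone of voice, "
    "facial expression, and the sentiment of the words."
)

def build_sft_example(sample: Sample,
                      query_mllm: Callable[[Sample, str], str]) -> dict:
    """Build one supervised fine-tuning example whose input is augmented
    with model-generated affective knowledge."""
    knowledge = query_mllm(sample, KNOWLEDGE_PROMPT)  # generated knowledge
    prompt = (
        f"Affective cues: {knowledge}\n"
        f"Utterance: {sample.text}\n"
        "What emotion is the speaker expressing?"
    )
    return {"input": prompt, "target": sample.label}

# Stage 2: fine-tune the MLLM on the augmented (input, target) pairs with a
# standard supervised objective (cross-entropy over the target tokens).
```

The design intuition is that the generation step makes implicit audio-visual cues explicit in text, so the subsequent fine-tuning can learn from them directly; whether the paper stages it exactly this way should be checked against the released code.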
Similar Papers
Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity
Computation and Language
AI can't reliably detect emotions in real speech.
MAC: A Live Benchmark for Multimodal Large Language Models in Scientific Understanding
Computation and Language
Tests AI's science smarts on journal covers.
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
Computation and Language
Helps computers understand how people *really* talk.