Representation Decomposition for Learning Similarity and Contrastness Across Modalities for Affective Computing
By: Yuanhe Tian, Pengsen Cheng, Guoqing Jin, and more
Potential Business Impact:
Helps computers understand feelings from pictures and words.
Multi-modal affective computing aims to automatically recognize and interpret human attitudes from diverse data sources such as images and text, thereby enhancing human-computer interaction and emotion understanding. Existing approaches typically rely on unimodal analysis or a straightforward fusion of cross-modal information, both of which fail to capture the complex and conflicting evidence presented across different modalities. In this paper, we propose a novel LLM-based approach for affective computing that explicitly decomposes visual and textual representations into shared (modality-invariant) and modality-specific components. Specifically, our approach first encodes and aligns the input modalities using pre-trained multi-modal encoders, then employs a representation decomposition framework to separate common emotional content from modality-unique cues, and finally integrates these decomposed signals via an attention mechanism to form a dynamic soft prompt for a multi-modal LLM. Extensive experiments on three representative affective computing tasks, namely multi-modal aspect-based sentiment analysis, multi-modal emotion analysis, and hateful meme detection, demonstrate the effectiveness of our approach, which consistently outperforms strong baselines and state-of-the-art models.
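Since the abstract describes the pipeline only at a high level, the following is a minimal PyTorch sketch of the decompose-then-fuse step: splitting each aligned modality embedding into shared and modality-specific components, attending over the pieces, and mapping the result to soft-prompt vectors. The module names, dimensions, and the choice of mean pooling plus a linear map to form the prompt are illustrative assumptions, not the paper's released implementation.

```python
# Minimal, hypothetical sketch of representation decomposition and
# attention-based fusion into a soft prompt. All names and sizes are
# assumptions for illustration, not the authors' code.
import torch
import torch.nn as nn


class RepresentationDecomposer(nn.Module):
    """Splits an aligned modality embedding into a shared (modality-invariant)
    part and a modality-specific part via two learned projections."""

    def __init__(self, dim: int):
        super().__init__()
        self.shared_proj = nn.Linear(dim, dim)    # modality-invariant component
        self.specific_proj = nn.Linear(dim, dim)  # modality-specific component

    def forward(self, x: torch.Tensor):
        return self.shared_proj(x), self.specific_proj(x)


class SoftPromptFusion(nn.Module):
    """Attends over the decomposed components and maps the result to a
    sequence of soft-prompt vectors in the LLM's embedding space."""

    def __init__(self, dim: int, prompt_len: int, llm_dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.to_prompt = nn.Linear(dim, prompt_len * llm_dim)
        self.prompt_len, self.llm_dim = prompt_len, llm_dim

    def forward(self, components: torch.Tensor) -> torch.Tensor:
        # components: (batch, n_components, dim), e.g. the four pieces
        # [img_shared, img_specific, txt_shared, txt_specific].
        fused, _ = self.attn(components, components, components)
        pooled = fused.mean(dim=1)  # aggregate the attended components
        return self.to_prompt(pooled).view(-1, self.prompt_len, self.llm_dim)


if __name__ == "__main__":
    dim, batch = 512, 2
    img_emb = torch.randn(batch, dim)  # stand-ins for aligned encoder outputs
    txt_emb = torch.randn(batch, dim)

    decomposer = RepresentationDecomposer(dim)
    img_shared, img_specific = decomposer(img_emb)
    txt_shared, txt_specific = decomposer(txt_emb)

    fusion = SoftPromptFusion(dim, prompt_len=8, llm_dim=4096)
    components = torch.stack(
        [img_shared, img_specific, txt_shared, txt_specific], dim=1
    )
    soft_prompt = fusion(components)  # to be prepended to the LLM's input embeddings
    print(soft_prompt.shape)  # torch.Size([2, 8, 4096])
```

In such a setup, the soft prompt would be concatenated in front of the LLM's token embeddings, so the frozen or lightly tuned LLM conditions on both the shared emotional content and the modality-specific cues.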
Similar Papers
Rethinking Multimodal Sentiment Analysis: A High-Accuracy, Simplified Fusion Architecture
Computation and Language
Helps computers understand feelings from talking, seeing, and hearing.
Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting
Artificial Intelligence
Helps computers understand feelings from voices, faces, and words.
Computational emotion analysis with multimodal LLMs: Current evidence on an emerging methodological opportunity
Computation and Language
AI can't reliably tell emotions in real speeches.