FINE: Factorized multimodal sentiment analysis via mutual INformation Estimation
By: Yadong Liu, Shangfei Wang
Potential Business Impact:
Helps computers understand feelings from text and pictures.
Multimodal sentiment analysis remains a challenging task due to the inherent heterogeneity across modalities. Such heterogeneity often manifests as asynchronous signals, imbalanced information between modalities, and interference from task-irrelevant noise, hindering the learning of robust and accurate sentiment representations. To address these issues, we propose a factorized multimodal fusion framework that first disentangles each modality into shared and unique representations, and then suppresses task-irrelevant noise within both to retain only sentiment-critical representations. This fine-grained decomposition improves representation quality by reducing redundancy, promoting cross-modal complementarity, and isolating task-relevant sentiment cues. Rather than manipulating the feature space directly, we adopt a mutual information-based optimization strategy to guide the factorization process in a more stable and principled manner. To further support feature extraction and long-term temporal modeling, we introduce two auxiliary modules: a Mixture of Q-Formers, placed before factorization, which uses learnable queries to extract fine-grained affective features from multiple modalities, and a Dynamic Contrastive Queue, placed after factorization, which stores the latest high-level representations for contrastive learning, enabling the model to capture long-range discriminative patterns and improve class-level separability. Extensive experiments on multiple public datasets demonstrate that our method consistently outperforms existing approaches, validating the effectiveness and robustness of the proposed framework.
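To make the shared/unique factorization and the mutual information objective concrete, below is a minimal PyTorch-style sketch. It is an illustration under stated assumptions, not the paper's implementation: all names (ModalityFactorizer, infonce_lower_bound) are hypothetical, and an InfoNCE lower bound stands in for whatever MI estimator the paper actually uses.

```python
# A minimal sketch of factorizing each modality into shared/unique parts and
# guiding the factorization with a mutual-information objective. Hypothetical
# illustration only; module names and the InfoNCE bound are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityFactorizer(nn.Module):
    """Disentangles one modality's features into shared and unique parts."""
    def __init__(self, dim: int):
        super().__init__()
        self.shared = nn.Linear(dim, dim)  # projection onto the cross-modal shared space
        self.unique = nn.Linear(dim, dim)  # projection onto the modality-unique space

    def forward(self, x: torch.Tensor):
        return self.shared(x), self.unique(x)

def infonce_lower_bound(a: torch.Tensor, b: torch.Tensor, tau: float = 0.1):
    """InfoNCE lower bound on the mutual information between paired batches a, b.

    Maximizing it pulls representations of the same sample together across
    modalities while pushing apart mismatched pairs in the batch.
    """
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / tau                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return -F.cross_entropy(logits, targets)      # higher value = tighter MI bound

# Toy usage with two modalities (e.g., text and vision), batch of 8, dim 64.
B, D = 8, 64
f_text, f_vis = ModalityFactorizer(D), ModalityFactorizer(D)
s_t, u_t = f_text(torch.randn(B, D))
s_v, u_v = f_vis(torch.randn(B, D))

# Maximize MI between the shared parts across modalities (cross-modal alignment);
# discourage overlap between shared and unique parts of the same modality. The
# cosine penalty here is a crude stand-in for a principled MI-based term.
loss_shared = -infonce_lower_bound(s_t, s_v)
loss_disentangle = F.cosine_similarity(s_t, u_t).abs().mean() + \
                   F.cosine_similarity(s_v, u_v).abs().mean()
loss = loss_shared + 0.1 * loss_disentangle
loss.backward()
```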
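The Dynamic Contrastive Queue can likewise be pictured as a MoCo-style FIFO buffer of recent post-factorization representations, used as negatives and same-class positives for a contrastive loss. The sketch below is again a hypothetical reading of the abstract (queue size, labels, and the supervised contrastive form are all assumptions), not the authors' code.

```python
# A minimal sketch of a dynamic queue storing recent high-level representations
# with sentiment labels for contrastive learning. Hypothetical illustration.
import torch
import torch.nn.functional as F

class DynamicQueue:
    """Fixed-size FIFO queue of L2-normalized representations and labels."""
    def __init__(self, dim: int, size: int = 1024):
        self.feats = torch.zeros(size, dim)
        self.labels = torch.full((size,), -1, dtype=torch.long)  # -1 marks empty slots
        self.ptr, self.size = 0, size

    @torch.no_grad()
    def enqueue(self, feats: torch.Tensor, labels: torch.Tensor):
        """Overwrite the oldest entries with the newest batch."""
        n = feats.size(0)
        idx = (self.ptr + torch.arange(n)) % self.size
        self.feats[idx] = F.normalize(feats, dim=-1)
        self.labels[idx] = labels
        self.ptr = (self.ptr + n) % self.size

    def supervised_contrastive_loss(self, feats, labels, tau: float = 0.1):
        """Pull same-class queue entries toward each anchor, push others away."""
        valid = self.labels >= 0
        q_feats, q_labels = self.feats[valid], self.labels[valid]
        anchors = F.normalize(feats, dim=-1)
        logits = anchors @ q_feats.t() / tau                  # (B, Q) similarities
        pos = (labels[:, None] == q_labels[None, :]).float()  # same-class mask
        log_prob = logits - logits.logsumexp(dim=1, keepdim=True)
        # Mean log-likelihood of positives per anchor; anchors with no
        # positives in the queue contribute zero.
        denom = pos.sum(dim=1).clamp(min=1)
        return -(pos * log_prob).sum(dim=1).div(denom).mean()

# Toy usage: enqueue one batch, then score the next batch against the queue.
queue = DynamicQueue(dim=64, size=256)
queue.enqueue(torch.randn(32, 64), torch.randint(0, 3, (32,)))
loss = queue.supervised_contrastive_loss(torch.randn(8, 64), torch.randint(0, 3, (8,)))
```

Because the queue outlives any single mini-batch, the contrastive loss sees representations from many past batches, which is one plausible reading of how the module captures long-range discriminative patterns and improves class-level separability.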
Similar Papers
Robust Multimodal Sentiment Analysis with Distribution-Based Feature Recovery and Fusion
Computation and Language
Helps computers understand feelings from broken pictures and words.
Multi-Modal Opinion Integration for Financial Sentiment Analysis using Cross-Modal Attention
Machine Learning (CS)
Helps predict stock prices by understanding opinions.
Senti-iFusion: An Integrity-centered Hierarchical Fusion Framework for Multimodal Sentiment Analysis under Uncertain Modality Missingness
Human-Computer Interaction
Helps computers understand feelings even with missing info.