Disentangling Bias by Modeling Intra- and Inter-modal Causal Attention for Multimodal Sentiment Analysis
By: Menghua Jiang, Yuxia Lin, Baoliang Chen and more
Potential Business Impact:
Helps computers understand feelings from genuine cues rather than misleading shortcuts.
Multimodal sentiment analysis (MSA) aims to understand human emotions by integrating information from multiple modalities, such as text, audio, and visual data. However, existing methods often suffer from spurious correlations both within and across modalities, leading models to rely on statistical shortcuts rather than true causal relationships, thereby undermining generalization. To mitigate this issue, we propose a Multi-relational Multimodal Causal Intervention (MMCI) model, which leverages the backdoor adjustment from causal theory to address the confounding effects of such shortcuts. Specifically, we first model the multimodal inputs as a multi-relational graph to explicitly capture intra- and inter-modal dependencies. Then, we apply an attention mechanism to separately estimate and disentangle the causal features and shortcut features corresponding to these intra- and inter-modal relations. Finally, by applying the backdoor adjustment, we stratify the shortcut features and dynamically combine them with the causal features to encourage MMCI to produce stable predictions under distribution shifts. Extensive experiments on several standard MSA datasets and out-of-distribution (OOD) test sets demonstrate that our method effectively suppresses biases and improves performance.
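The backdoor adjustment mentioned in the abstract can be illustrated with a small discrete example. The sketch below is not the MMCI model. It is a toy illustration of the general formula P(Y | do(X)) = Σ_s P(Y | X, s) P(s), where a hypothetical binary variable S plays the role of a shortcut (confounder) that influences both the input X and the label Y. The function names, the dictionary format of the joint distribution, and all probabilities are assumptions made up for illustration.

```python
from collections import defaultdict


def make_toy_joint():
    """Build a hypothetical joint P(X, S, Y) in which the shortcut S confounds X and Y."""
    joint = {}
    for s in (0, 1):
        for x in (0, 1):
            # Assumed numbers: X tends to agree with S, and P(S=s) = 0.5.
            p_x_given_s = 0.8 if x == s else 0.2
            # Assumed numbers: Y depends weakly on X and strongly on S.
            p_y1 = 0.1 + 0.2 * x + 0.6 * s
            joint[(x, s, 1)] = 0.5 * p_x_given_s * p_y1
            joint[(x, s, 0)] = 0.5 * p_x_given_s * (1 - p_y1)
    return joint


def backdoor_adjust(joint, x, y):
    """Estimate P(Y=y | do(X=x)) = sum_s P(Y=y | X=x, S=s) * P(S=s)."""
    p_s = defaultdict(float)   # marginal P(S=s)
    p_xs = defaultdict(float)  # marginal P(X=x, S=s)
    for (xi, si, _), p in joint.items():
        p_s[si] += p
        p_xs[(xi, si)] += p
    total = 0.0
    for s, ps in p_s.items():
        pxs = p_xs.get((x, s), 0.0)
        if pxs == 0.0:
            continue  # stratum never seen together with X=x; skipped in this sketch
        total += joint.get((x, s, y), 0.0) / pxs * ps
    return total


def conditional(joint, x, y):
    """Naive P(Y=y | X=x), which absorbs the shortcut's influence."""
    p_x = sum(p for (xi, _, _), p in joint.items() if xi == x)
    p_xy = sum(p for (xi, _, yi), p in joint.items() if xi == x and yi == y)
    return p_xy / p_x


if __name__ == "__main__":
    j = make_toy_joint()
    print("naive    P(Y=1 | X=1)     =", round(conditional(j, 1, 1), 4))
    print("adjusted P(Y=1 | do(X=1)) =", round(backdoor_adjust(j, 1, 1), 4))
```

With these made-up numbers the naive conditional is about 0.78 while the adjusted estimate is about 0.60: stratifying over S and reweighting by P(S) removes the portion of the association that comes from the shortcut rather than from X itself. In the abstract's setting the strata are learned shortcut features rather than a known binary variable, so this only conveys the principle.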
Similar Papers
Graph-based Interaction Augmentation Network for Robust Multimodal Sentiment Analysis
Multimedia
Helps computers understand feelings from messy videos.
Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection
CV and Pattern Recognition
Makes computers understand feelings from videos better.
Structures Meet Semantics: Multimodal Fusion via Graph Contrastive Learning
CV and Pattern Recognition
Helps computers understand feelings from voice, face, and words.