Explainable Multimodal Regression via Information Decomposition
By: Zhaozhao Ma, Shujian Yu
Multimodal regression aims to predict a continuous target from heterogeneous input sources and typically relies on fusion strategies such as early or late fusion. However, existing methods lack principled tools to disentangle and quantify the individual contributions of each modality and their interactions, limiting the interpretability of multimodal fusion. We propose a novel multimodal regression framework grounded in Partial Information Decomposition (PID), which decomposes modality-specific representations into unique, redundant, and synergistic components. The basic PID framework is inherently underdetermined; to resolve this, we introduce an inductive bias by enforcing Gaussianity in the joint distribution of the latent representations and the transformed response variable (after an inverse normal transformation), thereby enabling analytical computation of the PID terms. Additionally, we derive a closed-form conditional independence regularizer that promotes the isolation of unique information within each modality. Experiments on six real-world datasets, including a case study on large-scale brain age prediction from multimodal neuroimaging data, demonstrate that our framework outperforms state-of-the-art methods in both predictive accuracy and interpretability, while also enabling informed modality selection for efficient inference. An implementation is available at https://github.com/zhaozhaoma/PIDReg.
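The abstract does not spell out which PID definition yields closed-form terms under the Gaussian assumption. The sketch below is a minimal illustration, not the paper's method: it assumes a rank-based inverse normal transform of the target and the minimum-mutual-information (MMI) redundancy for jointly Gaussian variables; the helper names (`inverse_normal_transform`, `gaussian_pid`) are hypothetical.

```python
# Hypothetical sketch of Gaussian PID for two modalities; not the paper's exact formulation.
import numpy as np
from scipy.stats import norm, rankdata


def inverse_normal_transform(y):
    """Rank-based inverse normal transform (Blom offsets), mapping y to approximately N(0, 1)."""
    ranks = rankdata(y)
    return norm.ppf((ranks - 0.375) / (len(y) + 0.25))


def gaussian_mi(cov, idx_a, idx_b):
    """Mutual information (in nats) between two blocks of a jointly Gaussian vector."""
    a, b = list(idx_a), list(idx_b)
    det = lambda ix: np.linalg.det(cov[np.ix_(ix, ix)])
    return 0.5 * np.log(det(a) * det(b) / det(a + b))


def gaussian_pid(z1, z2, y_tilde):
    """Unique, redundant, and synergistic information about y_tilde carried by z1 and z2.

    z1: (n, d1) and z2: (n, d2) latent representations; y_tilde: (n,) transformed target.
    Uses the MMI redundancy min(I(Z1;Y), I(Z2;Y)) as a stand-in definition.
    """
    data = np.column_stack([z1, z2, y_tilde])
    cov = np.cov(data, rowvar=False)
    d1, d2 = z1.shape[1], z2.shape[1]
    i1 = gaussian_mi(cov, range(d1), [d1 + d2])            # I(Z1; Y)
    i2 = gaussian_mi(cov, range(d1, d1 + d2), [d1 + d2])   # I(Z2; Y)
    i12 = gaussian_mi(cov, range(d1 + d2), [d1 + d2])      # I(Z1, Z2; Y)
    red = min(i1, i2)                                      # redundant information
    return {"unique_1": i1 - red, "unique_2": i2 - red,
            "redundant": red, "synergistic": i12 - i1 - i2 + red}
```

With this convention the four terms sum to I(Z1, Z2; Y), so the decomposition accounts for the total predictive information, consistent with the unique/redundant/synergistic split described above.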