Quantifying Modality Contributions via Disentangling Multimodal Representations
By: Padegal Amit, Omkar Mahesh Kashyap, Namitha Rayasam, and more
Potential Business Impact:
Shows how different AI senses work together.
Quantifying modality contributions in multimodal models remains a challenge, as existing approaches conflate distinct notions of contribution. Prior work relies on accuracy-based approaches, interpreting performance drops after removing a modality as indicative of its influence. However, such outcome-driven metrics fail to distinguish whether a modality is inherently informative or whether its value arises only through interaction with other modalities. This distinction is particularly important in cross-attention architectures, where modalities influence each other's representations. In this work, we propose a framework based on Partial Information Decomposition (PID) that quantifies modality contributions by decomposing predictive information in internal embeddings into unique, redundant, and synergistic components. To enable scalable, inference-only analysis, we develop an algorithm based on the Iterative Proportional Fitting Procedure (IPFP) that computes layer- and dataset-level contributions without retraining. This provides a principled, representation-level view of multimodal behavior, offering clearer and more interpretable insights than outcome-based metrics.
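For orientation: PID decomposes the predictive information I(X1, X2; Y) that two modalities carry about a target into unique(X1) + unique(X2) + redundancy + synergy. The abstract does not give implementation details, so the snippet below is only a minimal NumPy sketch of the IPFP primitive it mentions, fitting a toy joint distribution to fixed marginals and reporting its mutual information. The two-way table, the discretization, and all variable names are illustrative assumptions, not the paper's actual decomposition over embeddings.

```python
import numpy as np

def ipfp(joint_init, row_marginal, col_marginal, n_iters=200, eps=1e-12):
    """Classic IPFP: alternately rescale rows and columns so the fitted
    joint matches the target marginals (toy 2-way version)."""
    q = joint_init.astype(float).copy()
    for _ in range(n_iters):
        row_sums = q.sum(axis=1, keepdims=True)
        q *= row_marginal[:, None] / np.clip(row_sums, eps, None)
        col_sums = q.sum(axis=0, keepdims=True)
        q *= col_marginal[None, :] / np.clip(col_sums, eps, None)
    return q

def mutual_information(p_xy):
    """I(X;Y) in nats for a discrete joint distribution p(x, y)."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (px @ py)[mask])))

# Toy example: a random initial table, e.g. discretized modality codes vs. labels.
rng = np.random.default_rng(0)
p = rng.random((4, 3))
p /= p.sum()
target_rows = np.array([0.4, 0.3, 0.2, 0.1])
target_cols = np.array([0.5, 0.3, 0.2])

q = ipfp(p, target_rows, target_cols)
print("fitted row marginals:", q.sum(axis=1))
print("fitted col marginals:", q.sum(axis=0))
print("I(X;Y) of fitted joint:", mutual_information(q))
```

IPFP converges to the distribution that satisfies the marginal constraints while staying closest (in KL divergence) to the initial table, which is why this kind of fitting can run purely at inference time on stored embeddings, without retraining the model.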
Similar Papers
FINE: Factorized multimodal sentiment analysis via mutual INformation Estimation
Multimedia
Helps computers understand feelings from text and pictures.
Revisit Modality Imbalance at the Decision Layer
Machine Learning (CS)
Fixes AI that favors one sense over another.
What are You Looking at? Modality Contribution in Multimodal Medical Deep Learning Methods
CV and Pattern Recognition
Shows how AI uses different patient data.