MCE: Towards a General Framework for Handling Missing Modalities under Imbalanced Missing Rates
By: Binyu Zhao, Wei Zhang, Zhaonian Zou
Potential Business Impact:
Helps computers learn from multi-modal data when some input types are missing far more often than others.
Multi-modal learning has made significant advances across diverse pattern recognition applications. However, handling missing modalities, especially under imbalanced missing rates, remains a major challenge. This imbalance triggers a vicious cycle: modalities with higher missing rates receive fewer updates, leading to inconsistent learning progress and representational degradation that further diminishes their contribution. Existing methods typically focus on global dataset-level balancing, often overlooking critical sample-level variations in modality utility and the underlying issue of degraded feature quality. We propose Modality Capability Enhancement (MCE) to tackle these limitations. MCE includes two synergistic components: i) Learning Capability Enhancement (LCE), which introduces multi-level factors to dynamically balance modality-specific learning progress, and ii) Representation Capability Enhancement (RCE), which improves feature semantics and robustness through subset prediction and cross-modal completion tasks. Comprehensive evaluations on four multi-modal benchmarks show that MCE consistently outperforms state-of-the-art methods under various missing configurations. The journal preprint version is now available at https://doi.org/10.1016/j.patcog.2025.112591. Our code is available at https://github.com/byzhaoAI/MCE.
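The vicious cycle described in the abstract starts from a simple counting effect, which the following minimal sketch illustrates. It is not the authors' code and does not implement LCE or RCE. The function names, the three example missing rates, and the inverse-frequency weights are all assumptions made for illustration. The weights stand in for the kind of global, dataset-level balancing the abstract contrasts MCE with.

```python
import numpy as np

def sample_availability(n_samples, missing_rates, seed=0):
    """Return a boolean mask (n_samples, n_modalities); True means present.

    Hypothetical helper: each modality is dropped independently with its own
    missing rate, which is an assumption, not the paper's protocol.
    """
    rng = np.random.default_rng(seed)
    rates = np.asarray(missing_rates, dtype=float)
    mask = rng.random((n_samples, rates.size)) >= rates
    # Avoid samples with no modality at all: restore the modality that has
    # the lowest missing rate for those rows.
    empty = ~mask.any(axis=1)
    mask[empty, int(np.argmin(rates))] = True
    return mask

def update_counts(mask):
    """Number of samples in which each modality is present, i.e. how many
    times its encoder could receive a gradient update per epoch."""
    return mask.sum(axis=0)

def inverse_frequency_weights(mask):
    """Dataset-level balancing factors (illustrative baseline only):
    the most frequent modality gets weight 1, rarer ones get more."""
    counts = update_counts(mask).astype(float)
    return counts.max() / np.maximum(counts, 1.0)

if __name__ == "__main__":
    # Assumed example: three modalities with missing rates 20%, 50%, 80%.
    mask = sample_availability(10_000, [0.2, 0.5, 0.8])
    print("updates per modality:", update_counts(mask))
    print("dataset-level weights:", inverse_frequency_weights(mask))
```

The weights here are constant across the whole dataset, so they cannot reflect how useful a modality is for an individual sample. That is the sample-level variation the abstract says existing methods overlook.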
Similar Papers
Calibrated Multimodal Representation Learning with Missing Modalities
CV and Pattern Recognition
Helps computers learn from mixed-up, incomplete data.
MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment
CV and Pattern Recognition
Helps computers judge actions even with missing info.