Meta Fusion: A Unified Framework For Multimodality Fusion with Mutual Learning
By: Ziyi Liang, Annie Qu, Babak Shahbaba
Potential Business Impact:
Combines different types of data, such as brain scans and clinical records, to make more accurate predictions.
Developing effective multimodal data fusion strategies has become increasingly essential for improving the predictive power of statistical machine learning methods across a wide range of applications, from autonomous driving to medical diagnosis. Traditional fusion methods, including early, intermediate, and late fusion, integrate data at different stages, each offering distinct advantages and limitations. In this paper, we introduce Meta Fusion, a flexible and principled framework that unifies these existing strategies as special cases. Motivated by deep mutual learning and ensemble learning, Meta Fusion constructs a cohort of models based on various combinations of latent representations across modalities, and further boosts predictive performance through soft information sharing within the cohort. Our approach is model-agnostic in learning the latent representations, allowing it to flexibly adapt to the unique characteristics of each modality. Theoretically, our soft information sharing mechanism reduces the generalization error. Empirically, Meta Fusion consistently outperforms conventional fusion strategies in extensive simulation studies. We further validate our approach on real-world applications, including Alzheimer's disease detection and neural decoding.
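To make the cohort-plus-mutual-learning idea in the abstract concrete, here is a minimal PyTorch sketch of one possible realization: one classifier per combination of modality latent representations, each trained with its own supervised loss plus a KL term that softly shares information with its peers' averaged predictions. The modality names, latent dimensions, network sizes, and the weighting coefficient `alpha` are illustrative placeholders, not the authors' specification; the actual Meta Fusion framework also learns the latent representations themselves and is model-agnostic in how it does so.

```python
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder setup: two modalities with pre-computed latent representations.
LATENT_DIMS = {"imaging": 16, "clinical": 8}
NUM_CLASSES = 2


class CohortMember(nn.Module):
    """A simple classifier head over one chosen subset of modality latents."""

    def __init__(self, modalities):
        super().__init__()
        self.modalities = modalities
        in_dim = sum(LATENT_DIMS[m] for m in modalities)
        self.head = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, NUM_CLASSES)
        )

    def forward(self, latents):
        x = torch.cat([latents[m] for m in self.modalities], dim=-1)
        return self.head(x)


def mutual_learning_step(cohort, latents, labels, optimizers, alpha=0.5):
    """One step: supervised loss plus soft information sharing (KL toward peers)."""
    logits = [member(latents) for member in cohort]
    for i in range(len(cohort)):
        ce = F.cross_entropy(logits[i], labels)
        # Pull member i's predictions toward the averaged (detached) peer predictions.
        peer_probs = torch.stack(
            [F.softmax(logits[j].detach(), dim=-1) for j in range(len(cohort)) if j != i]
        ).mean(dim=0)
        kl = F.kl_div(F.log_softmax(logits[i], dim=-1), peer_probs, reduction="batchmean")
        loss = ce + alpha * kl
        optimizers[i].zero_grad()
        loss.backward()
        optimizers[i].step()


# Build the cohort: one member per non-empty combination of modalities.
combos = [
    c for r in range(1, len(LATENT_DIMS) + 1)
    for c in itertools.combinations(LATENT_DIMS, r)
]
cohort = [CohortMember(c) for c in combos]
optimizers = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in cohort]

# Toy batch of latents (in practice produced by modality-specific encoders).
batch = {m: torch.randn(4, d) for m, d in LATENT_DIMS.items()}
labels = torch.randint(0, NUM_CLASSES, (4,))
mutual_learning_step(cohort, batch, labels, optimizers)
```

Note how the cohort spans the usual fusion strategies as special cases: members built on a single modality behave like unimodal (late-fusion-style) learners, while the member using all latents resembles intermediate fusion; the KL term is what couples them during training.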
Similar Papers
Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation
CV and Pattern Recognition
Improves micro-video recommendations by combining different types of content.
Exploring Fusion Strategies for Multimodal Vision-Language Systems
Machine Learning (CS)
Makes AI systems faster by combining images and text early on.
A review on data fusion in multimodal learning analytics and educational data mining
Computers and Society
Helps computers understand how students learn best.