Asymmetric Cross-Modal Knowledge Distillation: Bridging Modalities with Weak Semantic Consistency
By: Riling Wei, Kelu Yao, Chuanguang Yang, and more
Potential Business Impact:
Teaches computers to learn from different kinds of pictures.
Cross-modal Knowledge Distillation has demonstrated promising performance on paired modalities with strong semantic connections, a setting referred to as Symmetric Cross-modal Knowledge Distillation (SCKD). However, SCKD is severely constrained in real-world scenarios by the limited availability of such paired modalities. To this end, we investigate a general and effective knowledge learning concept under weak semantic consistency, dubbed Asymmetric Cross-modal Knowledge Distillation (ACKD), which aims to bridge modalities with limited semantic overlap. The shift from strong to weak semantic consistency improves flexibility but raises the cost of knowledge transmission, which we rigorously verify using optimal transport theory. To mitigate this issue, we propose SemBridge, a framework integrating a Student-Friendly Matching module and a Semantic-aware Knowledge Alignment module. The former leverages self-supervised learning to acquire semantic-based knowledge and provides personalized instruction for each student sample by dynamically selecting the most relevant teacher samples. The latter finds the optimal transport path via Lagrangian optimization. To facilitate research on ACKD, we curate a benchmark dataset spanning two modalities, Multi-Spectral (MS) and asymmetric RGB images, tailored for remote sensing scene classification. Comprehensive experiments show that our framework achieves state-of-the-art performance against 7 existing approaches on 6 model architectures across various datasets.
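As a rough illustration of the Student-Friendly Matching idea, the sketch below shows one plausible way to dynamically select relevant teacher samples: embed both modalities into a shared space and, for each student sample, retrieve its top-k most similar teacher samples. This is a minimal PyTorch sketch under our own assumptions; the function name, tensor shapes, and cosine/top-k retrieval are illustrative, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def select_teacher_samples(student_emb: torch.Tensor,
                           teacher_emb: torch.Tensor,
                           k: int = 4):
    """For each student sample, pick the k teacher samples whose
    embeddings are most similar under cosine similarity.

    student_emb: (B_s, D) student-modality embeddings
    teacher_emb: (B_t, D) teacher-modality embeddings
    Returns (indices, similarities), each of shape (B_s, k).
    """
    s = F.normalize(student_emb, dim=-1)   # unit-norm student features
    t = F.normalize(teacher_emb, dim=-1)   # unit-norm teacher features
    sim = s @ t.T                          # (B_s, B_t) cosine similarities
    topk = sim.topk(k, dim=-1)             # most semantically relevant teachers
    return topk.indices, topk.values
```

The selected teacher samples would then serve as per-student distillation targets, so each student sample is taught only by teacher samples with which it shares semantics.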
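The Semantic-aware Knowledge Alignment module is described only as seeking an optimal transport path via Lagrangian optimization. A standard concrete stand-in is entropy-regularized optimal transport solved with Sinkhorn iterations, whose scaling vectors correspond to the Lagrangian dual variables of the marginal constraints. The sketch below assumes uniform marginals and a precomputed student-teacher cost matrix; it is not the paper's solver.

```python
import torch

def sinkhorn_plan(cost: torch.Tensor, eps: float = 0.05, n_iters: int = 50):
    """Entropy-regularized OT via Sinkhorn iterations.

    cost: (n, m) pairwise transport costs between student and teacher features.
    Returns a transport plan P of shape (n, m) with (near-)uniform marginals.
    """
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n)        # uniform student marginal
    nu = torch.full((m,), 1.0 / m)        # uniform teacher marginal
    K = torch.exp(-cost / eps)            # Gibbs kernel (very small eps may underflow)
    u = torch.ones(n)
    for _ in range(n_iters):
        v = nu / (K.T @ u)                # alternating dual (Lagrangian) updates
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]    # transport plan

# The alignment objective would then be the transport cost under the plan:
# loss = (sinkhorn_plan(cost) * cost).sum()
```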
Similar Papers
View-aware Cross-modal Distillation for Multi-view Action Recognition
CV and Pattern Recognition
Lets computers understand actions from different camera angles.
WeCKD: Weakly-supervised Chained Distillation Network for Efficient Multimodal Medical Imaging
CV and Pattern Recognition
Teaches computers to learn from less data.
Information-Theoretic Criteria for Knowledge Distillation in Multimodal Learning
Machine Learning (CS)
Teaches computers to learn better from different kinds of information.