On the effectiveness of multimodal privileged knowledge distillation in two vision transformer based diagnostic applications
By: Simon Baur, Alexandra Benova, Emilio Dolgener Cantú, and more
Potential Business Impact:
Teaches AI to see better using extra info available only during training.
Deploying deep learning models in clinical practice often requires leveraging multiple data modalities, such as images, text, and structured data, to achieve robust and trustworthy decisions. However, not all modalities are always available at inference time. In this work, we propose multimodal privileged knowledge distillation (MMPKD), a training strategy that utilizes additional modalities available solely during training to guide a unimodal vision model. Specifically, we use a text-based teacher model for chest radiographs (MIMIC-CXR) and a tabular metadata-based teacher model for mammography (CBIS-DDSM) to distill knowledge into a vision transformer student model. We show that MMPKD can improve the zero-shot ability of the resulting attention maps to localize regions of interest (ROIs) in input images, although, contrary to what prior research suggests, this effect does not generalize across domains.
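To make the idea concrete, below is a minimal sketch of how privileged knowledge distillation of this kind is typically wired up, assuming a standard softened-logit distillation loss (Hinton-style KL term plus task loss). The function and variable names (`mmpkd_loss`, `text_teacher`, `vit_student`), the temperature, and the loss weighting are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of multimodal privileged knowledge distillation (MMPKD).
# Assumption: standard KD with a temperature-softened KL term; the
# names and hyperparameters here are hypothetical, for illustration.
import torch
import torch.nn.functional as F

def mmpkd_loss(student_logits, teacher_logits, labels,
               temperature=2.0, alpha=0.5):
    """Combine the supervised task loss with a distillation term.

    student_logits: vision transformer outputs (image only).
    teacher_logits: outputs of a teacher that also saw the privileged
                    modality (report text or tabular metadata), which
                    is available at training time only.
    """
    # Supervised loss on the ground-truth labels.
    task_loss = F.cross_entropy(student_logits, labels)
    # Soften both distributions and match the student to the teacher.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * task_loss + (1 - alpha) * kd_loss

# Hypothetical training step: the teacher consumes the privileged
# modality, the student sees only the image; at inference time the
# teacher is discarded and the unimodal student runs alone.
# images, reports, labels = batch
# with torch.no_grad():
#     teacher_logits = text_teacher(reports)   # privileged-modality teacher
# student_logits = vit_student(images)         # unimodal vision student
# loss = mmpkd_loss(student_logits, teacher_logits, labels)
```

The key design point is that the privileged modality influences only the training signal (via the teacher's logits), so the deployed student needs nothing beyond the image at inference time.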
Similar Papers
AMMKD: Adaptive Multimodal Multi-teacher Distillation for Lightweight Vision-Language Models
CV and Pattern Recognition
Makes phone apps understand pictures and words better.
Enriching Knowledge Distillation with Cross-Modal Teacher Fusion
CV and Pattern Recognition
Teaches computers to learn better from many sources.
Multi-modal Knowledge Decomposition based Online Distillation for Biomarker Prediction in Breast Cancer Histopathology
CV and Pattern Recognition
Helps doctors predict cancer from simple tissue pictures.