Multimodal Medical Endoscopic Image Analysis via Progressive Disentangle-aware Contrastive Learning
By: Junhao Wu, Yun Li, Junhao Li and more
Potential Business Impact:
Helps doctors outline throat tumors more accurately in endoscopy images.
Accurate segmentation of laryngo-pharyngeal tumors is crucial for precise diagnosis and effective treatment planning. However, traditional single-modality imaging methods often fail to capture the complex anatomical and pathological features of these tumors. In this study, we present a multi-modality representation learning framework based on an 'Align-Disentangle-Fusion' mechanism that integrates paired 2D White Light Imaging (WLI) and Narrow Band Imaging (NBI) images to enhance segmentation performance. A cornerstone of our approach is multi-scale distribution alignment, which mitigates modality discrepancies by aligning features across multiple transformer layers. Furthermore, we develop a progressive feature disentanglement strategy, combining preliminary disentanglement with disentangle-aware contrastive learning, that separates modality-specific features from shared features, enabling robust multimodal contrastive learning and efficient semantic fusion. Comprehensive experiments on multiple datasets demonstrate that our method consistently outperforms state-of-the-art approaches across diverse real clinical scenarios.
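The abstract does not give the form of the contrastive objective. As a rough illustration only, the sketch below shows a generic InfoNCE-style loss over paired WLI/NBI embeddings, where row i of each input is assumed to come from the same lesion view. The function names, the temperature value, and the use of plain InfoNCE are assumptions made for this example, not the paper's disentangle-aware loss.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(wli_feats, nbi_feats, temperature=0.1):
    """Generic InfoNCE loss (hypothetical stand-in, not the paper's objective).

    wli_feats[i] and nbi_feats[i] are assumed to be embeddings of the same
    WLI/NBI image pair; every other row in the batch acts as a negative.
    """
    a = [l2_normalize(v) for v in wli_feats]
    b = [l2_normalize(v) for v in nbi_feats]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of WLI sample i to every NBI sample, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching pair (index i) as the positive.
        total += log_denom - logits[i]
    return total / len(a)


wli = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = info_nce(wli, [[1.0, 0.0], [0.0, 1.0]])
loss_swapped = info_nce(wli, [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs, correctly paired embeddings yield a much smaller loss than swapped ones, which is the behavior a cross-modal contrastive term is meant to reward.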
Similar Papers
Disentangled Multi-modal Learning of Histology and Transcriptomics for Cancer Characterization
Image and Video Processing
Helps doctors find cancer better using pictures and genes.
Multi-Level CLS Token Fusion for Contrastive Learning in Endoscopy Image Classification
CV and Pattern Recognition
Helps doctors understand ear, nose, and throat pictures.
Adversarial Multi-Task Learning for Liver Tumor Segmentation, Dynamic Enhancement Regression, and Classification
Image and Video Processing
Helps doctors find and understand liver tumors faster.