A Text-Image Fusion Method with Data Augmentation Capabilities for Referring Medical Image Segmentation
By: Shurong Chai, Rahul Kumar JAIN, Rui Xu and more
Potential Business Impact:
Helps doctors find disease in scans more accurately.
Deep learning relies heavily on data augmentation to mitigate limited training data, especially in medical imaging. Recent multimodal approaches integrate text and images for segmentation, a task known as referring or text-guided image segmentation. However, common augmentations such as rotation and flipping disrupt the spatial alignment between image and text, weakening performance. To address this, we propose an early fusion framework that combines text and visual features before augmentation, preserving spatial consistency. We also design a lightweight generator that projects text embeddings into the visual space, bridging the semantic gap between modalities. Visualization of the generated pseudo-images shows accurate region localization. Our method is evaluated on three medical imaging tasks and four segmentation frameworks, achieving state-of-the-art results. Code is publicly available on GitHub: https://github.com/11yxk/MedSeg_EarlyFusion.
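To make the early-fusion idea concrete, below is a minimal PyTorch sketch: a lightweight generator projects a text embedding into the visual feature space as a pseudo-image, the pseudo-image is fused with the real image, and spatial augmentations (flip, rotation) are applied only after fusion so the image and text-derived regions stay aligned. The module names, tensor shapes, and channel-concatenation fusion rule are illustrative assumptions, not the authors' exact implementation; see the linked repository for the official code.

```python
# Minimal sketch of "fuse first, augment second" for referring segmentation.
# All names, dimensions, and the fusion rule are assumptions for illustration.

import random
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF


class TextToVisualGenerator(nn.Module):
    """Hypothetical lightweight generator: text embedding -> single-channel pseudo-image."""

    def __init__(self, text_dim: int = 768, out_size: int = 224):
        super().__init__()
        self.out_size = out_size
        side = out_size // 4
        self.proj = nn.Sequential(
            nn.Linear(text_dim, side * side),  # coarse spatial map from the text embedding
            nn.ReLU(inplace=True),
        )
        self.upsample = nn.Upsample(size=(out_size, out_size),
                                    mode="bilinear", align_corners=False)

    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        b = text_emb.size(0)
        side = self.out_size // 4
        x = self.proj(text_emb).view(b, 1, side, side)
        return self.upsample(x)  # (B, 1, H, W) pseudo-image in visual space


def fuse_then_augment(image: torch.Tensor, text_emb: torch.Tensor,
                      generator: TextToVisualGenerator) -> torch.Tensor:
    """Fuse the text-derived pseudo-image with the input image, then augment jointly."""
    pseudo = generator(text_emb)               # text projected into visual space
    fused = torch.cat([image, pseudo], dim=1)  # simple channel-wise fusion (assumption)

    # Spatial augmentations after fusion transform image and text cues together,
    # so their spatial correspondence is preserved.
    if random.random() < 0.5:
        fused = TF.hflip(fused)
    angle = random.uniform(-15.0, 15.0)
    fused = TF.rotate(fused, angle)
    return fused


if __name__ == "__main__":
    gen = TextToVisualGenerator(text_dim=768, out_size=224)
    img = torch.randn(2, 3, 224, 224)          # batch of images
    txt = torch.randn(2, 768)                  # e.g., pooled text-encoder embeddings
    out = fuse_then_augment(img, txt, gen)
    print(out.shape)                           # torch.Size([2, 4, 224, 224])
```

The fused tensor would then be fed to any segmentation backbone; the key point the sketch illustrates is the ordering (project text to visual space, fuse, then apply geometric augmentation), not the particular fusion operator.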
Similar Papers
Diffusion-Based Data Augmentation for Medical Image Segmentation
CV and Pattern Recognition
Creates synthetic medical images to train segmentation models better.
SMFusion: Semantic-Preserving Fusion of Multimodal Medical Images for Enhanced Clinical Diagnosis
CV and Pattern Recognition
Combines different medical scans into one clearer picture for doctors.
RIS-FUSION: Rethinking Text-Driven Infrared and Visible Image Fusion from the Perspective of Referring Image Segmentation
CV and Pattern Recognition
Makes pictures match words better.