Prompt-based Multimodal Semantic Communication for Multi-spectral Image Segmentation
By: Haoshuo Zhang, Yufei Bo, Hongwei Zhang, and more
Potential Business Impact:
Improves scene segmentation from multi-spectral images, supporting safer autonomous driving
Multimodal semantic communication has gained widespread attention due to its ability to enhance downstream task performance. A key challenge in such systems is the effective fusion of features from different modalities, which requires the extraction of rich and diverse semantic representations from each modality. To this end, we propose ProMSC-MIS, a Prompt-based Multimodal Semantic Communication system for Multi-spectral Image Segmentation. Specifically, we propose a pre-training algorithm where features from one modality serve as prompts for another, guiding unimodal semantic encoders to learn diverse and complementary semantic representations. We further introduce a semantic fusion module that combines cross-attention mechanisms and squeeze-and-excitation (SE) networks to effectively fuse cross-modal features. Simulation results show that ProMSC-MIS significantly outperforms benchmark methods across various channel-source compression levels, while maintaining low computational complexity and storage overhead. Our scheme has great potential for applications such as autonomous driving and nighttime surveillance.
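The semantic fusion module combining cross-attention with squeeze-and-excitation gating can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' implementation: the token counts, feature dimensions, bidirectional-attention layout, and random stand-in weights (in place of learned parameters) are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, kv_feats):
    """Attend from one modality's tokens (queries) to the other's (keys/values)."""
    d = query_feats.shape[-1]
    scores = query_feats @ kv_feats.T / np.sqrt(d)    # (Nq, Nk) similarity
    return softmax(scores, axis=-1) @ kv_feats        # (Nq, d) attended features

def se_gate(feats, reduction=4):
    """Squeeze-and-excitation: global average pool, bottleneck MLP, sigmoid
    channel gate. Random weights stand in for learned parameters."""
    d = feats.shape[-1]
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((d, d // reduction)) / np.sqrt(d)
    w2 = rng.standard_normal((d // reduction, d)) / np.sqrt(d // reduction)
    squeeze = feats.mean(axis=0)                                     # (d,) pooled
    excite = 1.0 / (1.0 + np.exp(-(np.maximum(squeeze @ w1, 0) @ w2)))  # gate in (0,1)
    return feats * excite                                            # reweight channels

def fuse(rgb_feats, thermal_feats):
    """Fuse two modalities: bidirectional cross-attention, concat, SE gating."""
    rgb_att = cross_attention(rgb_feats, thermal_feats)
    thermal_att = cross_attention(thermal_feats, rgb_feats)
    fused = np.concatenate([rgb_att, thermal_att], axis=-1)  # (N, 2d)
    return se_gate(fused)

rgb = np.random.default_rng(1).standard_normal((16, 32))      # 16 tokens, 32-dim
thermal = np.random.default_rng(2).standard_normal((16, 32))
print(fuse(rgb, thermal).shape)  # (16, 64)
```

The SE gate lets the network suppress channels from whichever modality is uninformative in a given scene (e.g. RGB at night), which is why it pairs naturally with cross-modal attention here.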
Similar Papers
ProMSC-MIS: Prompt-based Multimodal Semantic Communication for Multi-Spectral Image Segmentation
Multimedia
Lets cameras see better with less data.
MSCRS: Multi-modal Semantic Graph Prompt Learning Framework for Conversational Recommender Systems
Information Retrieval
Helps computers suggest movies better by looking at pictures.
MSM-Seg: A Modality-and-Slice Memory Framework with Category-Agnostic Prompting for Multi-Modal Brain Tumor Segmentation
CV and Pattern Recognition
Helps doctors find brain tumors faster and better.