Prompt-based Multimodal Semantic Communication for Multi-spectral Image Segmentation
By: Haoshuo Zhang, Yufei Bo, Hongwei Zhang, and more
Potential Business Impact:
Improves semantic segmentation of multi-spectral (e.g., RGB and thermal) images, supporting safer autonomous driving.
Multimodal semantic communication has gained widespread attention due to its ability to enhance downstream task performance. A key challenge in such systems is the effective fusion of features from different modalities, which requires the extraction of rich and diverse semantic representations from each modality. To this end, we propose ProMSC-MIS, a Prompt-based Multimodal Semantic Communication system for Multi-spectral Image Segmentation. Specifically, we propose a pre-training algorithm where features from one modality serve as prompts for another, guiding unimodal semantic encoders to learn diverse and complementary semantic representations. We further introduce a semantic fusion module that combines cross-attention mechanisms and squeeze-and-excitation (SE) networks to effectively fuse cross-modal features. Simulation results show that ProMSC-MIS significantly outperforms benchmark methods across various channel-source compression levels, while maintaining low computational complexity and storage overhead. Our scheme has great potential for applications such as autonomous driving and nighttime surveillance.
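The abstract describes two architectural pieces: unimodal encoders pre-trained with cross-modal prompts, and a fusion module combining cross-attention with squeeze-and-excitation (SE) gating. As a rough illustration of the fusion idea only, here is a minimal NumPy sketch in which each modality's token features attend to the other's, the results are concatenated, and an SE block reweights channels. All shapes, weight names (`w1`, `w2`), and the residual/concatenation choices are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    # q_feats: (N_q, d) tokens from one modality; kv_feats: (N_kv, d) from the other.
    # Scaled dot-product attention of queries over the other modality's tokens.
    d = q_feats.shape[-1]
    attn = softmax(q_feats @ kv_feats.T / np.sqrt(d), axis=-1)
    return attn @ kv_feats

def squeeze_excite(feats, w1, w2):
    # feats: (N, c). Squeeze: global average pool over tokens -> channel descriptor.
    s = feats.mean(axis=0)                      # (c,)
    z = np.maximum(w1 @ s, 0.0)                 # excitation MLP with ReLU, (r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))      # sigmoid channel gates, (c,)
    return feats * gate                         # channel-wise reweighting

def fuse(rgb_feats, thermal_feats, w1, w2):
    # Cross-attend each modality to the other, concatenate with residuals,
    # then apply SE gating over the fused channels (hypothetical composition).
    a = cross_attention(rgb_feats, thermal_feats)
    b = cross_attention(thermal_feats, rgb_feats)
    fused = np.concatenate([rgb_feats + a, thermal_feats + b], axis=-1)  # (N, 2d)
    return squeeze_excite(fused, w1, w2)

# Toy usage: 4 tokens per modality, 8 channels each, SE reduction to 4.
rng = np.random.default_rng(0)
d, n, r = 8, 4, 4
rgb = rng.normal(size=(n, d))
thermal = rng.normal(size=(n, d))
w1 = rng.normal(size=(r, 2 * d))
w2 = rng.normal(size=(2 * d, r))
out = fuse(rgb, thermal, w1, w2)   # shape (4, 16)
```

The design intuition this sketch captures: cross-attention lets each modality borrow complementary detail from the other (e.g., thermal contrast at night), while the SE gate adaptively emphasizes the more informative channels of the fused representation.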
Similar Papers
- ProMSC-MIS: Prompt-based Multimodal Semantic Communication for Multi-Spectral Image Segmentation (Multimedia). Lets cameras see better with less data.
- Multi-Modal Semantic Communication (Machine Learning, CS). Lets computers understand pictures from your words.
- Perception-Enhanced Multitask Multimodal Semantic Communication for UAV-Assisted Integrated Sensing and Communication System (Information Theory). Drones share pictures and maps better for emergencies.