Prompt-based Multimodal Semantic Communication for Multi-spectral Image Segmentation

Published: August 25, 2025 | arXiv ID: 2508.17920v1

By: Haoshuo Zhang, Yufei Bo, Hongwei Zhang and more

Potential Business Impact:

Improves scene segmentation from multi-spectral images, which can support safer autonomous driving

Business Areas:
Semantic Search, Internet Services

Multimodal semantic communication has gained widespread attention due to its ability to enhance downstream task performance. A key challenge in such systems is the effective fusion of features from different modalities, which requires the extraction of rich and diverse semantic representations from each modality. To this end, we propose ProMSC-MIS, a Prompt-based Multimodal Semantic Communication system for Multi-spectral Image Segmentation. Specifically, we propose a pre-training algorithm where features from one modality serve as prompts for another, guiding unimodal semantic encoders to learn diverse and complementary semantic representations. We further introduce a semantic fusion module that combines cross-attention mechanisms and squeeze-and-excitation (SE) networks to effectively fuse cross-modal features. Simulation results show that ProMSC-MIS significantly outperforms benchmark methods across various channel-source compression levels, while maintaining low computational complexity and storage overhead. Our scheme has great potential for applications such as autonomous driving and nighttime surveillance.
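
To make the fusion idea more concrete, the following is a minimal sketch of how cross-attention and a squeeze-and-excitation (SE) gate could be combined to fuse token features from two modalities. It is not the authors' implementation: the abstract gives no architectural details, so the function names (`cross_attention`, `se_gate`, `fuse`), the identity query/key/value projections, the channel-wise concatenation, and the random weights are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention: each query token attends over context tokens.

    Assumption: identity projections (no learned W_q, W_k, W_v) to keep the sketch short.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)])
    return out


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def se_gate(tokens, w_reduce, w_expand):
    """Squeeze-and-excitation: rescale each channel by a gate in (0, 1)."""
    channels = len(tokens[0])
    # Squeeze: average every channel over all tokens.
    squeezed = [sum(t[c] for t in tokens) / len(tokens) for c in range(channels)]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, h) for h in matvec(w_reduce, squeezed)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matvec(w_expand, hidden)]
    return [[t[c] * gates[c] for c in range(channels)] for t in tokens]


def fuse(modality_a, modality_b, w_reduce, w_expand):
    """Hypothetical fusion: bidirectional cross-attention, concatenation, SE gate.

    Assumes both modalities have the same number of tokens.
    """
    a_ctx = cross_attention(modality_a, modality_b)  # A queries B
    b_ctx = cross_attention(modality_b, modality_a)  # B queries A
    joined = [a + b for a, b in zip(a_ctx, b_ctx)]   # concatenate channels
    return se_gate(joined, w_reduce, w_expand)


random.seed(0)
n_tokens, dim, bottleneck = 4, 3, 2
modality_a = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
modality_b = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
# Illustrative random SE weights: 2*dim -> bottleneck -> 2*dim.
w_reduce = [[random.gauss(0, 0.5) for _ in range(2 * dim)] for _ in range(bottleneck)]
w_expand = [[random.gauss(0, 0.5) for _ in range(bottleneck)] for _ in range(2 * dim)]

fused = fuse(modality_a, modality_b, w_reduce, w_expand)
print(len(fused), len(fused[0]))  # 4 tokens, each with 2 * dim = 6 channels
```

In this sketch the cross-attention step lets each modality pull in information from the other, while the SE gate reweights the concatenated channels so that more informative ones are emphasized. A trained system would learn the projection and gate weights rather than drawing them at random.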

Country of Origin
🇨🇳 China

Page Count
6 pages

Category
Electrical Engineering and Systems Science:
Image and Video Processing