Score: 1

MSCloudCAM: Cross-Attention with Multi-Scale Context for Multispectral Cloud Segmentation

Published: October 12, 2025 | arXiv ID: 2510.10802v1

By: Md Abdullah Al Mazid, Liangdong Deng, Naphtali Rishe

Potential Business Impact:

Clears clouds from satellite pictures for better Earth views.

Business Areas:
Image Recognition Data and Analytics, Software

Clouds remain a critical challenge in optical satellite imagery, hindering reliable analysis for environmental monitoring, land cover mapping, and climate research. To overcome this, we propose MSCloudCAM, a Cross-Attention with Multi-Scale Context Network tailored for multispectral and multi-sensor cloud segmentation. Our framework exploits the spectral richness of Sentinel-2 (CloudSEN12) and Landsat-8 (L8Biome) data to classify four semantic categories: clear sky, thin cloud, thick cloud, and cloud shadow. MSCloudCAM combines a Swin Transformer backbone for hierarchical feature extraction with multi-scale context modules ASPP and PSP for enhanced scale-aware learning. A Cross-Attention block enables effective multisensor and multispectral feature fusion, while the integration of an Efficient Channel Attention Block (ECAB) and a Spatial Attention Module adaptively refine feature representations. Comprehensive experiments on CloudSEN12 and L8Biome demonstrate that MSCloudCAM delivers state-of-the-art segmentation accuracy, surpassing leading baseline architectures while maintaining competitive parameter efficiency and FLOPs. These results underscore the model's effectiveness and practicality, making it well-suited for large-scale Earth observation tasks and real-world applications.

Page Count
7 pages

Category
Computer Science:
CV and Pattern Recognition