RS2-SAM2: Customized SAM2 for Referring Remote Sensing Image Segmentation

Published: March 10, 2025 | arXiv ID: 2503.07266v3

By: Fu Rong, Meng Lan, Qian Zhang and more

Potential Business Impact:

Helps computers find things in satellite pictures using words.

Business Areas:
Semantic Search, Internet Services

Referring Remote Sensing Image Segmentation (RRSIS) aims to segment target objects in remote sensing (RS) images based on textual descriptions. Although Segment Anything Model 2 (SAM2) has shown remarkable performance in various segmentation tasks, its application to RRSIS presents several challenges, including understanding the text-described RS scenes and generating effective prompts from text descriptions. To address these issues, we propose RS2-SAM2, a novel framework that adapts SAM2 to RRSIS by aligning the adapted RS features and textual features, providing pseudo-mask-based dense prompts, and enforcing boundary constraints. Specifically, we employ a union encoder to jointly encode the visual and textual inputs, generating aligned visual and text embeddings as well as multimodal class tokens. A bidirectional hierarchical fusion module is introduced to adapt SAM2 to RS scenes and align adapted visual features with the visually enhanced text embeddings, improving the model's interpretation of text-described RS scenes. To provide precise target cues for SAM2, we design a mask prompt generator, which takes the visual embeddings and class tokens as input and produces a pseudo-mask as the dense prompt of SAM2. Experimental results on several RRSIS benchmarks demonstrate that RS2-SAM2 achieves state-of-the-art performance.
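To make the mask prompt generator idea concrete, the following is a minimal sketch of how a pseudo-mask might be derived from visual embeddings and a multimodal class token. It is not the authors' implementation: the abstract does not specify the generator's internals, so the cosine-similarity scoring, the `temperature` and `threshold` parameters, and the function name `pseudo_mask_prompt` are all assumptions chosen for illustration.

```python
import numpy as np


def pseudo_mask_prompt(visual_emb, class_token, temperature=10.0, threshold=0.5):
    """Illustrative pseudo-mask from per-pixel similarity to a class token.

    visual_emb:  array of shape (H, W, C), one embedding per spatial location.
    class_token: array of shape (C,), a single multimodal class token.
    Returns (prob, mask): soft scores in (0, 1) and a boolean mask, both (H, W).

    Assumption: similarity scoring stands in for whatever learned module
    the paper actually uses; this only shows the input/output contract.
    """
    # L2-normalise so the dot product below is a cosine similarity.
    v = visual_emb / (np.linalg.norm(visual_emb, axis=-1, keepdims=True) + 1e-8)
    t = class_token / (np.linalg.norm(class_token) + 1e-8)

    # Cosine similarity between every location and the class token: (H, W).
    sim = v @ t

    # Scale and squash to (0, 1); temperature is a hypothetical sharpening factor.
    prob = 1.0 / (1.0 + np.exp(-temperature * sim))

    # Binarise to obtain a dense mask that could serve as a mask-style prompt.
    mask = prob > threshold
    return prob, mask


# Toy usage: a 3x3 target region whose embeddings match the class token.
rng = np.random.default_rng(0)
token = rng.normal(size=16)
emb = np.tile(-token, (8, 8, 1))   # background points away from the token
emb[2:5, 3:6] = token              # target region aligns with the token
prob, mask = pseudo_mask_prompt(emb, token)
```

In this toy setup the mask is true only inside the aligned region. In a real pipeline the soft scores would typically be resized to the resolution the segmentation model expects for a dense mask prompt before being passed on.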

Country of Origin
🇨🇳 🇭🇰 China, Hong Kong

Page Count
9 pages

Category
Computer Science:
CV and Pattern Recognition