SAR-TEXT: A Large-Scale SAR Image-Text Dataset Built with SAR-Narrator and Progressive Transfer Learning

Published: July 24, 2025 | arXiv ID: 2507.18743v2

By: Yiguo He, Xinjun Cheng, Junjie Zhu and more

Potential Business Impact:

Helps computers understand satellite pictures better.

Business Areas:

Text Analytics Data and Analytics, Software

Vision-language models (VLMs) have achieved remarkable breakthroughs in remote sensing in recent years. Synthetic Aperture Radar (SAR) imagery, with its all-weather capability, is essential in remote sensing, yet the lack of large-scale, high-quality SAR image-text datasets hinders its semantic understanding. In this paper, we construct SAR-TEXT, a large-scale, high-quality dataset of over 130,000 SAR image-text pairs. To build it, we design the SAR-Narrator framework, which generates textual descriptions for SAR images through a multi-stage strategy. To verify the effectiveness of SAR-TEXT, we conduct experiments on three typical vision-language tasks: image-text retrieval, image captioning, and visual question answering (VQA). Specifically, we construct three representative models on SAR-TEXT: SAR-RS-CLIP, SAR-RS-CoCa, and SAR-GPT. SAR-RS-CLIP achieves notable improvements in retrieval performance, boosting average recall by 12.97% and 10.0% on the OSdataset_512 and HRSID test sets, respectively. In the captioning task, SAR-RS-CoCa achieves significant improvements over the original CoCa models in BLEU-4, SPICE, and CIDEr scores. In the VQA task, SAR-GPT outperforms baseline and single-stage models on multiple SAR-VQA datasets, demonstrating stronger semantic understanding and reasoning ability, as further confirmed by qualitative results. Notably, as a flexible captioning tool, SAR-Narrator can be readily adopted by the community to construct larger-scale SAR image-text datasets. All code, pretrained models, and the SAR-TEXT dataset are publicly available at: https://github.com/YiguoHe/SAR-TEXT.
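
The abstract reports retrieval gains as "average recall" without defining it. The sketch below shows one common convention for image-text retrieval: Recall@K averaged over both retrieval directions and several values of K. It is an illustration under stated assumptions, not the paper's evaluation code. The function names, the choice of K values, the one-caption-per-image pairing, and the toy similarity matrix are all hypothetical.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose matching item ranks in the top k.

    similarity[i][j] is the score between query i and candidate j.
    Assumption: the correct candidate for query i is candidate i
    (one caption per image), which real datasets may not satisfy.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices by descending score and keep the top k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if i in top_k:
            hits += 1
    return hits / len(similarity)


def mean_recall(similarity, ks=(1, 5, 10)):
    """Average Recall@K over both retrieval directions and all K in ks."""
    # Transposing swaps the roles of queries and candidates,
    # turning image-to-text retrieval into text-to-image retrieval.
    transposed = [list(col) for col in zip(*similarity)]
    scores = [recall_at_k(m, k) for m in (similarity, transposed) for k in ks]
    return sum(scores) / len(scores)


# Toy 3x3 image-by-text similarity matrix (made-up numbers).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.7, 0.6],
]
print(recall_at_k(sim, 1))
print(mean_recall(sim, ks=(1, 2)))
```

In practice the similarity matrix would come from a model's image and text embeddings, for example cosine similarities from a CLIP-style encoder. Whether the reported 12.97% and 10.0% gains use exactly this averaging is not stated in the abstract.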

Category:

CV and Pattern Recognition

91%

SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation

Computation and Language

Teaches computers to understand satellite pictures.

12 Feb 2025 1

90%

SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding

CV and Pattern Recognition

Helps computers understand satellite pictures better.

4 Apr 2025 1

Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

19 pages
