Bridging Semantics and Geometry: A Decoupled LVLM-SAM Framework for Reasoning Segmentation in Remote Sensing
By: Xu Zhang, Junyao Ge, Yang Zheng, and more
Large Vision-Language Models (LVLMs) hold great promise for advancing remote sensing (RS) analysis, yet existing reasoning segmentation frameworks couple linguistic reasoning and pixel prediction through end-to-end supervised fine-tuning, leading to weak geometric grounding and limited generalization across tasks. To address this, we propose Think2Seg-RS, a decoupled framework that trains an LVLM prompter to control a frozen Segment Anything Model (SAM) via structured geometric prompts. Through a mask-only reinforcement learning objective, the LVLM learns to translate abstract semantic reasoning into spatially grounded actions, achieving state-of-the-art performance on the EarthReason dataset. Remarkably, the learned prompting policy generalizes zero-shot to multiple referring segmentation benchmarks, exposing a distinct divide between semantic-level and instance-level grounding. We further find that compact segmenters outperform larger ones under semantic-level supervision, and that negative prompts are ineffective in heterogeneous aerial backgrounds. Together, these findings establish semantic-level reasoning segmentation as a new paradigm for geospatial understanding, opening the way toward unified, interpretable LVLM-driven Earth observation. Our code and model are available at https://github.com/Ricardo-XZ/Think2Seg-RS.
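To make the decoupled design concrete, the sketch below shows how a mask-only reward could be computed: an LVLM-emitted geometric prompt is fed to a frozen SAM, and the resulting mask is scored by IoU against the ground truth, with no loss applied to SAM itself. It is a minimal illustration, not the authors' implementation; it assumes the public segment-anything package's SamPredictor API, and the prompt schema, checkpoint path, and function names are illustrative assumptions. The scalar reward would then drive a policy-gradient update of the LVLM prompter alone.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a frozen SAM backbone. The checkpoint path is an assumption;
# any official SAM checkpoint works with the segment-anything package.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def mask_only_reward(image: np.ndarray, prompt: dict, gt_mask: np.ndarray) -> float:
    """Score one LVLM-emitted geometric prompt with the frozen SAM.

    `prompt` is a structured action such as
    {"points": [[x, y], ...], "labels": [1, ...], "box": [x0, y0, x1, y1]};
    the exact prompt schema of Think2Seg-RS is an assumption here.
    """
    predictor.set_image(image)  # image: HxWx3 uint8, RGB
    masks, _, _ = predictor.predict(
        point_coords=np.array(prompt["points"], dtype=np.float32)
        if prompt.get("points") else None,
        point_labels=np.array(prompt["labels"], dtype=np.int32)
        if prompt.get("labels") else None,
        box=np.array(prompt["box"], dtype=np.float32)
        if prompt.get("box") else None,
        multimask_output=False,
    )
    # The reward depends only on the predicted mask: no token-level or
    # box-level supervision reaches the LVLM policy, and SAM stays frozen.
    return mask_iou(masks[0], gt_mask > 0)
```

Because the segmenter's weights never receive gradients, the only trainable component is the LVLM's prompting policy, which is what lets the learned behavior transfer zero-shot to other referring segmentation benchmarks.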