CitySeg: A 3D Open Vocabulary Semantic Segmentation Foundation Model in City-scale Scenarios

Published: August 13, 2025 | arXiv ID: 2508.09470v1

By: Jialei Xu, Zizhuang Wei, Weikang You and more

BigTech Affiliations: Huawei

Potential Business Impact:

Lets drones understand cities from 3D point clouds alone, without camera images.

Semantic segmentation of city-scale point clouds is a critical technology for Unmanned Aerial Vehicle (UAV) perception systems: it classifies 3D points without relying on any visual information, enabling comprehensive 3D understanding. However, existing models are frequently constrained by the limited scale of 3D data and by the domain gap between datasets, both of which reduce their generalization capability. To address these challenges, we propose CitySeg, a foundation model for city-scale point cloud semantic segmentation that incorporates the text modality to achieve open-vocabulary segmentation and zero-shot inference. Specifically, to mitigate the non-uniform data distribution across multiple domains, we customize the data preprocessing rules and propose a local-global cross-attention network that enhances the perception capabilities of point networks in UAV scenarios. To resolve semantic label discrepancies across datasets, we introduce a hierarchical classification strategy: a hierarchical graph built according to the data annotation rules consolidates the labels, and a graph encoder models the hierarchical relationships between categories. In addition, we propose a two-stage training strategy and employ a hinge loss to increase the feature separability of subcategories. Experimental results demonstrate that CitySeg achieves state-of-the-art (SOTA) performance on nine closed-set benchmarks, significantly outperforming existing approaches. Moreover, CitySeg is the first model to enable zero-shot generalization in city-scale point cloud scenarios without relying on visual information.
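The abstract names several mechanisms without giving their exact form. The Python sketch here illustrates three of them under stated assumptions, and is not CitySeg's implementation: (1) open-vocabulary, zero-shot labeling, assumed to work by assigning each point the class name whose text embedding has the highest cosine similarity to the point's feature; (2) label consolidation through a hierarchy, reduced to a plain child-to-parent lookup (the paper's graph encoder is not modeled); and (3) a hinge loss, assumed to be a standard multi-class margin loss on those similarities. The class names, the `HIERARCHY` table, the `margin` value, and the functions `cosine`, `zero_shot_labels`, and `hinge_loss` are all hypothetical.

```python
import math

# Hypothetical two-level label hierarchy, for illustration only. CitySeg builds
# its hierarchical graph from the datasets' annotation rules and encodes it with
# a graph encoder; here it is reduced to a child -> parent lookup.
HIERARCHY = {
    "building": ["roof", "facade"],
    "vegetation": ["tree", "low_vegetation"],
    "ground": ["road", "sidewalk"],
}
PARENT = {child: parent for parent, children in HIERARCHY.items() for child in children}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def zero_shot_labels(point_feats, text_embeds):
    """Label each point with the class whose text embedding is most similar.

    point_feats: per-point feature vectors (assumed output of a point network).
    text_embeds: dict of class name -> text embedding (assumed output of a text
                 encoder). Any vocabulary can be passed, which is what makes the
                 labeling open-vocabulary in this sketch.
    """
    return [
        max(text_embeds, key=lambda name: cosine(f, text_embeds[name]))
        for f in point_feats
    ]


def hinge_loss(point_feat, text_embeds, target, margin=0.2):
    """Assumed multi-class margin hinge loss on cosine similarities.

    Each non-target class adds max(0, margin - s_target + s_other), so the loss
    is zero only when the target similarity beats every other class by `margin`.
    """
    s_target = cosine(point_feat, text_embeds[target])
    return sum(
        max(0.0, margin - s_target + cosine(point_feat, emb))
        for name, emb in text_embeds.items()
        if name != target
    )


# Toy 2-D embeddings standing in for real text-encoder outputs.
text = {"roof": [1.0, 0.0], "tree": [0.0, 1.0], "road": [-1.0, 0.0]}
labels = zero_shot_labels([[0.9, 0.1], [0.1, 0.8], [-0.7, 0.2]], text)
print(labels)                                # ['roof', 'tree', 'road']
print([PARENT[name] for name in labels])     # ['building', 'vegetation', 'ground']
print(hinge_loss([1.0, 0.0], text, "roof"))  # 0.0
```

Because the vocabulary in this sketch is just a dictionary of text embeddings, new class names can be queried at inference time without retraining the point network. The hinge term pushes a subcategory's feature away from its siblings until the margin is met, which is one plausible reading of "increase the feature separability of subcategories". The two-stage training schedule and the local-global cross-attention network are not represented.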

OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds

CV and Pattern Recognition

Lets computers understand city buildings from 3D scans.

13 Sep 2025

UniGeoSeg: Towards Unified Open-World Segmentation for Geospatial Scenes

CV and Pattern Recognition

Teaches computers to understand satellite images better.

28 Nov 2025

HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering

CV and Pattern Recognition

Maps cities in 3D without human help.

18 Apr 2025

Country of Origin

🇨🇳 China

Page Count

16 pages