Score: 0

LOC: A General Language-Guided Framework for Open-Set 3D Occupancy Prediction

Published: October 25, 2025 | arXiv ID: 2510.22141v1

By: Yuhang Gao , Xiang Xiang , Sheng Zhong and more

Potential Business Impact:

Helps computers understand 3D spaces with words.

Business Areas:
Image Recognition Data and Analytics, Software

Vision-Language Models (VLMs) have shown significant progress in open-set challenges. However, the limited availability of 3D datasets hinders their effective application in 3D scene understanding. We propose LOC, a general language-guided framework adaptable to various occupancy networks, supporting both supervised and self-supervised learning paradigms. For self-supervised tasks, we employ a strategy that fuses multi-frame LiDAR points for dynamic/static scenes, using Poisson reconstruction to fill voids, and assigning semantics to voxels via K-Nearest Neighbor (KNN) to obtain comprehensive voxel representations. To mitigate feature over-homogenization caused by direct high-dimensional feature distillation, we introduce Densely Contrastive Learning (DCL). DCL leverages dense voxel semantic information and predefined textual prompts. This efficiently enhances open-set recognition without dense pixel-level supervision, and our framework can also leverage existing ground truth to further improve performance. Our model predicts dense voxel features embedded in the CLIP feature space, integrating textual and image pixel information, and classifies based on text and semantic similarity. Experiments on the nuScenes dataset demonstrate the method's superior performance, achieving high-precision predictions for known classes and distinguishing unknown classes without additional training data.

Country of Origin
🇨🇳 China

Page Count
10 pages

Category
Computer Science:
CV and Pattern Recognition