ReferSplat: Referring Segmentation in 3D Gaussian Splatting
By: Shuting He, Guangquan Jie, Changshuo Wang, and more
Potential Business Impact:
Lets robots find objects in 3D scenes from plain-language descriptions.
We introduce Referring 3D Gaussian Splatting Segmentation (R3DGS), a new task that aims to segment target objects in a 3D Gaussian scene based on natural language descriptions, which often contain spatial relationships or object attributes. This task requires the model to identify newly described objects that may be occluded or not directly visible in a novel view, posing a significant challenge for 3D multi-modal understanding. Developing this capability is crucial for advancing embodied AI. To support research in this area, we construct the first R3DGS dataset, Ref-LERF. Our analysis reveals that 3D multi-modal understanding and spatial relationship modeling are key challenges for R3DGS. To address these challenges, we propose ReferSplat, a framework that explicitly models 3D Gaussian points with natural language expressions in a spatially aware paradigm. ReferSplat achieves state-of-the-art performance on both the newly proposed R3DGS task and 3D open-vocabulary segmentation benchmarks. Dataset and code are available at https://github.com/heshuting555/ReferSplat.
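To make the R3DGS task's input and output concrete, here is a minimal sketch. It is not ReferSplat's method. It assumes each Gaussian carries a learned, language-aligned feature vector and that the referring expression has already been encoded into a vector of the same size. It then scores every Gaussian by cosine similarity and keeps those above a threshold. The `Gaussian` class, `refer_segment`, and the `threshold=0.5` default are hypothetical. The sketch also leaves out the spatial-relationship modeling that the abstract identifies as a key challenge.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Gaussian:
    """Hypothetical minimal 3D Gaussian: a center plus a per-Gaussian feature."""
    mean: Sequence[float]     # 3D center (x, y, z)
    feature: Sequence[float]  # assumed language-aligned embedding learned in training


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def refer_segment(gaussians: List[Gaussian], text_embedding: Sequence[float],
                  threshold: float = 0.5) -> List[int]:
    """Return indices of Gaussians whose feature matches the encoded expression.

    Scoring happens in 3D, so Gaussians hidden in a particular view are still
    selected. A renderer could then project the selected set into any novel view.
    """
    return [i for i, g in enumerate(gaussians)
            if cosine(g.feature, text_embedding) >= threshold]


# Toy scene with 2-D features; the query vector stands in for an encoded phrase.
scene = [
    Gaussian((0.0, 0.0, 0.0), (1.0, 0.0)),
    Gaussian((1.0, 0.0, 0.0), (0.0, 1.0)),
    Gaussian((0.0, 1.0, 0.0), (0.9, 0.1)),
]
print(refer_segment(scene, (1.0, 0.0)))  # → [0, 2]
```

Selecting Gaussians in 3D instead of thresholding a rendered 2D mask reflects the abstract's point that the target may be occluded or not directly visible in a novel view.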
Similar Papers
LabelGS: Label-Aware 3D Gaussian Splatting for 3D Scene Segmentation
CV and Pattern Recognition
Lets computers understand and label objects in 3D.
OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting
CV and Pattern Recognition
Lets computers find any object in 3D scenes.
Efficient Label Refinement for Face Parsing Under Extreme Poses Using 3D Gaussian Splatting
CV and Pattern Recognition
Makes computers understand faces from any angle.