CUS-GS: A Compact Unified Structured Gaussian Splatting Framework for Multimodal Scene Representation
By: Yuhang Ming, Chenxin Fang, Xingyuan Yu and more
Potential Business Impact:
Makes 3D worlds look real with less computer power.
Recent advances in Gaussian Splatting-based 3D scene representation have shown two major trends: semantics-oriented approaches that focus on high-level understanding but lack explicit 3D geometry modeling, and structure-oriented approaches that capture spatial structures yet provide limited semantic abstraction. To bridge this gap, we present CUS-GS, a compact unified structured Gaussian Splatting representation, which connects multimodal semantic features with structured 3D geometry. Specifically, we design a voxelized anchor structure that constructs a spatial scaffold, while extracting multimodal semantic features from a set of foundation models (e.g., CLIP, DINOv2, SEEM). Moreover, we introduce a multimodal latent feature allocation mechanism to unify appearance, geometry, and semantics across heterogeneous feature spaces, ensuring a consistent representation across multiple foundation models. Finally, we propose a feature-aware significance evaluation strategy to dynamically guide anchor growing and pruning, effectively removing redundant or invalid anchors while maintaining semantic integrity. Extensive experiments show that CUS-GS achieves competitive performance compared to state-of-the-art methods using as few as 6M parameters (an order of magnitude smaller than the closest rival at 35M), highlighting the framework's excellent trade-off between performance and model efficiency.
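The voxelized anchor scaffold and anchor pruning described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the function names `voxelize_anchors` and `prune_anchors`, the `voxel_size` parameter, and the use of a per-voxel point count as a stand-in significance score are all assumptions made for illustration. The abstract does not specify how the feature-aware significance is computed.

```python
import math
from collections import defaultdict


def voxelize_anchors(points, voxel_size):
    """Place one anchor at the center of every occupied voxel.

    Hypothetical helper: `points` is an iterable of (x, y, z) tuples and
    `voxel_size` is the voxel edge length. Returns the anchor positions and
    the number of points per voxel, both keyed by integer voxel index.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    counts = defaultdict(int)
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct voxel.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        counts[key] += 1
    anchors = {
        key: tuple((index + 0.5) * voxel_size for index in key)
        for key in counts
    }
    return anchors, dict(counts)


def prune_anchors(anchors, significance, threshold):
    """Keep anchors whose score reaches `threshold`.

    `significance` maps voxel index to a score. The score here is a
    placeholder assumption, not the paper's feature-aware measure.
    """
    return {
        key: position
        for key, position in anchors.items()
        if significance.get(key, 0.0) >= threshold
    }


points = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.2, 0.1, 0.1), (-0.1, 0.0, 0.0)]
anchors, counts = voxelize_anchors(points, voxel_size=1.0)
kept = prune_anchors(anchors, significance=counts, threshold=2)
```

In this toy input, three voxels are occupied and only the voxel holding two points survives pruning. A real system would replace the point count with a score derived from rendered appearance and semantic features.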
Similar Papers
UniGS: Unified Geometry-Aware Gaussian Splatting for Multimodal Rendering
CV and Pattern Recognition
Creates realistic 3D worlds from many pictures.
Smol-GS: Compact Representations for Abstract 3D Gaussian Splatting
CV and Pattern Recognition
Shrinks 3D scenes to tiny sizes, keeping detail.
AG$^2$aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing
CV and Pattern Recognition
Organizes 3D scenes for precise object editing.