TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting

Published: April 13, 2025 | arXiv ID: 2504.09588v2

By: Zhicong Wu, Hongbin Xu, Gang Xu and more

Potential Business Impact:

Reconstructs 3D scenes from a few input images, using text descriptions to sharpen fine-grained detail.

Business Areas:

Text Analytics, Data and Analytics, Software

Recent advancements in Generalizable Gaussian Splatting have enabled robust 3D reconstruction from sparse input views by utilizing feed-forward Gaussian Splatting models, achieving superior cross-scene generalization. However, while many methods focus on geometric consistency, they often neglect the potential of text-driven guidance to enhance semantic understanding, which is crucial for accurately reconstructing fine-grained details in complex scenes. To address this limitation, we propose TextSplat, the first text-driven Generalizable Gaussian Splatting framework. By employing a text-guided fusion of diverse semantic cues, our framework learns robust cross-modal feature representations that improve the alignment of geometric and semantic information, producing high-fidelity 3D reconstructions. Specifically, our framework employs three parallel modules to obtain complementary representations: the Diffusion Prior Depth Estimator for accurate depth information, the Semantic Aware Segmentation Network for detailed semantic information, and the Multi-View Interaction Network for refined cross-view features. Then, in the Text-Guided Semantic Fusion Module, these representations are integrated via a text-guided, attention-based feature aggregation mechanism, resulting in enhanced 3D Gaussian parameters enriched with detailed semantic cues. Experimental results on various benchmark datasets demonstrate improved performance compared to existing methods across multiple evaluation metrics, validating the effectiveness of our framework. The code will be publicly available.
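
The abstract does not spell out the fusion mechanism, so the following is only a minimal, dependency-free sketch of what a text-guided, attention-based aggregation over three feature sources could look like. It is not the authors' implementation: the function name `text_guided_fusion`, the use of a single text embedding as the attention query, the scaled dot-product scoring, and all numeric values are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def text_guided_fusion(text_emb, sources):
    """Hypothetical fusion step (an assumption, not the paper's module).

    text_emb: text embedding used as the attention query.
    sources:  one feature vector per module (e.g. depth, semantic,
              cross-view), each the same length as text_emb.
    Returns the attention-weighted sum of the sources and the weights.
    """
    scale = math.sqrt(len(text_emb))
    # Score each source by its scaled similarity to the text query.
    scores = [dot(feat, text_emb) / scale for feat in sources]
    weights = softmax(scores)
    # Weighted sum of the source features, one output dimension at a time.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, sources))
        for i in range(len(text_emb))
    ]
    return fused, weights


# Toy inputs (made-up values) standing in for the three module outputs.
text_emb = [1.0, 0.0, 0.0, 0.0]
depth_feat = [0.2, 0.1, 0.0, 0.5]
semantic_feat = [2.0, 0.3, 0.1, 0.0]
cross_view_feat = [0.5, 0.5, 0.5, 0.5]

fused, weights = text_guided_fusion(
    text_emb, [depth_feat, semantic_feat, cross_view_feat]
)
```

In this toy setup the source most aligned with the text query receives the largest weight, which is the general idea behind letting text steer which cues dominate the fused representation. A real system would operate on per-pixel or per-Gaussian feature maps with learned projections rather than raw vectors.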

SemanticSplat: Feed-Forward 3D Scene Understanding with Language-Aware Gaussian Fields

CV and Pattern Recognition

Builds 3D worlds from a few pictures.

11 Jun 2025 0

91%

GSsplat: Generalizable Semantic Gaussian Splatting for Novel-view Synthesis in 3D Scenes

Graphics

Makes 3D scenes understandable from many angles.

7 May 2025 2

90%

SplatTalk: 3D VQA with Gaussian Splatting

CV and Pattern Recognition

Lets computers understand 3D worlds from pictures.

8 Mar 2025 1

Page Count

10 pages
