SplatTalk: 3D VQA with Gaussian Splatting
By: Anh Thai, Songyou Peng, Kyle Genova, and more
Potential Business Impact:
Lets computers understand 3D worlds from pictures.
Language-guided 3D scene understanding is important for advancing applications in robotics, AR/VR, and human-computer interaction, enabling models to comprehend and interact with 3D environments through natural language. While 2D vision-language models (VLMs) have achieved remarkable success in 2D VQA tasks, progress in the 3D domain has been significantly slower due to the complexity of 3D data and the high cost of manual annotations. In this work, we introduce SplatTalk, a novel method that uses a generalizable 3D Gaussian Splatting (3DGS) framework to produce 3D tokens suitable for direct input into a pretrained LLM, enabling effective zero-shot 3D visual question answering (3D VQA) for scenes with only posed images. In experiments on multiple benchmarks, our approach outperforms both 3D models trained specifically for the task and previous 2D-LMM-based models that use only images (our setting), while achieving competitive performance with state-of-the-art 3D LMMs that additionally use 3D inputs. Project website: https://splat-talk.github.io/
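To make the pipeline shape concrete, here is a minimal sketch of the abstract's idea: posed images are encoded into per-Gaussian language features, which are then pooled into a fixed set of 3D tokens sized for an LLM's embedding space. Every function name, shape, and operation below is an illustrative assumption (random projections stand in for the learned encoder and projection layer), not the paper's actual implementation.

```python
import numpy as np

def encode_images_to_gaussians(images, n_gaussians=2048, feat_dim=32, seed=0):
    """Stand-in for a generalizable 3DGS encoder: map posed images to
    per-Gaussian feature vectors. A random projection replaces the
    learned vision-language feature lifting (hypothetical)."""
    rng = np.random.default_rng(seed)
    flat = np.stack([img.reshape(-1) for img in images])       # (views, H*W*3)
    proj = rng.standard_normal((flat.shape[1], feat_dim))
    view_feats = flat @ proj / np.sqrt(flat.shape[1])          # (views, F)
    # Distribute view features onto Gaussians with normalized random weights.
    weights = rng.random((n_gaussians, len(images)))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ view_feats                                # (G, F)

def pool_to_llm_tokens(gaussian_feats, n_tokens=64, llm_dim=128, seed=1):
    """Pool per-Gaussian features into a fixed number of 3D tokens and
    project them to the LLM embedding width (linear-layer stand-in)."""
    rng = np.random.default_rng(seed)
    n_gaussians, feat_dim = gaussian_feats.shape
    groups = np.array_split(np.arange(n_gaussians), n_tokens)
    pooled = np.stack([gaussian_feats[g].mean(axis=0) for g in groups])
    proj = rng.standard_normal((feat_dim, llm_dim)) / np.sqrt(feat_dim)
    return pooled @ proj                                       # (T, D)

# Toy usage: three 16x16 "posed images" of a scene.
images = [np.random.default_rng(i).random((16, 16, 3)) for i in range(3)]
tokens = pool_to_llm_tokens(encode_images_to_gaussians(images))
print(tokens.shape)  # (64, 128)
```

In the real system these tokens would be prepended to the question's text embeddings and fed to a frozen pretrained LLM; the sketch only illustrates the image-to-token data flow.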
Similar Papers
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
CV and Pattern Recognition
Teaches computers to understand 3D spaces from scans.
Hi-LSplat: Hierarchical 3D Language Gaussian Splatting
CV and Pattern Recognition
Lets computers understand 3D objects from words.
SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting
CV and Pattern Recognition
Teaches computers to understand 3D worlds better.