GVSynergy-Det: Synergistic Gaussian-Voxel Representations for Multi-View 3D Object Detection

Published: December 29, 2025 | arXiv ID: 2512.23176v1

By: Yi Zhang, Yi Wang, Lei Yao and more

Potential Business Impact:

Finds objects in 3D using only pictures.

Business Areas:

Image Recognition Data and Analytics, Software

Image-based 3D object detection aims to identify and localize objects in 3D space using only RGB images, eliminating the need for expensive depth sensors required by point cloud-based methods. Existing image-based approaches face two critical challenges: methods achieving high accuracy typically require dense 3D supervision, while those operating without such supervision struggle to extract accurate geometry from images alone. In this paper, we present GVSynergy-Det, a novel framework that enhances 3D detection through synergistic Gaussian-Voxel representation learning. Our key insight is that continuous Gaussian and discrete voxel representations capture complementary geometric information: Gaussians excel at modeling fine-grained surface details while voxels provide structured spatial context. We introduce a dual-representation architecture that: 1) adapts generalizable Gaussian Splatting to extract complementary geometric features for detection tasks, and 2) develops a cross-representation enhancement mechanism that enriches voxel features with geometric details from Gaussian fields. Unlike previous methods that either rely on time-consuming per-scene optimization or utilize Gaussian representations solely for depth regularization, our synergistic strategy directly leverages features from both representations through learnable integration, enabling more accurate object localization. Extensive experiments demonstrate that GVSynergy-Det achieves state-of-the-art results on challenging indoor benchmarks, significantly outperforming existing methods on both ScanNetV2 and ARKitScenes datasets, all without requiring any depth or dense 3D geometry supervision (e.g., point clouds or TSDF).
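
The abstract does not spell out how the cross-representation enhancement works, so the following is only a minimal sketch of one plausible way to enrich voxel features with features carried by Gaussians. It is not the authors' implementation. Every name here (`splat_gaussians_to_voxels`, `fuse`, `gate_weight`) is hypothetical, and the mean pooling and sigmoid gate are assumptions standing in for whatever learnable integration the paper actually uses.

```python
import numpy as np

def splat_gaussians_to_voxels(centers, feats, grid_shape, voxel_size, origin):
    """Average the features of all Gaussians whose centers fall in each voxel.

    centers: (N, 3) Gaussian means, feats: (N, C) per-Gaussian features.
    Returns a (X, Y, Z, C) feature grid and a (X, Y, Z) count grid.
    """
    idx = np.floor((centers - origin) / voxel_size).astype(int)
    # Drop Gaussians that land outside the voxel grid.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx, feats = idx[inside], feats[inside]
    flat = np.ravel_multi_index(tuple(idx.T), grid_shape)

    n_vox = int(np.prod(grid_shape))
    summed = np.zeros((n_vox, feats.shape[1]))
    counts = np.zeros(n_vox)
    # np.add.at accumulates correctly when several Gaussians share a voxel.
    np.add.at(summed, flat, feats)
    np.add.at(counts, flat, 1.0)

    mean = summed / np.maximum(counts, 1.0)[:, None]
    return mean.reshape(*grid_shape, -1), counts.reshape(grid_shape)

def fuse(voxel_feats, gauss_feats, counts, gate_weight):
    """Gated residual fusion (assumed form, not taken from the paper).

    gate_weight: (C, 1) stand-in for a learned projection.
    Voxels that contain no Gaussians keep their original features.
    """
    gate = 1.0 / (1.0 + np.exp(-(gauss_feats @ gate_weight)))  # (X, Y, Z, 1)
    occupied = (counts > 0)[..., None]
    return voxel_feats + occupied * gate * gauss_feats

# Toy usage: four Gaussians, one of them outside a 2x2x2 grid of unit voxels.
centers = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.5, 0.5, 0.5], [5.0, 5.0, 5.0]])
feats = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [9.0, 9.0]])
gauss_grid, counts = splat_gaussians_to_voxels(centers, feats, (2, 2, 2), 1.0, np.zeros(3))
fused = fuse(np.zeros((2, 2, 2, 2)), gauss_grid, counts, np.zeros((2, 1)))
```

In a trained model the gate would be learned jointly with the detector. The residual form is chosen here only so that empty voxels pass through unchanged.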

Visibility-Aware Densification for 3D Gaussian Splatting in Dynamic Urban Scenes

CV and Pattern Recognition

Makes 3D pictures look real in messy places.

10 Oct 2025 1

89%

Automated 3D-GS Registration and Fusion via Skeleton Alignment and Gaussian-Adaptive Features

CV and Pattern Recognition

Combines 3D scenes perfectly for robots.

28 Jul 2025 0

89%

C3G: Learning Compact 3D Representations with 2K Gaussians

CV and Pattern Recognition

Builds detailed 3D worlds from few pictures.

3 Dec 2025 0

Country of Origin

🇭🇰 Hong Kong

Page Count

11 pages
