GeoDiff3D: Self-Supervised 3D Scene Generation with Geometry-Constrained 2D Diffusion Guidance

Published: January 27, 2026 | arXiv ID: 2601.19785v1

By: Haozhi Zhu, Miaomiao Zhao, Dingyao Liu and more

Potential Business Impact:

Creates realistic 3D worlds from simple ideas.

Business Areas:

3D Technology Hardware, Software

3D scene generation is a core technology for gaming, film/VFX, and VR/AR. Growing demand for rapid iteration, high-fidelity detail, and accessible content creation has further increased interest in this area. Existing methods broadly follow two paradigms, indirect 2D-to-3D reconstruction and direct 3D generation, but both are limited by weak structural modeling and heavy reliance on large-scale ground-truth supervision, often producing structural artifacts, geometric inconsistencies, and degraded high-frequency details in complex scenes. We propose GeoDiff3D, an efficient self-supervised framework that uses coarse geometry as a structural anchor and a geometry-constrained 2D diffusion model to provide texture-rich reference images. Importantly, GeoDiff3D does not require strict multi-view consistency of the diffusion-generated references and remains robust to the resulting noisy, inconsistent guidance. We further introduce voxel-aligned 3D feature aggregation and dual self-supervision to maintain scene coherence and fine details while substantially reducing dependence on labeled data. GeoDiff3D also trains with low computational cost and enables fast, high-quality 3D scene generation. Extensive experiments on challenging scenes show improved generalization and generation quality over existing baselines, offering a practical solution for accessible and efficient 3D scene construction.

GeoSceneGraph: Geometric Scene Graph Diffusion Model for Text-guided 3D Indoor Scene Synthesis

CV and Pattern Recognition

Creates realistic 3D rooms from your words.

18 Nov 2025

ScenDi: 3D-to-2D Scene Diffusion Cascades for Urban Generation

CV and Pattern Recognition

Creates realistic city scenes from simple instructions.

21 Jan 2026

Country of Origin

🇨🇳 China

Page Count

11 pages
