Structural Energy-Guided Sampling for View-Consistent Text-to-3D
By: Qing Zhang, Jinguang Tong, Jie Hong and more
Potential Business Impact:
Makes 3D objects generated from text look correct from every viewing angle.
Text-to-3D generation often suffers from the Janus problem, where objects look correct from the front but collapse into duplicated or distorted geometry from other angles. We attribute this failure to viewpoint bias in 2D diffusion priors, which propagates into 3D optimization. To address this, we propose Structural Energy-Guided Sampling (SEGS), a training-free, plug-and-play framework that enforces multi-view consistency entirely at sampling time. SEGS defines a structural energy in a PCA subspace of intermediate U-Net features and injects its gradients into the denoising trajectory, steering geometry toward the intended viewpoint while preserving appearance fidelity. Integrated into Score Distillation Sampling (SDS) and Variational Score Distillation (VSD) pipelines, SEGS significantly reduces Janus artifacts, improving geometric alignment and viewpoint consistency without retraining or weight modification.
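The idea of an energy defined in a PCA subspace, whose gradient nudges each sampling step, can be illustrated with a small NumPy toy. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact energy, so a quadratic distance to a target feature is assumed, the guidance is applied directly in feature space rather than backpropagated through a U-Net to the latent, and all names (`pca_basis`, `structural_energy`, `energy_grad`, `guided_step`) and the random data are hypothetical.

```python
import numpy as np

def pca_basis(features, k):
    """Return the mean and top-k principal directions (as columns) of row-wise features."""
    mean = features.mean(axis=0)
    # SVD of the centered data; the rows of vt are orthonormal principal directions.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:k].T  # shapes (d,) and (d, k)

def structural_energy(feat, target, basis):
    """Assumed energy: 0.5 * squared distance to target, measured inside the PCA subspace."""
    diff = basis.T @ (feat - target)
    return 0.5 * float(diff @ diff)

def energy_grad(feat, target, basis):
    """Gradient of structural_energy with respect to feat: B B^T (feat - target)."""
    return basis @ (basis.T @ (feat - target))

def guided_step(feat, denoise_update, target, basis, scale):
    """One toy sampling step: apply the usual update, then descend the energy."""
    return feat + denoise_update - scale * energy_grad(feat, target, basis)

rng = np.random.default_rng(0)
d, k = 16, 3
ref_feats = rng.normal(size=(50, d))  # stand-in for intermediate features (assumption)
_, basis = pca_basis(ref_feats, k)
target = rng.normal(size=d)           # stand-in for the intended-view structure (assumption)
feat0 = rng.normal(size=d)

feat = feat0.copy()
e0 = structural_energy(feat, target, basis)
for _ in range(20):
    # The denoiser update is zeroed out here to isolate the effect of the guidance term.
    feat = guided_step(feat, np.zeros(d), target, basis, scale=0.5)
e1 = structural_energy(feat, target, basis)
```

Because the basis columns are orthonormal, each guided step in this toy shrinks only the component of `feat - target` that lies inside the subspace and leaves the orthogonal component untouched. That mirrors, in a very simplified form, the stated goal of steering structure while preserving appearance.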
Similar Papers
SEGS-SLAM: Structure-enhanced 3D Gaussian Splatting SLAM with Appearance Embedding
CV and Pattern Recognition
Creates realistic 3D worlds from videos.
Consistent View Alignment Improves Foundation Models for 3D Medical Image Segmentation
CV and Pattern Recognition
Teaches computers to learn better from different pictures.
Photo3D: Advancing Photorealistic 3D Generation through Structure-Aligned Detail Enhancement
CV and Pattern Recognition
Makes computer-made 3D objects look real.