Cue3D: Quantifying the Role of Image Cues in Single-Image 3D Generation
By: Xiang Li, Zirui Wang, Zixuan Huang and more
Potential Business Impact:
Shows which image cues computers rely on when building 3D shapes from pictures.
Humans and traditional computer vision methods rely on a diverse set of monocular cues, such as shading, texture, and silhouette, to infer 3D structure from a single image. While recent deep generative models have dramatically advanced single-image 3D generation, it remains unclear which image cues these methods actually exploit. We introduce Cue3D, the first comprehensive, model-agnostic framework for quantifying the influence of individual image cues in single-image 3D generation. Our unified benchmark evaluates seven state-of-the-art methods, spanning regression-based, multi-view, and native 3D generative paradigms. By systematically perturbing cues such as shading, texture, silhouette, perspective, edges, and local continuity, we measure their impact on 3D output quality. Our analysis reveals that shape meaningfulness, not texture, dictates generalization, and that geometric cues, particularly shading, are crucial for 3D generation. We further identify an over-reliance on provided silhouettes, as well as sensitivities to cues such as perspective and local continuity that differ across model families. By dissecting these dependencies, Cue3D advances our understanding of how modern 3D networks leverage classical vision cues and offers directions for developing more transparent, robust, and controllable single-image 3D generation models.
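The perturb-and-measure idea described in the abstract can be illustrated with a small sketch. This is not the Cue3D implementation: the patch-shuffling perturbation, the Chamfer-distance metric, and the names `shuffle_patches`, `chamfer_distance`, and `cue_influence` are all assumptions chosen for illustration, and `model` stands for any hypothetical function that maps an image to a 3D point set.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).
    Assumed metric for this sketch; the paper's metrics may differ."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def shuffle_patches(image, patch, rng):
    """Hypothetical local-continuity perturbation: permute non-overlapping patches."""
    h, w = image.shape[:2]
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    # Cut the image into a (gh * gw) stack of patch x patch tiles.
    tiles = image.reshape(gh, patch, gw, patch, -1).swapaxes(1, 2)
    tiles = tiles.reshape(gh * gw, patch, patch, -1)
    tiles = tiles[rng.permutation(gh * gw)]
    # Reassemble the permuted tiles into an image of the original shape.
    out = tiles.reshape(gh, gw, patch, patch, -1).swapaxes(1, 2)
    return out.reshape(image.shape)

def cue_influence(model, image, perturb, reference_points):
    """Increase in Chamfer distance to the reference shape when one cue is perturbed."""
    base = chamfer_distance(model(image), reference_points)
    perturbed = chamfer_distance(model(perturb(image)), reference_points)
    return perturbed - base
```

Under these assumptions, a larger value from `cue_influence` would suggest that the model depends more heavily on the perturbed cue, while a value near zero would suggest the cue is largely ignored.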
Similar Papers
GEN3D: Generating Domain-Free 3D Scenes from a Single Image
CV and Pattern Recognition
Creates realistic 3D worlds from one picture.
Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction
CV and Pattern Recognition
Makes 3D pictures from photos better.
Photo3D: Advancing Photorealistic 3D Generation through Structure-Aligned Detail Enhancement
CV and Pattern Recognition
Makes computer-made 3D objects look real.