Photorealistic Phantom Roads in Real Scenes: Disentangling 3D Hallucinations from Physical Geometry
By: Hoang Nguyen, Xiaohao Xu, Xiaonan Huang
Potential Business Impact:
Fixes computer vision's fake 3D illusions.
Monocular depth foundation models achieve remarkable generalization by learning large-scale semantic priors, but this creates a critical vulnerability: they hallucinate illusory 3D structures from geometrically planar but perceptually ambiguous inputs. We term this failure the 3D Mirage. This paper introduces the first end-to-end framework to probe, quantify, and tame this unquantified safety risk. To probe, we present 3D-Mirage, the first benchmark of real-world illusions (e.g., street art) with precise planar-region annotations and context-restricted crops. To quantify, we propose a Laplacian-based evaluation framework with two metrics: the Deviation Composite Score (DCS) for spurious non-planarity and the Confusion Composite Score (CCS) for contextual instability. To tame this failure, we introduce Grounded Self-Distillation, a parameter-efficient strategy that surgically enforces planarity on illusion ROIs while using a frozen teacher to preserve background knowledge, thus avoiding catastrophic forgetting. Our work provides the essential tools to diagnose and mitigate this phenomenon, urging a necessary shift in MDE evaluation from pixel-wise accuracy to structural and contextual robustness. Our code and benchmark will be publicly available to foster this exciting research direction.
Similar Papers
Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision
CV and Pattern Recognition
Makes 3D pictures more real, near and far.
LSD-3D: Large-Scale 3D Driving Scene Generation with Geometry Grounding
CV and Pattern Recognition
Creates 3D driving worlds for robots to learn.
CHEM: Estimating and Understanding Hallucinations in Deep Learning for Image Processing
CV and Pattern Recognition
Finds fake details in computer-made pictures.