Re-Depth Anything: Test-Time Depth Refinement via Self-Supervised Re-lighting
By: Ananta R. Bhattarai, Helge Rhodin
Potential Business Impact:
Makes AI better at guessing distances in photos.
Monocular depth estimation remains challenging: recent foundation models, such as Depth Anything V2 (DA-V2), struggle with real-world images that are far from the training distribution. We introduce Re-Depth Anything, a test-time self-supervision framework that bridges this domain gap by fusing DA-V2 with the powerful priors of large-scale 2D diffusion models. Our method performs label-free refinement directly on the input image by re-lighting predicted depth maps and augmenting the input. This re-synthesis replaces classical photometric reconstruction by leveraging shape-from-shading (SfS) cues in a new, generative context with Score Distillation Sampling (SDS). To prevent optimization collapse, our framework employs a targeted optimization strategy: rather than optimizing depth directly or fine-tuning the full model, we freeze the encoder and only update intermediate embeddings while also fine-tuning the decoder. Across diverse benchmarks, Re-Depth Anything yields substantial gains in depth accuracy and realism over DA-V2, showcasing new avenues for self-supervision that augment geometric reasoning.
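The targeted optimization strategy described above can be illustrated with a minimal sketch: freeze the encoder, treat the intermediate embedding as a learnable tensor, and fine-tune only the decoder at test time. The SDS re-lighting objective is replaced by a placeholder, since the paper's exact loss and diffusion backbone are not given here; all class and function names below (ToyDepthModel, sds_relighting_loss, refine_at_test_time) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ToyDepthModel(nn.Module):
    """Stand-in for a DA-V2-style encoder/decoder depth network."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(nn.Conv2d(dim, 1, 3, padding=1))

    def forward(self, image):
        return self.decoder(self.encoder(image))

def sds_relighting_loss(depth, image):
    # Placeholder for the SDS-based re-lighting objective: in the paper this
    # would re-light the predicted depth and score it against a 2D diffusion prior.
    return depth.var() + 0.0 * image.sum()

def refine_at_test_time(model, image, steps=50, lr=1e-4):
    # 1) Freeze the encoder.
    for p in model.encoder.parameters():
        p.requires_grad_(False)

    # 2) Turn the intermediate embedding into a learnable parameter,
    #    initialized from the frozen encoder's output.
    with torch.no_grad():
        embedding = model.encoder(image)
    embedding = nn.Parameter(embedding.clone())

    # 3) Jointly optimize the embedding and the decoder (encoder stays fixed).
    optimizer = torch.optim.Adam([embedding, *model.decoder.parameters()], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        depth = model.decoder(embedding)
        loss = sds_relighting_loss(depth, image)
        loss.backward()
        optimizer.step()
    return model.decoder(embedding).detach()

# Usage on a single RGB image (batch of 1).
model = ToyDepthModel()
image = torch.rand(1, 3, 64, 64)
refined_depth = refine_at_test_time(model, image)
```

This sketch only shows which parameters receive gradients during test-time refinement; the actual method's re-lighting, input augmentation, and diffusion guidance would replace the placeholder loss.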
Similar Papers
Depth Anything 3: Recovering the Visual Space from Any Views
CV and Pattern Recognition
Lets computers see 3D shapes from pictures.
Depth AnyEvent: A Cross-Modal Distillation Paradigm for Event-Based Monocular Depth Estimation
CV and Pattern Recognition
Helps cameras see depth in fast, dim light.
Depth Anything with Any Prior
CV and Pattern Recognition
Makes any picture show how far away things are.