Online Video Depth Anything: Temporally-Consistent Depth Prediction with Low Memory Consumption

Published: October 10, 2025 | arXiv ID: 2510.09182v1

By: Johann-Friedrich Feiden, Tim Küchler, Denis Zavadski, and more

Potential Business Impact:

Lets cameras understand 3D depth in real time.

Business Areas:
Image Recognition, Data and Analytics, Software

Depth estimation from monocular video has become a key component of many real-world computer vision systems. Recently, Video Depth Anything (VDA) has demonstrated strong performance on long video sequences. However, it relies on batch processing, which prohibits its use in an online setting. In this work, we overcome this limitation and introduce online VDA (oVDA). The key innovation is to employ techniques from Large Language Models (LLMs), namely, caching latent features during inference and masking frames during training. Our oVDA method outperforms all competing online video depth estimation methods in both accuracy and VRAM usage. Low VRAM usage is particularly important for deployment on edge devices. We demonstrate that oVDA runs at 42 FPS on an NVIDIA A100 and at 20 FPS on an NVIDIA Jetson edge device. We will release both code and compilation scripts, making oVDA easy to deploy on low-power hardware.
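
The LLM-style feature caching mentioned in the abstract can be pictured as a rolling buffer of per-frame latent features, analogous to a KV cache. The Python sketch below is purely illustrative and is not the authors' implementation: LatentFeatureCache, model.encode, and model.decode are hypothetical names standing in for a per-frame encoder and a temporal decoder. It only shows the general idea of keeping a bounded window of cached features so that inference can proceed frame by frame with limited VRAM.

import torch

class LatentFeatureCache:
    # Rolling cache of per-frame latent features, in the spirit of LLM KV caching.
    # Conceptual sketch only; names are hypothetical, not the oVDA code.
    def __init__(self, max_frames: int):
        self.max_frames = max_frames
        self.features: list[torch.Tensor] = []

    def append(self, frame_features: torch.Tensor) -> None:
        # Store the newest frame's features and evict the oldest once the
        # temporal window is full, keeping memory usage bounded.
        self.features.append(frame_features)
        if len(self.features) > self.max_frames:
            self.features.pop(0)

    def context(self) -> torch.Tensor:
        # Stack cached features along the time dimension so temporal layers
        # can attend over past frames without recomputing them.
        return torch.stack(self.features, dim=0)

def estimate_depth_online(model, frames, cache_size: int = 16):
    # Process a video stream one frame at a time using the cache above.
    cache = LatentFeatureCache(max_frames=cache_size)
    depths = []
    with torch.no_grad():
        for frame in frames:                       # frame: (3, H, W) tensor
            feats = model.encode(frame)            # hypothetical per-frame encoder
            cache.append(feats)
            depth = model.decode(cache.context())  # hypothetical temporal decoder
            depths.append(depth)
    return depths

Because only a fixed number of cached feature maps is ever held, memory stays constant regardless of video length, which is what makes this style of online inference attractive for edge devices.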

Page Count
18 pages

Category
Computer Science: Computer Vision and Pattern Recognition