EndoStreamDepth: Temporally Consistent Monocular Depth Estimation for Endoscopic Video Streams

Published: December 20, 2025 | arXiv ID: 2512.18159v1

By: Hao Li, Daiwei Lu, Jiacheng Wang and more

This work presents EndoStreamDepth, a monocular depth estimation framework for endoscopic video streams. It provides accurate depth maps with sharp anatomical boundaries for each frame, temporally consistent predictions across frames, and real-time throughput. Unlike prior work that uses batched inputs, EndoStreamDepth processes individual frames and uses a temporal module to propagate inter-frame information. The framework contains three main components: (1) a single-frame depth network with an endoscopy-specific transformation to produce accurate depth maps, (2) multi-level Mamba temporal modules that leverage inter-frame information to improve accuracy and stabilize predictions, and (3) a hierarchical design with multi-scale supervision, where complementary loss terms jointly improve local boundary sharpness and global geometric consistency. We evaluate the framework on two publicly available colonoscopy depth estimation datasets. Compared to state-of-the-art monocular depth estimation methods, EndoStreamDepth substantially improves performance and produces depth maps with sharp, anatomically aligned boundaries, which are essential for downstream tasks such as automation in robotic surgery. The code is publicly available at https://github.com/MedICL-VU/EndoStreamDepth
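The streaming idea in the abstract (one frame in, one depth map out, with a carried state linking frames) can be illustrated with a toy sketch. This is not the authors' implementation: `single_frame_depth` is a hypothetical stand-in for the depth network, and `StreamingTemporalFilter` replaces the Mamba temporal modules with a much simpler fixed-coefficient linear recurrence, h_t = a * h_{t-1} + (1 - a) * x_t. All names and the `decay` parameter are assumptions made for illustration only.

```python
from typing import List, Optional

Frame = List[List[float]]  # H x W grid of values in [0, 1]


def single_frame_depth(frame: Frame) -> Frame:
    """Hypothetical per-frame predictor: treats darker pixels as farther away."""
    return [[1.0 - px for px in row] for row in frame]


class StreamingTemporalFilter:
    """Toy per-pixel recurrence h_t = a * h_{t-1} + (1 - a) * x_t.

    A simplified stand-in for a learned temporal module: the state is
    carried between calls, so frames are processed one at a time.
    """

    def __init__(self, decay: float = 0.7) -> None:
        self.decay = decay
        self.state: Optional[Frame] = None

    def step(self, depth: Frame) -> Frame:
        if self.state is None:
            # First frame: initialise the state with the raw prediction.
            self.state = [row[:] for row in depth]
        else:
            a = self.decay
            self.state = [
                [a * h + (1.0 - a) * x for h, x in zip(h_row, x_row)]
                for h_row, x_row in zip(self.state, depth)
            ]
        return [row[:] for row in self.state]


def mean_abs_change(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two depth maps."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def run_stream(frames: List[Frame], decay: float = 0.7) -> List[Frame]:
    """Process frames strictly in order, one at a time, as a stream would."""
    temporal = StreamingTemporalFilter(decay)
    return [temporal.step(single_frame_depth(f)) for f in frames]


if __name__ == "__main__":
    # Three 2x2 frames whose brightness flickers between 0.5 and 0.7.
    frames = [[[v, v], [v, v]] for v in (0.5, 0.7, 0.5)]
    raw = [single_frame_depth(f) for f in frames]
    smooth = run_stream(frames)
    print("raw change:     ", mean_abs_change(raw[0], raw[1]))
    print("filtered change:", mean_abs_change(smooth[0], smooth[1]))
```

In this toy setting the filtered sequence changes less from frame to frame than the raw per-frame predictions, which is the kind of stabilization the abstract attributes to its temporal modules. A learned, input-dependent state update as in Mamba would replace the fixed `decay` used here.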

EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training

CV and Pattern Recognition

Helps tiny cameras see inside bodies better.

19 Jun 2025 2

90%

EndoUFM: Utilizing Foundation Models for Monocular depth estimation of endoscopic images

CV and Pattern Recognition

Helps doctors see inside bodies better.

25 Aug 2025 1

90%

Unifying Scale-Aware Depth Prediction and Perceptual Priors for Monocular Endoscope Pose Estimation and Tissue Reconstruction

CV and Pattern Recognition

Helps surgeons see inside bodies better.

15 Aug 2025 1
