Score: 2

RTS-Mono: A Real-Time Self-Supervised Monocular Depth Estimation Method for Real-World Deployment

Published: November 18, 2025 | arXiv ID: 2511.14107v1

By: Zeyu Cheng , Tongfei Liu , Tao Lei and more

Potential Business Impact:

Helps cars see how far things are, fast.

Business Areas:

Autonomous Vehicles Transportation

Depth information is crucial for autonomous driving and intelligent robot navigation. The simplicity and flexibility of self-supervised monocular depth estimation are conducive to its role in these fields. However, most existing monocular depth estimation models consume many computing resources. Although some methods have reduced the model's size and improved computing efficiency, the performance deteriorates, seriously hindering the real-world deployment of self-supervised monocular depth estimation models in the real world. To address this problem, we proposed a real-time self-supervised monocular depth estimation method and implemented it in the real world. It is called RTS-Mono, which is a lightweight and efficient encoder-decoder architecture. The encoder is based on Lite-Encoder, and the decoder is designed with a multi-scale sparse fusion framework to minimize redundancy, ensure performance, and improve inference speed. RTS-Mono achieved state-of-the-art (SoTA) performance in high and low resolutions with extremely low parameter counts (3 M) in experiments based on the KITTI dataset. Compared with lightweight methods, RTS-Mono improved Abs Rel and Sq Rel by 5.6% and 9.8% at low resolution and improved Sq Rel and RMSE by 6.1% and 1.9% at high resolution. In real-world deployment experiments, RTS-Mono has extremely high accuracy and can perform real-time inference on Nvidia Jetson Orin at a speed of 49 FPS. Source code is available at https://github.com/ZYCheng777/RTS-Mono.

Zero-Shot Metric Depth Estimation via Monocular Visual-Inertial Rescaling for Autonomous Aerial Navigation

Robotics

Helps drones see how far things are.

9 Sep 2025 2

88%

MonoCT: Overcoming Monocular 3D Detection Domain Shift with Consistent Teacher Models

CV and Pattern Recognition

Helps cars see in 3D without extra cameras.

17 Mar 2025 1

88%

TSE-Net: Semi-supervised Monocular Height Estimation from Single Remote Sensing Images

CV and Pattern Recognition

Lets computers guess heights from single pictures.

17 Nov 2025 1

View PDF Login to Bookmark

Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

14 pages

RTS-Mono: A Real-Time Self-Supervised Monocular Depth Estimation Method for Real-World Deployment

Helps cars see how far things are, fast.

Technical Abstract

Zero-Shot Metric Depth Estimation via Monocular Visual-Inertial Rescaling for Autonomous Aerial Navigation

MonoCT: Overcoming Monocular 3D Detection Domain Shift with Consistent Teacher Models

TSE-Net: Semi-supervised Monocular Height Estimation from Single Remote Sensing Images