
TSE-Net: Semi-supervised Monocular Height Estimation from Single Remote Sensing Images

Published: November 17, 2025 | arXiv ID: 2511.13552v1

By: Sining Chen, Xiao Xiang Zhu

Potential Business Impact:

Lets computers guess heights from single pictures.

Business Areas:

Image Recognition, Data and Analytics, Software

Monocular height estimation plays a critical role in 3D perception for remote sensing, offering a cost-effective alternative to multi-view or LiDAR-based methods. While deep learning has significantly advanced the capabilities of monocular height estimation, these methods remain fundamentally limited by the availability of labeled data, which are expensive and labor-intensive to obtain at scale. The scarcity of high-quality annotations hinders the generalization and performance of existing models. To overcome this limitation, we propose leveraging large volumes of unlabeled data through a semi-supervised learning framework, enabling the model to extract informative cues from unlabeled samples and improve its predictive performance. In this work, we introduce TSE-Net, a self-training pipeline for semi-supervised monocular height estimation. The pipeline integrates teacher, student, and exam networks. The student network is trained on unlabeled data using pseudo-labels generated by the teacher network, while the exam network functions as a temporal ensemble of the student network to stabilize performance. The teacher network is formulated as a joint regression and classification model: the regression branch predicts height values that serve as pseudo-labels, and the classification branch predicts height value classes along with class probabilities, which are used to filter pseudo-labels. Height value classes are defined using a hierarchical bi-cut strategy to address the inherent long-tailed distribution of heights, and the predicted class probabilities are calibrated with a Plackett-Luce model to reflect the expected accuracy of pseudo-labels. We evaluate the proposed pipeline on three datasets spanning different resolutions and imaging modalities. Code is available at https://github.com/zhu-xlab/tse-net.
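
The two mechanisms named in the abstract, filtering pseudo-labels by teacher confidence and keeping the exam network as a temporal ensemble of the student, can be illustrated with a minimal sketch. This is not the authors' implementation. The exponential moving average form of the ensemble, the L1 loss, the fixed confidence threshold, the momentum value, and the flat-list representation of weights and pixels are all assumptions made for illustration, and the function names are hypothetical.

```python
from typing import List, Sequence


def ema_update(exam: Sequence[float], student: Sequence[float],
               momentum: float = 0.99) -> List[float]:
    """Assumed temporal ensemble: blend exam weights toward the student weights."""
    return [momentum * e + (1.0 - momentum) * s for e, s in zip(exam, student)]


def filtered_pseudo_label_loss(student_pred: Sequence[float],
                               teacher_height: Sequence[float],
                               teacher_conf: Sequence[float],
                               threshold: float = 0.8) -> float:
    """Mean absolute error over pixels whose teacher confidence passes the threshold.

    teacher_height stands in for the regression branch output (the pseudo-labels);
    teacher_conf stands in for the calibrated class probability used as a filter.
    """
    kept = [abs(p - h)
            for p, h, c in zip(student_pred, teacher_height, teacher_conf)
            if c >= threshold]
    # If no pixel is confident enough, the unlabeled sample contributes nothing.
    return sum(kept) / len(kept) if kept else 0.0


# Toy example: three pixels, the last one has low teacher confidence.
student_pred = [2.0, 10.0, 31.0]
teacher_height = [3.0, 12.0, 20.0]
teacher_conf = [0.9, 0.95, 0.4]

loss = filtered_pseudo_label_loss(student_pred, teacher_height, teacher_conf)
exam_weights = ema_update([0.0, 1.0], [1.0, 1.0], momentum=0.9)
```

In the toy example the third pixel is excluded, so the loss averages the errors 1.0 and 2.0 of the two confident pixels. In practice the weights would be network parameters and the update would run after each student training step.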

Enhancing Monocular Height Estimation via Weak Supervision from Imperfect Labels

CV and Pattern Recognition

Helps computers guess how tall things are from pictures.

3 Jun 2025 1

88%

Enhancing Monocular Height Estimation via Sparse LiDAR-Guided Correction

CV and Pattern Recognition

Makes 3D maps more accurate using shadows and real data.

11 May 2025 2

88%

RTS-Mono: A Real-Time Self-Supervised Monocular Depth Estimation Method for Real-World Deployment

CV and Pattern Recognition

Helps cars see how far things are, fast.

18 Nov 2025 2



Page Count

34 pages
