TSE-Net: Semi-supervised Monocular Height Estimation from Single Remote Sensing Images
By: Sining Chen, Xiao Xiang Zhu
Potential Business Impact:
Lets computers guess heights from single pictures.
Monocular height estimation plays a critical role in 3D perception for remote sensing, offering a cost-effective alternative to multi-view or LiDAR-based methods. While deep learning has significantly advanced the capabilities of monocular height estimation, these methods remain fundamentally limited by the availability of labeled data, which are expensive and labor-intensive to obtain at scale. The scarcity of high-quality annotations hinders the generalization and performance of existing models. To overcome this limitation, we propose leveraging large volumes of unlabeled data through a semi-supervised learning framework, enabling the model to extract informative cues from unlabeled samples and improve its predictive performance. In this work, we introduce TSE-Net, a self-training pipeline for semi-supervised monocular height estimation. The pipeline integrates teacher, student, and exam networks. The student network is trained on unlabeled data using pseudo-labels generated by the teacher network, while the exam network functions as a temporal ensemble of the student network to stabilize performance. The teacher network is formulated as a joint regression and classification model: the regression branch predicts height values that serve as pseudo-labels, and the classification branch predicts height value classes along with class probabilities, which are used to filter pseudo-labels. Height value classes are defined using a hierarchical bi-cut strategy to address the inherent long-tailed distribution of heights, and the predicted class probabilities are calibrated with a Plackett-Luce model to reflect the expected accuracy of pseudo-labels. We evaluate the proposed pipeline on three datasets spanning different resolutions and imaging modalities. Codes are available at https://github.com/zhu-xlab/tse-net.
Similar Papers
Enhancing Monocular Height Estimation via Weak Supervision from Imperfect Labels
CV and Pattern Recognition
Helps computers guess how tall things are from pictures.
Enhancing Monocular Height Estimation via Sparse LiDAR-Guided Correction
CV and Pattern Recognition
Makes 3D maps more accurate using shadows and real data.
RTS-Mono: A Real-Time Self-Supervised Monocular Depth Estimation Method for Real-World Deployment
CV and Pattern Recognition
Helps cars see how far things are, fast.