
MonoCT: Overcoming Monocular 3D Detection Domain Shift with Consistent Teacher Models

Published: March 17, 2025 | arXiv ID: 2503.13743v1

By: Johannes Meier, Louis Inchingolo, Oussema Dhaouadi and more

Potential Business Impact:

Helps cars see in 3D without extra cameras.

Business Areas:
Image Recognition Data and Analytics, Software

We tackle monocular 3D object detection across different sensors, environments, and camera setups. We introduce MonoCT, an unsupervised domain adaptation approach that generates highly accurate pseudo labels for self-supervision. Motivated by our observation that accurate depth estimation is critical to mitigating domain shift, MonoCT includes a Generalized Depth Enhancement (GDE) module that uses an ensemble concept to improve depth estimation accuracy. It also includes a Pseudo Label Scoring (PLS) module, based on an inner-model consistency measurement, and a Diversity Maximization (DM) strategy, both of which further improve the quality of the pseudo labels used for self-training. Extensive experiments on six benchmarks show that MonoCT outperforms existing state-of-the-art (SOTA) domain adaptation methods by large margins (at least ~21% for AP Mod.) and generalizes well to car, traffic camera, and drone views.
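
To make the idea of consistency-based pseudo-label selection more concrete, here is a minimal Python sketch. It is not the authors' GDE or PLS implementation, which the abstract does not specify. It assumes each detection carries several depth estimates (for example, one per ensemble member), fuses them with a median, and down-weights detections whose estimates disagree. The `Detection` type, the exponential scoring formula, and the `min_score` threshold are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from statistics import median, pstdev
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical container for one predicted object on a target-domain image."""
    label: str
    depth_estimates: List[float]  # metres, one value per ensemble member (assumed)
    confidence: float             # detector confidence in [0, 1]


def fuse_depth(estimates: List[float]) -> float:
    """Fuse several depth estimates into one value (median is an assumed choice)."""
    return median(estimates)


def consistency_score(estimates: List[float]) -> float:
    """Return a value in (0, 1]; 1.0 means all estimates agree exactly.

    Assumed formula: exp(-relative spread), where relative spread is the
    population standard deviation divided by the fused depth.
    """
    fused = fuse_depth(estimates)
    if fused <= 0:
        return 0.0  # a non-positive depth is not physically meaningful
    return math.exp(-pstdev(estimates) / fused)


def select_pseudo_labels(
    detections: List[Detection], min_score: float = 0.8
) -> List[Tuple[str, float, float]]:
    """Keep detections whose combined score reaches the (assumed) threshold.

    Returns (label, fused_depth, score) tuples for the kept detections.
    """
    kept = []
    for det in detections:
        score = det.confidence * consistency_score(det.depth_estimates)
        if score >= min_score:
            kept.append((det.label, fuse_depth(det.depth_estimates), score))
    return kept
```

In this sketch, a detection whose depth estimates all agree keeps its full detector confidence, while one whose estimates are widely scattered is penalized and may be dropped before self-training. A diversity-oriented selection step, as the DM strategy suggests, is not modeled here.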

Country of Origin
🇩🇪 Germany

Page Count
8 pages

Category
Computer Science:
CV and Pattern Recognition