Generalizing Monocular 3D Object Detection
By: Abhinav Kumar
Potential Business Impact:
Helps cars see in 3D from one picture.
Monocular 3D object detection (Mono3D) is a fundamental computer vision task that estimates an object's class, 3D position, dimensions, and orientation from a single image. Its applications, including autonomous driving, augmented reality, and robotics, rely critically on accurate 3D understanding of the environment. This thesis addresses the challenge of generalizing Mono3D models across diverse scenarios: occlusions, datasets, object sizes, and camera parameters. To enhance occlusion robustness, we propose a mathematically differentiable NMS (GrooMeD-NMS). To improve generalization to new datasets, we explore depth-equivariant (DEVIANT) backbones. We then address large-object detection, demonstrating that it is not solely a data-imbalance or receptive-field problem but also a noise-sensitivity issue. To mitigate this, we introduce a segmentation-based approach in bird's-eye view with dice loss (SeaBird). Finally, we mathematically analyze the extrapolation of Mono3D models to unseen camera heights and improve Mono3D generalization in such out-of-distribution settings.
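The abstract names dice loss as the training objective for the bird's-eye-view segmentation approach. As a rough illustration only, the sketch below shows the commonly used soft dice formulation on flattened masks. The function name `dice_loss`, the smoothing term `eps`, and the flat-list inputs are assumptions made for this example; the thesis's exact formulation may differ.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft dice loss for flattened masks (illustrative sketch, not the thesis code).

    pred:   predicted foreground probabilities in [0, 1], one per BEV cell
    target: ground-truth binary labels (0 or 1), same length as pred
    eps:    assumed smoothing term that avoids division by zero on empty masks
    """
    # Overlap between prediction and ground truth.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Total predicted mass plus total ground-truth mass.
    denom = sum(pred) + sum(target)
    # Dice coefficient is 2|A∩B| / (|A| + |B|); the loss is one minus that.
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


if __name__ == "__main__":
    perfect = dice_loss([1.0, 1.0, 0.0, 0.0], [1, 1, 0, 0])
    disjoint = dice_loss([1.0, 0.0], [0, 1])
    print(perfect, disjoint)
```

Because the loss is a ratio of overlap to total mass rather than a sum of per-cell errors, it is normalized by object extent, which is one reason it is often preferred for segmentation-style objectives.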
Similar Papers
Difficulty-Aware Label-Guided Denoising for Monocular 3D Object Detection
CV and Pattern Recognition
Helps cars see better in 3D, even when objects are hidden.
Towards 3D Objectness Learning in an Open World
CV and Pattern Recognition
Finds any object in 3D, even new ones.
GATE3D: Generalized Attention-based Task-synergized Estimation in 3D*
CV and Pattern Recognition
Helps robots see in 3D everywhere, not just roads.