Bridging Perspectives: Foundation Model Guided BEV Maps for 3D Object Detection and Tracking

Published: October 11, 2025 | arXiv ID: 2510.10287v1

By: Markus Käppeler, Özgün Çiçek, Daniele Cattaneo, and more

Potential Business Impact:

Helps self-driving cars perceive their surroundings in 3D more reliably.

Business Areas:
Image Recognition, Data and Analytics, Software

Camera-based 3D object detection and tracking are essential for perception in autonomous driving. Current state-of-the-art approaches often rely exclusively on either perspective-view (PV) or bird's-eye-view (BEV) features, limiting their ability to exploit both fine-grained object details and spatially structured scene representations. In this work, we propose DualViewDistill, a hybrid detection and tracking framework that combines PV and BEV camera image features to exploit their complementary strengths. Our approach introduces BEV maps guided by foundation models: descriptive DINOv2 features are distilled into BEV representations through a novel distillation process. By integrating PV features with BEV maps enriched with semantic and geometric DINOv2 features, our model aggregates this hybrid representation via deformable aggregation to enhance 3D object detection and tracking. Extensive experiments on the nuScenes and Argoverse 2 benchmarks demonstrate that DualViewDistill achieves state-of-the-art performance. The results showcase the potential of foundation model BEV maps to enable more reliable perception for autonomous driving. We make the code and pre-trained models available at https://dualviewdistill.cs.uni-freiburg.de.
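The abstract does not specify the exact distillation objective, but feature distillation of a frozen teacher (here, DINOv2) into a student representation (here, the BEV map) is commonly implemented as a per-location similarity loss. The sketch below is a minimal, hypothetical illustration of that idea in NumPy: it assumes the teacher features have already been projected into the same BEV grid as the student features (the projection itself is paper-specific and omitted), and it uses a cosine-similarity loss, which is one common choice, not necessarily the one used in DualViewDistill.

```python
import numpy as np

def cosine_distill_loss(bev_feats, teacher_feats, eps=1e-8):
    """Per-cell cosine-similarity distillation loss (illustrative sketch).

    bev_feats:     (H, W, C) student BEV features from the camera backbone.
    teacher_feats: (H, W, C) frozen DINOv2 features projected into the same
                   BEV grid (projection assumed done elsewhere).
    Returns the mean of (1 - cosine similarity) over all BEV cells, so a
    perfectly aligned student yields a loss near 0.
    """
    num = np.sum(bev_feats * teacher_feats, axis=-1)
    den = (np.linalg.norm(bev_feats, axis=-1)
           * np.linalg.norm(teacher_feats, axis=-1) + eps)
    return float(np.mean(1.0 - num / den))

# A student identical to the teacher incurs (near-)zero loss.
f = np.random.rand(4, 4, 8)
print(round(cosine_distill_loss(f, f), 6))  # -> 0.0
```

In a training loop, this loss would be added to the detection and tracking losses so the BEV map is pulled toward the semantically rich DINOv2 feature space while still serving the downstream task heads.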

Page Count
16 pages

Category
Computer Science:
CV and Pattern Recognition