Online 3D Multi-Camera Perception through Robust 2D Tracking and Depth-based Late Aggregation

Published: September 12, 2025 | arXiv ID: 2509.09946v1

By: Vu-Minh Le, Thao-Anh Tran, Duc Huy Do, and others

Potential Business Impact:

Tracks people in 3D from many cameras.

Business Areas:
Motion Capture, Media and Entertainment, Video

Multi-Target Multi-Camera Tracking (MTMC) is an essential computer vision task for automating large-scale surveillance. With camera calibration and depth information, the targets in a scene can be projected into 3D space, offering unparalleled levels of automatic perception of a 3D environment. However, tracking in 3D space requires rebuilding every 2D tracking component from the ground up, which may be infeasible for existing MTMC systems. In this paper, we present an approach for extending any online 2D multi-camera tracking system into 3D space: depth information is used to reconstruct each target in point-cloud space, and its 3D bounding box is then recovered through clustering and yaw refinement after tracking. We also introduce an enhanced online data association mechanism that leverages the consistency of each target's local ID to assign global IDs across frames. The proposed framework was evaluated on the 2025 AI City Challenge 3D MTMC dataset, where it placed 3rd on the leaderboard.
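The depth-based lifting step described above can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation, and every function name and parameter below is hypothetical. It assumes a pinhole camera with intrinsics (fx, fy, cx, cy), world-to-camera extrinsics x_cam = R x_world + t, a z-up world, and a depth map that stores metric depth along the optical axis. Pixels inside a 2D box are back-projected into a point cloud. A simple Euclidean clustering, swapped in here because the abstract does not name the clustering method, keeps the largest connected group and discards background points that leak into the box. A yaw-oriented 3D box is then fit from the principal axis of the ground-plane footprint. The tracking-based yaw refinement from the paper is not reproduced.

```python
import math
from collections import deque

def backproject_box(box, depth, fx, fy, cx, cy, R, t, stride=1):
    """Lift pixels inside a 2D box (u0, v0, u1, v1) to world-space 3D points.

    Assumption: pinhole camera, extrinsics x_cam = R @ x_world + t,
    so x_world = R^T @ (x_cam - t); depth[v][u] is metric depth.
    """
    u0, v0, u1, v1 = box
    points = []
    for v in range(v0, v1, stride):
        for u in range(u0, u1, stride):
            d = depth[v][u]
            if d <= 0:  # skip missing or invalid depth readings
                continue
            cam = ((u - cx) * d / fx, (v - cy) * d / fy, d)
            diff = [cam[i] - t[i] for i in range(3)]
            # Multiply by R transposed: world_j = sum_i R[i][j] * diff[i]
            points.append(tuple(sum(R[i][j] * diff[i] for i in range(3)) for j in range(3)))
    return points

def largest_cluster(points, eps):
    """Euclidean clustering: return the largest group connected by hops <= eps."""
    unvisited = set(range(len(points)))
    best = []
    while unvisited:
        seed = unvisited.pop()
        queue, cluster = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= eps]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        if len(cluster) > len(best):
            best = cluster
    return [points[i] for i in best]

def box_from_points(points):
    """Fit (center_x, center_y, center_z, length, width, height, yaw) in a z-up world."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Principal axis of the ground-plane footprint gives a single-frame yaw estimate.
    yaw = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(yaw), math.sin(yaw)
    # Express the footprint in the box frame to measure its extents.
    along = [(p[0] - mx) * c + (p[1] - my) * s for p in points]
    across = [-(p[0] - mx) * s + (p[1] - my) * c for p in points]
    zs = [p[2] for p in points]
    # Center on the extents rather than the mean, which is biased by point density.
    a_mid = (max(along) + min(along)) / 2
    b_mid = (max(across) + min(across)) / 2
    center = (mx + a_mid * c - b_mid * s, my + a_mid * s + b_mid * c, (max(zs) + min(zs)) / 2)
    return center + (max(along) - min(along), max(across) - min(across), max(zs) - min(zs), yaw)
```

A typical call chain would be `box_from_points(largest_cluster(backproject_box(...), eps=0.3))`, where `eps` and `stride` are illustrative values that would need tuning to the depth sensor's noise and resolution. The PCA yaw is only defined up to 180 degrees and becomes unstable for near-square footprints. This is one reason to refine yaw over a track rather than per frame, as the abstract describes. The quadratic clustering loop is kept for readability; a real system would vectorize it or use a spatial index.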

Page Count
11 pages

Category
Computer Science:
CV and Pattern Recognition