A Multi-Drone Multi-View Dataset and Deep Learning Framework for Pedestrian Detection and Tracking
By: Kosta Dakic, Kanchana Thilakarathna, Rodrigo N. Calheiros and more
Potential Business Impact:
Tracks people from many moving cameras.
Multi-drone surveillance systems offer enhanced coverage and robustness for pedestrian tracking, yet existing approaches struggle with dynamic camera positions and complex occlusions. This paper introduces MATRIX (Multi-Aerial TRacking In compleX environments), a comprehensive dataset featuring synchronized footage from eight drones with continuously changing positions, together with a novel deep learning framework for multi-view detection and tracking. Unlike existing datasets that rely on static cameras or limited drone coverage, MATRIX provides a challenging urban scenario with 40 pedestrians and a significant architectural obstruction. Our framework addresses the unique challenges of dynamic drone-based surveillance through real-time camera calibration, feature-based image registration, and multi-view feature fusion in a bird's-eye-view (BEV) representation. Experimental results show that static-camera methods maintain over 90% detection and tracking precision and accuracy in a simplified MATRIX environment (no obstruction, 10 pedestrians, and a much smaller observation area), but their performance degrades significantly in the complex environment. Our proposed approach maintains robust performance, with approximately 90% detection and tracking accuracy, and successfully tracks approximately 80% of trajectories under challenging conditions. Transfer learning experiments reveal strong generalization: the pretrained model achieves much higher detection and tracking accuracy than the same model trained from scratch. Additionally, systematic camera dropout experiments reveal graceful performance degradation, demonstrating practical robustness for real-world deployments where camera failures may occur. The MATRIX dataset and framework provide essential benchmarks for advancing dynamic multi-view surveillance systems.
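The idea of combining several views in a common bird's-eye-view frame can be illustrated with a minimal sketch. It is not the paper's framework: it assumes each camera already has a known 3x3 image-to-ground homography (the abstract's real-time calibration and registration steps are not reproduced), and it fuses detected foot points by simple voting rather than by learned feature fusion. The names `project_to_bev` and `fuse_views`, the grid size, and the example homographies are all hypothetical.

```python
import numpy as np

def project_to_bev(points_px, H):
    """Map Nx2 pixel coordinates to ground-plane coordinates with a 3x3 homography."""
    pts = np.asarray(points_px, dtype=float).reshape(-1, 2)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])  # homogeneous coordinates
    mapped = homog @ np.asarray(H, dtype=float).T
    return mapped[:, :2] / mapped[:, 2:3]  # perspective divide

def fuse_views(foot_points_per_view, homographies, grid_shape=(10, 10), cell_size=1.0):
    """Accumulate per-view foot points into a BEV vote grid (rows = y, cols = x).

    A view passed as None (or with no points) is skipped, which mimics a
    dropped camera. This is detection-level voting, an assumption made for
    illustration, not feature-level fusion.
    """
    grid = np.zeros(grid_shape, dtype=int)
    for pts, H in zip(foot_points_per_view, homographies):
        if pts is None or len(pts) == 0:
            continue
        ground = project_to_bev(pts, H)
        cells = np.floor(ground / cell_size).astype(int)
        for x, y in cells:
            if 0 <= y < grid_shape[0] and 0 <= x < grid_shape[1]:
                grid[y, x] += 1
    return grid

# Hypothetical homographies: 100 px per metre, camera B offset by 1 m in x.
H_a = np.array([[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]])
H_b = np.array([[0.01, 0.0, 1.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]])

# The same pedestrian seen at different pixel positions in the two views.
views = [[(250, 350)], [(150, 350)]]
grid_all = fuse_views(views, [H_a, H_b])

# Simulated camera dropout: view B is unavailable.
grid_dropout = fuse_views([views[0], None], [H_a, H_b])
```

With both cameras available, the two observations land in the same ground cell and reinforce each other; with one camera dropped, the cell still receives a vote, only with less support. That is the intuition behind graceful degradation under camera failure, though the reported results come from the authors' learned model, not from a voting scheme like this one.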