Enhancing Traffic Incident Response through Sub-Second Temporal Localization with HybridMamba

Published: April 4, 2025 | arXiv ID: 2504.03235v3

By: Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

Potential Business Impact:

Finds car crashes in videos faster.

Business Areas:
Autonomous Vehicles, Transportation

Traffic crash detection in long-form surveillance videos is essential for improving emergency response and infrastructure planning, yet it remains difficult because crash events are brief and infrequent. We present HybridMamba, a novel architecture that integrates visual transformers with state-space temporal modeling to achieve high-precision crash time localization. Our approach introduces multi-level token compression and hierarchical temporal processing to maintain computational efficiency without sacrificing temporal resolution. Evaluated on a large-scale dataset from the Iowa Department of Transportation, HybridMamba achieves a mean absolute error of 1.50 seconds on 2-minute videos (p < 0.01 compared to baselines), with 65.2% of predictions falling within one second of the ground truth. It outperforms recent video-language models (e.g., TimeChat, VideoLLaMA-2) by up to 3.95 seconds in mean absolute error while using significantly fewer parameters (3B vs. 13B to 72B). Our results demonstrate effective temporal localization across video durations from 2 to 40 minutes and under diverse environmental conditions, highlighting HybridMamba's potential for fine-grained temporal localization in traffic surveillance while identifying challenges that remain for extended deployment.
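
The two headline metrics in the abstract, mean absolute error and the share of predictions within one second of the ground truth, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name `localization_metrics`, the `tolerance_s` parameter, and the sample timestamps are all hypothetical and chosen only to show how such metrics are typically computed.

```python
from typing import Sequence


def localization_metrics(
    predicted: Sequence[float],
    actual: Sequence[float],
    tolerance_s: float = 1.0,
) -> tuple[float, float]:
    """Return (mean absolute error in seconds, fraction of predictions within tolerance).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("predicted and actual must be non-empty and equal in length")

    # Absolute difference between each predicted and true crash time.
    errors = [abs(p - a) for p, a in zip(predicted, actual)]

    mae = sum(errors) / len(errors)
    within = sum(1 for e in errors if e <= tolerance_s) / len(errors)
    return mae, within


# Made-up crash timestamps (seconds from the start of each video).
predicted = [12.0, 45.5, 80.0, 101.0]
actual = [12.5, 45.0, 83.0, 101.0]

mae, within = localization_metrics(predicted, actual)
print(f"MAE: {mae:.2f} s, within {1.0:.0f} s: {within:.1%}")
```

With real data, `predicted` would hold the model's crash-time estimates and `actual` the annotated crash times, one pair per video.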

Country of Origin
🇺🇸 United States

Page Count
16 pages

Category
Computer Science:
CV and Pattern Recognition