Surveillance Video-Based Traffic Accident Detection Using Transformer Architecture

Published: December 12, 2025 | arXiv ID: 2512.11350v1

By: Tanu Singh, Pranamesh Chakraborty, Long T. Truong

Potential Business Impact:

Helps cameras spot car crashes faster and better.

Business Areas:

Image Recognition, Data and Analytics, Software

Road traffic accidents represent a leading cause of mortality globally, with incidence rates rising due to increasing population, urbanization, and motorization. Rising accident rates raise concerns about traffic surveillance effectiveness. Traditional computer vision methods for accident detection struggle with limited spatiotemporal understanding and poor cross-domain generalization. Recent transformer architectures excel at modeling global spatiotemporal dependencies and support parallel computation. However, applying these models to automated traffic accident detection is limited by small, non-diverse datasets, hindering the development of robust, generalizable systems. To address this gap, we curated a comprehensive and balanced dataset that captures a wide spectrum of traffic environments, accident types, and contextual variations. Using the curated dataset, we propose an accident detection model based on a transformer architecture that operates on pre-extracted spatial video features. The architecture employs convolutional layers to extract local correlations across diverse patterns within a frame, while leveraging transformers to capture sequential-temporal dependencies among the extracted features. Moreover, most existing studies neglect the integration of motion cues, which are essential for understanding dynamic scenes, especially during accidents. These approaches typically rely on static features or coarse temporal information. In this study, multiple methods for incorporating motion cues were evaluated to identify the most effective strategy. Among the tested input approaches, concatenating RGB features with optical flow achieved the highest accuracy at 88.3%. The results were further compared with vision-language models (VLMs) such as GPT, Gemini, and LLaVA-NeXT-Video to assess the effectiveness of the proposed method.
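
The input strategy described in the abstract (concatenating per-frame RGB features with optical-flow features, then modeling dependencies across frames with attention) can be illustrated with a minimal pure-Python sketch. This is not the authors' model: it omits the convolutional layers, learned projections, and classification head, and uses single-head scaled dot-product self-attention with identity projections. The function names (`concat_features`, `temporal_self_attention`, `clip_embedding`) and the toy feature values are hypothetical, chosen only for illustration.

```python
import math


def concat_features(rgb, flow):
    """Concatenate per-frame RGB and optical-flow feature vectors (assumed layout)."""
    if len(rgb) != len(flow):
        raise ValueError("rgb and flow must cover the same number of frames")
    return [r + f for r, f in zip(rgb, flow)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Single-head scaled dot-product self-attention across frames.

    Queries, keys, and values are the frame features themselves
    (identity projections), a simplification made for this sketch.
    """
    d = len(frames[0])
    attended = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, frames)) for j in range(d)]
        )
    return attended


def clip_embedding(frames):
    """Mean-pool the attended frame features into one clip-level vector."""
    attended = temporal_self_attention(frames)
    n = len(attended)
    return [sum(col) / n for col in zip(*attended)]


# Hypothetical toy features: 3 frames, 2 RGB dims and 1 optical-flow dim each.
rgb = [[0.1, 0.2], [0.3, 0.1], [0.9, 0.8]]
flow = [[0.0], [0.1], [0.7]]

fused = concat_features(rgb, flow)
embedding = clip_embedding(fused)
```

In a full model, a clip-level vector like `embedding` would feed a classifier that outputs accident versus non-accident. Here it only shows how concatenation widens each frame's feature vector before temporal attention mixes information across frames.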

Integrating Generative Adversarial Networks and Convolutional Neural Networks for Enhanced Traffic Accidents Detection and Analysis

CV and Pattern Recognition

Spots car crashes in videos to help save lives.

19 Jun 2025 1

90%

Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods, Datasets, and Future Directions

CV and Pattern Recognition

Helps cars predict crashes before they happen.

12 May 2025 0

90%

Automated Road Distress Detection Using Vision Transformers and Generative Adversarial Networks

CV and Pattern Recognition

Finds road cracks faster using smart computer eyes.

17 Nov 2025 2

Country of Origin

🇮🇳 India

Page Count

12 pages
