Non-Linear Determinants of Pedestrian Injury Severity: Evidence from Administrative Data in Great Britain
By: Yifei Tong
Potential Business Impact:
Identifies the factors that make vehicle collisions more severe for pedestrians.
This study investigates the non-linear determinants of pedestrian injury severity using administrative data from Great Britain's 2023 STATS19 dataset. To address inherent data-quality challenges, including missing values and substantial class imbalance, we apply a preprocessing pipeline that combines mode imputation with the Synthetic Minority Over-sampling Technique (SMOTE). We use non-parametric ensemble methods (Random Forest and XGBoost) to capture complex interactions and heterogeneity often missed by linear models, and SHapley Additive exPlanations (SHAP) to ensure interpretability and isolate marginal feature effects. Our analysis reveals that vehicle count, speed limits, lighting, and road surface conditions are the primary predictors of severity, with police attendance and junction characteristics further distinguishing severe collisions. Spatially, while pedestrian risk is concentrated in dense urban Local Authority Districts (LADs), we find that certain rural LADs experience disproportionately severe outcomes conditional on a collision occurring. These findings underscore the value of combining spatial analysis with interpretable machine learning to guide geographically targeted speed management, infrastructure investment, and enforcement strategies.
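The two preprocessing steps named in the abstract, mode imputation and SMOTE, can be illustrated with a minimal from-scratch sketch. This is not the authors' pipeline: the abstract does not state which library or settings were used. The function names `impute_mode` and `smote`, the two feature columns, and the toy records are all hypothetical, and the interpolation shown assumes numeric features only.

```python
import random
from collections import Counter


def impute_mode(rows, missing=None):
    """Fill missing entries in each column with that column's most common value."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not missing]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if r[j] is missing else r[j] for j in range(n_cols)]
            for r in rows]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (numeric features)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy records: [speed_limit_mph, number_of_vehicles]; None = missing.
X = [[30, 1], [30, None], [20, 1], [None, 2], [30, 1], [60, 1], [60, 2], [40, 1]]
y = [0, 0, 0, 0, 0, 1, 1, 1]  # 0 = slight (majority), 1 = severe (minority)

X_imputed = impute_mode(X)
minority = [x for x, label in zip(X_imputed, y) if label == 1]
n_needed = y.count(0) - y.count(1)  # synthetic samples needed to balance classes

X_balanced = X_imputed + smote(minority, n_needed, k=2)
y_balanced = y + [1] * n_needed
```

Each synthetic record lies on a line segment between two real minority records, so it stays within the range of observed severe cases. In practice, oversampling of this kind is normally applied to the training split only, so that evaluation data contain real records exclusively.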
Similar Papers
Predicting and Explaining Traffic Crash Severity Through Crash Feature Selection
Machine Learning (CS)
Finds what makes car crashes worse.
Modeling Chaotic Pedestrian Behavior Using Chaos Indicators and Supervised Learning
Machine Learning (CS)
Predicts how people walk to make cities safer.
Monthly Rural-Urban Scaling of Road Accidents in England, Wales and Scotland (2019-2023)
Physics and Society
Shows how city growth causes more crashes.