From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models
By: Yihong Tang, Ao Qu, Xujing Yu, and more
Potential Business Impact:
Finds hidden road dangers from pictures.
Urban and transportation research has long sought to uncover statistically meaningful relationships between key variables and societal outcomes such as road safety, in order to generate actionable insights that guide the planning, development, and renewal of urban and transportation systems. However, traditional workflows face several key challenges: (1) reliance on human experts to propose hypotheses, which is time-consuming and prone to confirmation bias; (2) limited interpretability, particularly in deep learning approaches; and (3) underutilization of unstructured data that can encode critical urban context. Given these limitations, we propose UrbanX, a Multimodal Large Language Model (MLLM)-based approach for interpretable hypothesis inference, enabling the automated generation, evaluation, and refinement of hypotheses concerning urban context and road safety outcomes. Our method leverages MLLMs to craft safety-relevant questions for street view images (SVIs), extract interpretable embeddings from their responses, and apply them in regression-based statistical models. UrbanX supports iterative hypothesis testing and refinement, guided by statistical evidence such as coefficient significance, thereby enabling rigorous scientific discovery of previously overlooked correlations between urban design and safety. Experimental evaluations on Manhattan street segments demonstrate that our approach outperforms pretrained deep learning models while offering full interpretability. Beyond road safety, UrbanX can serve as a general-purpose framework for urban scientific discovery, extracting structured insights from unstructured urban data across diverse socioeconomic and environmental outcomes. This approach enhances model trustworthiness for policy applications and establishes a scalable, statistically grounded pathway for interpretable knowledge discovery in urban and transportation studies.
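The abstract's loop (MLLM answers safety-relevant questions about street view images, the answers become interpretable features, and a regression keeps only statistically supported hypotheses) can be sketched as follows. This is a minimal illustrative mock, not the paper's implementation: the MLLM call, the hypothesis names, the synthetic data, and the crude effect-size cutoff standing in for a significance test are all assumptions.

```python
import random
import statistics

def mock_mllm_answer(question, segment):
    """Stand-in for an MLLM call: a yes/no answer about one street segment's image."""
    return segment["features"].get(question, 0)

def fit_simple_ols(x, y):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx

random.seed(0)
# Synthetic segments: by construction, "missing_crosswalk" raises crash counts,
# while "street_trees" is unrelated noise.
segments = []
for _ in range(200):
    crosswalk = random.randint(0, 1)
    trees = random.randint(0, 1)
    crashes = 2.0 + 3.0 * crosswalk + random.gauss(0, 1)
    segments.append({"features": {"missing_crosswalk": crosswalk,
                                  "street_trees": trees},
                     "crashes": crashes})

# Hypothesis loop: ask each question, regress the answers against the outcome,
# keep hypotheses whose estimated effect clears a threshold (a crude stand-in
# for the coefficient-significance test described in the abstract).
hypotheses = ["missing_crosswalk", "street_trees"]
kept = []
for q in hypotheses:
    x = [mock_mllm_answer(q, s) for s in segments]
    y = [s["crashes"] for s in segments]
    slope, _ = fit_simple_ols(x, y)
    if abs(slope) > 1.0:
        kept.append((q, round(slope, 2)))

print(kept)
```

Under this setup, only the hypothesis with a genuine effect survives the filter; in a real pipeline the binary answers would come from an MLLM queried over SVIs, and the cutoff would be a proper p-value on the regression coefficient.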
Similar Papers
Large Language Models and Their Applications in Roadway Safety and Mobility Enhancement: A Comprehensive Review
Artificial Intelligence
Helps cars understand traffic better for safer roads.
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis
CV and Pattern Recognition
Helps cars spot accidents in videos faster.
Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends
CV and Pattern Recognition
Makes cars see and understand everything around them.