UrbanSense: A Framework for Quantitative Analysis of Urban Streetscapes leveraging Vision Large Language Models

Published: June 12, 2025 | arXiv ID: 2506.10342v2

By: Jun Yin, Jing Zhong, Peilin Li and more

Potential Business Impact:

Helps computers see city differences like people.

Business Areas:

Smart Cities, Real Estate

Urban cultures and architectural styles vary significantly across cities due to geographical, chronological, historical, and socio-political factors. Understanding these differences is essential for anticipating how cities may evolve in the future. As representative cases of historical continuity and modern innovation in China, Beijing and Shenzhen offer valuable perspectives for exploring the transformation of urban streetscapes. However, conventional approaches to urban cultural studies often rely on expert interpretation and historical documentation, which are difficult to standardize across different contexts. To address this, we propose a multimodal research framework based on vision-language models, enabling automated and scalable analysis of urban streetscape style differences. This approach makes urban form research more objective and data-driven. The contributions of this study are as follows. First, we construct UrbanDiffBench, a curated dataset of urban streetscapes containing architectural images from different periods and regions. Second, we develop UrbanSense, the first vision-language-model-based framework for urban streetscape analysis, enabling the quantitative generation and comparison of urban style representations. Third, experimental results show that over 80% of generated descriptions pass the t-test (p < 0.05), and high Phi scores (0.912 for cities, 0.833 for periods) from subjective evaluations confirm the method's ability to capture subtle stylistic differences. These results highlight the method's potential to quantify and interpret urban style evolution, offering a scientifically grounded lens for future design.
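The abstract reports two kinds of statistics: the share of descriptions that pass a t-test at p < 0.05, and Phi scores from subjective evaluations. The abstract does not say how either was computed, so the following is only a minimal sketch using the standard textbook definition of the phi coefficient for a 2x2 contingency table and a simple pass-rate count. The function names, the table layout, and all numbers are hypothetical and are not taken from the paper.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for the 2x2 contingency table [[a, b], [c, d]].

    Assumed (hypothetical) layout: rows are the true label (e.g. city A / city B),
    columns are the label a human rater assigned after reading the description.
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


def pass_rate(p_values, alpha=0.05):
    """Fraction of tests whose p-value falls below the significance level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Illustrative, made-up numbers (not results from the paper).
example_phi = phi_coefficient(45, 5, 5, 45)
example_rate = pass_rate([0.01, 0.20, 0.03, 0.04, 0.50])
```

A phi value near 1 means the raters' labels almost always agree with the true labels, while a value near 0 means no association. The pass rate is just the proportion of p-values under the chosen threshold.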

ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models

CV and Pattern Recognition

Computer sees building styles like an expert.

9 Jun 2025 0

90%

CityLens: Benchmarking Large Language-Vision Models for Urban Socioeconomic Sensing

Artificial Intelligence

Helps computers understand city life from pictures.

31 May 2025 1

89%

Do Vision-Language Models See Urban Scenes as People Do? An Urban Perception Benchmark

CV and Pattern Recognition

Helps AI understand city pictures like people do.

18 Sep 2025 0

Country of Origin

🇸🇬 🇨🇳 Singapore, China

Page Count

10 pages
