Score: 0

From Static to Dynamic: Evaluating the Perceptual Impact of Dynamic Elements in Urban Scenes Using Generative Inpainting

Published: December 30, 2025 | arXiv ID: 2512.24513v1

By: Zhiwei Wei , Mengzi Zhang , Boyan Lu and more

Potential Business Impact:

Makes cities seem less lively without people.

Business Areas:
Visual Search Internet Services

Understanding urban perception from street view imagery has become a central topic in urban analytics and human centered urban design. However, most existing studies treat urban scenes as static and largely ignore the role of dynamic elements such as pedestrians and vehicles, raising concerns about potential bias in perception based urban analysis. To address this issue, we propose a controlled framework that isolates the perceptual effects of dynamic elements by constructing paired street view images with and without pedestrians and vehicles using semantic segmentation and MLLM guided generative inpainting. Based on 720 paired images from Dongguan, China, a perception experiment was conducted in which participants evaluated original and edited scenes across six perceptual dimensions. The results indicate that removing dynamic elements leads to a consistent 30.97% decrease in perceived vibrancy, whereas changes in other dimensions are more moderate and heterogeneous. To further explore the underlying mechanisms, we trained 11 machine learning models using multimodal visual features and identified that lighting conditions, human presence, and depth variation were key factors driving perceptual change. At the individual level, 65% of participants exhibited significant vibrancy changes, compared with 35-50% for other dimensions; gender further showed a marginal moderating effect on safety perception. Beyond controlled experiments, the trained model was extended to a city-scale dataset to predict vibrancy changes after the removal of dynamic elements. The city level results reveal that such perceptual changes are widespread and spatially structured, affecting 73.7% of locations and 32.1% of images, suggesting that urban perception assessments based solely on static imagery may substantially underestimate urban liveliness.

Country of Origin
šŸ‡ØšŸ‡³ China

Page Count
31 pages

Category
Computer Science:
Computers and Society