How to Build Robust, Scalable Models for GSV-Based Indicators in Neighborhood Research
By: Xiaoya Tang, Xiaohe Yue, Heran Mane, and more
A substantial body of health research demonstrates a strong link between neighborhood environments and health outcomes. Recently, there has been increasing interest in leveraging advances in computer vision to enable large-scale, systematic characterization of neighborhood built environments. However, the generalizability of vision models across fundamentally different domains remains uncertain; for example, it is unclear how well knowledge learned from ImageNet transfers to the distinct visual characteristics of Google Street View (GSV) imagery. In applied fields such as social health research, several critical questions arise: which models are most appropriate, whether to adopt unsupervised training strategies, what training scale is feasible under computational constraints, and how much such strategies benefit downstream performance. These decisions are often costly and require specialized expertise. In this paper, we address these questions through empirical analysis and provide practical insights into how to select and adapt foundation models for datasets with limited size and labels, while leveraging larger, unlabeled datasets through unsupervised training. Our study includes comprehensive quantitative and visual analyses comparing model performance before and after unsupervised adaptation.
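To make the general workflow concrete, the sketch below shows one common way to adapt an ImageNet-pretrained backbone to unlabeled GSV imagery with a self-supervised objective before fine-tuning on a small labeled set. The ResNet-50 backbone, SimCLR-style NT-Xent loss, the `unlabeled_gsv_loader`, and all hyperparameters are illustrative assumptions for exposition, not the specific models or training setup evaluated in the paper.

```python
# Minimal sketch: unsupervised adaptation of an ImageNet-pretrained backbone
# on unlabeled GSV images, prior to supervised fine-tuning on limited labels.
# Backbone, objective, and hyperparameters are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class ContrastiveAdapter(nn.Module):
    def __init__(self, proj_dim: int = 128):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        feat_dim = backbone.fc.in_features      # 2048 for ResNet-50
        backbone.fc = nn.Identity()             # keep features, drop the ImageNet classifier
        self.backbone = backbone
        self.projector = nn.Sequential(         # SimCLR-style projection head
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, proj_dim)
        )

    def forward(self, x):
        return F.normalize(self.projector(self.backbone(x)), dim=1)

def nt_xent_loss(z1, z2, temperature: float = 0.5):
    """NT-Xent loss over two augmented views of the same unlabeled GSV batch."""
    z = torch.cat([z1, z2], dim=0)              # (2N, D), L2-normalized embeddings
    sim = z @ z.t() / temperature               # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))           # exclude self-similarity
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)        # positive pair = the other view

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ContrastiveAdapter().to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Unsupervised adaptation loop over unlabeled GSV imagery (two augmented views
# per image), assuming a hypothetical DataLoader named `unlabeled_gsv_loader`:
# for view1, view2 in unlabeled_gsv_loader:
#     loss = nt_xent_loss(model(view1.to(device)), model(view2.to(device)))
#     opt.zero_grad(); loss.backward(); opt.step()
# Afterwards, discard the projector and fine-tune a small task head on the
# adapted backbone using the limited labeled neighborhood-indicator data.
```

The design choice reflected here is the one the abstract motivates: the large unlabeled GSV collection drives representation adaptation, while the scarce labels are reserved for the final supervised head.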