Score: 0

Harnessing Rich Multi-Modal Data for Spatial-Temporal Homophily-Embedded Graph Learning Across Domains and Localities

Published: December 11, 2025 | arXiv ID: 2512.11178v1

By: Takuya Kurihana , Xiaojian Zhang , Wing Yee Au and more

Modern cities are increasingly reliant on data-driven insights to support decision making in areas such as transportation, public safety and environmental impact. However, city-level data often exists in heterogeneous formats, collected independently by local agencies with diverse objectives and standards. Despite their numerous, wide-ranging, and uniformly consumable nature, national-level datasets exhibit significant heterogeneity and multi-modality. This research proposes a heterogeneous data pipeline that performs cross-domain data fusion over time-varying, spatial-varying and spatial-varying time-series datasets. We aim to address complex urban problems across multiple domains and localities by harnessing the rich information over 50 data sources. Specifically, our data-learning module integrates homophily from spatial-varying dataset into graph-learning, embedding information of various localities into models. We demonstrate the generalizability and flexibility of the framework through five real-world observations using a variety of publicly accessible datasets (e.g., ride-share, traffic crash, and crime reports) collected from multiple cities. The results show that our proposed framework demonstrates strong predictive performance while requiring minimal reconfiguration when transferred to new localities or domains. This research advances the goal of building data-informed urban systems in a scalable way, addressing one of the most pressing challenges in smart city analytics.

Category
Computer Science:
Machine Learning (CS)