Human Mobility Datasets Enriched With Contextual and Social Dimensions
By: Chiara Pugliese, Francesco Lettich, Guido Rocchietti, and more
Potential Business Impact:
Maps where people go and what they do.
In this resource paper, we present two publicly available datasets of semantically enriched human trajectories, together with the pipeline to build them. The trajectories are publicly available GPS traces retrieved from OpenStreetMap. Each dataset includes contextual layers such as stops, moves, points of interest (POIs), inferred transportation modes, and weather data. A novel semantic feature is the inclusion of synthetic, realistic social media posts generated by Large Language Models (LLMs), enabling multimodal and semantic mobility analysis. The datasets are available in both tabular and Resource Description Framework (RDF) formats, supporting semantic reasoning and FAIR data practices. They cover two large, structurally distinct cities: Paris and New York. Our open-source, reproducible pipeline allows for dataset customization, while the datasets support research tasks such as behavior modeling, mobility prediction, knowledge graph construction, and LLM-based applications. To our knowledge, our resource is the first to combine real-world movement, structured semantic enrichment, LLM-generated text, and semantic web compatibility in a reusable framework.
Similar Papers
Learning Universal Human Mobility Patterns with a Foundation Model for Cross-domain Data Fusion
Machine Learning (CS)
Helps cities plan roads and traffic better.
WorldMove, a global open data for human mobility
Social and Information Networks
Creates fake travel maps for cities worldwide.
Estimating link level traffic emissions: enhancing MOVES with open-source data
Machine Learning (CS)
Cleans up car pollution estimates.