Subnational Geocoding of Global Disasters Using Large Language Models
By: Michele Ronco, Damien Delforge, Wiebke S. Jäger and more
Potential Business Impact:
Maps disaster locations automatically and accurately.
Subnational location data for disaster events are critical for risk assessment and disaster risk reduction. Disaster databases such as EM-DAT often report locations as unstructured text with inconsistent granularity or spelling, which makes them difficult to integrate with spatial datasets. We present a fully automated LLM-assisted workflow that processes and cleans textual location information using GPT-4o, and assigns geometries by cross-checking three independent geoinformation repositories: GADM, OpenStreetMap and Wikidata. Based on the agreement and availability of these sources, we assign a reliability score to each location while generating subnational geometries. Applied to the EM-DAT dataset from 2000 to 2024, the workflow geocodes 14,215 events across 17,948 unique locations. Unlike previous methods, our approach requires no manual intervention, covers all disaster types, enables cross-verification across multiple sources, and allows flexible remapping to preferred frameworks. Beyond the dataset, we demonstrate the potential of LLMs to extract and structure geographic information from unstructured text, offering a scalable and reliable method for related analyses.
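The abstract does not say how agreement between GADM, OpenStreetMap and Wikidata is measured or how it maps to a reliability score. As a rough illustration only, the sketch below assumes each repository's match has been reduced to a bounding box (or None if the source found nothing) and scores a location by the size of the largest group of sources whose boxes overlap. The names `bbox_iou` and `reliability_score`, the bounding-box simplification, and the 0.5 IoU threshold are all hypothetical, not the authors' scoring scheme.

```python
from itertools import combinations
from typing import Optional

# Hypothetical representation: each repository's match reduced to a
# bounding box (min_lon, min_lat, max_lon, max_lat) in degrees, or None.
BBox = tuple[float, float, float, float]


def bbox_iou(a: BBox, b: BBox) -> float:
    """Intersection-over-union of two lon/lat boxes (planar approximation)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def reliability_score(matches: dict[str, Optional[BBox]], threshold: float = 0.5) -> int:
    """Hypothetical 0-3 score: size of the largest group of sources whose
    geometries pairwise overlap with IoU >= threshold.

    0 = no source found the location; 1 = found, but not cross-confirmed.
    """
    available = [box for box in matches.values() if box is not None]
    # Try the largest possible group first, then shrink down to pairs.
    for size in range(len(available), 1, -1):
        for group in combinations(available, size):
            if all(bbox_iou(a, b) >= threshold for a, b in combinations(group, 2)):
                return size
    return min(len(available), 1)


# Usage: GADM and OpenStreetMap agree, Wikidata points somewhere else.
score = reliability_score({
    "GADM": (4.0, 50.0, 5.0, 51.0),
    "OpenStreetMap": (4.05, 50.0, 5.0, 51.0),
    "Wikidata": (10.0, 45.0, 11.0, 46.0),
})
print(score)  # 2
```

Counting mutually agreeing sources rewards cross-confirmation while still distinguishing "found once" from "not found at all"; a real pipeline would more likely compare full polygons (for example with a geometry library) rather than bounding boxes.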
Similar Papers
The World As Large Language Models See It: Exploring the reliability of LLMs in representing geographical features
Computers and Society
Computers guess locations, but not always correctly.
RoadMind: Towards a Geospatial AI Expert for Disaster Response
Computation and Language
Helps AI understand maps for disaster response.
Benchmarking Large Language Models for Geolocating Colonial Virginia Land Grants
Machine Learning (CS)
Turns old land descriptions into maps.