The World As Large Language Models See It: Exploring the reliability of LLMs in representing geographical features
By: Omid Reza Abbasi, Franz Welscher, Georg Weinberger, and more
Potential Business Impact:
Computers guess locations, but not always correctly.
As large language models (LLMs) continue to evolve, questions about their trustworthiness in delivering factual information have become increasingly important. This concern also applies to their ability to accurately represent the geographic world, and recent advances in the field make it worth asking whether, and to what extent, LLMs' representations of the geographical world can be trusted. This study evaluates the performance of GPT-4o and Gemini 2.0 Flash in three key geospatial tasks: geocoding, elevation estimation, and reverse geocoding. In the geocoding task, both models exhibited systematic and random errors in estimating the coordinates of St. Anne's Column in Innsbruck, Austria, with GPT-4o showing greater deviations and Gemini 2.0 Flash demonstrating more precision but a significant systematic offset. For elevation estimation, both models tended to underestimate elevations across Austria, though they captured overall topographical trends, and Gemini 2.0 Flash performed better in eastern regions. The reverse geocoding task, which involved identifying Austrian federal states from coordinates, revealed that Gemini 2.0 Flash outperformed GPT-4o in overall accuracy and F1-scores, demonstrating better consistency across regions. Despite these findings, neither model achieved an accurate reconstruction of Austria's federal states, highlighting persistent misclassifications. The study concludes that while LLMs can approximate geographic information, their accuracy and reliability are inconsistent, underscoring the need for fine-tuning with geographical information to enhance their utility in GIScience and Geoinformatics.
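To make the geocoding evaluation concrete, here is a minimal sketch of how such point estimates could be scored. It is not the authors' pipeline: the reference coordinates for St. Anne's Column, the sample model outputs, and the split into a systematic offset (bias of the mean prediction) versus a random spread around that mean are illustrative assumptions.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Approximate ground-truth coordinates of St. Anne's Column, Innsbruck (assumed here).
TRUE_LAT, TRUE_LON = 47.2654, 11.3927

def error_decomposition(predictions):
    """Split repeated LLM coordinate guesses into a systematic and a random part.

    predictions: list of (lat, lon) tuples returned by the model.
    Returns (systematic_error_m, random_error_m).
    """
    n = len(predictions)
    mean_lat = sum(lat for lat, _ in predictions) / n
    mean_lon = sum(lon for _, lon in predictions) / n
    # Systematic error: distance from the mean prediction to the true location.
    systematic = haversine_m(mean_lat, mean_lon, TRUE_LAT, TRUE_LON)
    # Random error: RMS spread of individual predictions around their own mean.
    spread = [haversine_m(lat, lon, mean_lat, mean_lon) for lat, lon in predictions]
    random_part = math.sqrt(sum(d ** 2 for d in spread) / n)
    return systematic, random_part

if __name__ == "__main__":
    # Hypothetical model outputs; real values would come from prompting GPT-4o or Gemini 2.0 Flash.
    sample = [(47.268, 11.392), (47.266, 11.395), (47.263, 11.390)]
    sys_err, rand_err = error_decomposition(sample)
    print(f"systematic offset: {sys_err:.0f} m, random spread: {rand_err:.0f} m")
```

The same distance-based scoring extends naturally to the other tasks: elevation estimates can be compared against a reference terrain model, and reverse-geocoded federal states can be tallied into a confusion matrix for accuracy and F1-scores.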
Similar Papers
Evaluating Large Language Model Capabilities in Assessing Spatial Econometrics Research
Computers and Society
AI checks if science papers make economic sense.
GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks
Computation and Language
Helps computers understand maps and locations better.
Evaluation of LLMs for mathematical problem solving
Artificial Intelligence
Computers solve harder math problems better.