LiLIS: Enhancing Big Spatial Data Processing with Lightweight Distributed Learned Index
By: Zhongpu Chen, Wanjun Hao, Ziang Zeng and more
Potential Business Impact:
Finds city data much faster than before.
The efficient management of big spatial data is crucial for location-based services, particularly in smart cities. However, existing systems such as Simba and Sedona, which incorporate distributed spatial indexing, still incur substantial index construction overheads, rendering them far from optimal for real-time analytics. Recent studies demonstrate that learned indices can achieve high efficiency through well-designed machine learning models, but how to design a learned index for distributed spatial analytics remains unaddressed. In this paper, we present LiLIS, a Lightweight Distributed Learned Index for big spatial data. LiLIS combines machine-learned search strategies with spatial-aware partitioning within a distributed framework, and efficiently implements common spatial queries, including point queries, range queries, k-nearest neighbors (kNN) queries, and spatial joins. Extensive experimental results over real-world and synthetic datasets show that LiLIS outperforms state-of-the-art big spatial data analytics systems by $2$ to $3$ orders of magnitude for most spatial queries, and that its index building achieves a $1.5$ to $2\times$ speed-up. The code is available at https://github.com/SWUFE-DB-Group/learned-index-spark.
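To make the notion of a "machine-learned search strategy" concrete, the following is a minimal single-machine sketch of the generic learned-index idea: a linear model predicts where a key sits in a sorted array, and a bounded binary search corrects the prediction. It is an illustration under stated assumptions, not the LiLIS implementation; the class name `LinearLearnedIndex`, the one-dimensional integer keys, and the least-squares model are all hypothetical choices made here, and the abstract does not say how LiLIS models or partitions spatial data.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical 1-D learned index (illustrative only, not LiLIS itself)."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("keys must be non-empty")
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var > 0 else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest prediction error over the indexed keys; it bounds the search window.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(self.keys))

    def _predict(self, key):
        # Model output rounded to the nearest array position (may fall out of range).
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of the first occurrence of key, or None if absent."""
        n = len(self.keys)
        pred = self._predict(key)
        # Clamp the error window to valid array bounds.
        lo = min(max(0, pred - self.max_err), n)
        hi = max(lo, min(n, pred + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 57, 91])
position = index.lookup(23)   # position of key 23 in the sorted array
missing = index.lookup(4)     # None, since 4 is not indexed
```

The search cost depends on `max_err` rather than on the array length, which is why a well-fitting model can beat a plain binary search. A distributed spatial variant would additionally need a way to map points to sortable keys and to assign them to partitions; those steps are outside this sketch.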
Similar Papers
Benchmarking RL-Enhanced Spatial Indices Against Traditional, Advanced, and Learned Counterparts
Databases
Makes computer searches faster, but not always the best.
Unlocking Location Intelligence: A Survey from Deep Learning to The LLM Era
CV and Pattern Recognition
Helps computers understand maps and text together.
SOLAR: Scalable Distributed Spatial Joins through Learning-based Optimization
Databases
Makes finding map data much faster.