LMG Index: A Robust Learned Index for Multi-Dimensional Performance Balance
By: Yuzhen Chen, Bin Yao
Index structures are fundamental for efficient query processing on large-scale datasets. Learned indexes model the indexing process as a prediction problem to overcome the inherent trade-offs of traditional indexes. However, most existing learned indexes optimize only for limited objectives like query latency or space usage, neglecting other practical evaluation dimensions such as update efficiency and stability. Moreover, many learned indexes rely on assumptions about data distributions or workloads, lacking theoretical guarantees when facing unknown or evolving scenarios, which limits their generality in real-world systems. In this paper, we propose LMIndex, a robust framework for learned indexing that leverages a efficient query/update top-layer structure (theoretically $O(1)$ when the key type is fixed) and a efficient optimal error threshold training algorithm (approach $O(1)$ in practice). Building upon this, we develop LMG (LMIndex with gaps), a variant employing a novel gap allocation strategy to enhance update performance and maintain stability under dynamic workloads. Extensive evaluations show that LMG achieves competitive or leading performance, including bulk loading (up to 8.25$\times$ faster), point queries (up to 1.49$\times$ faster), range queries (up to 4.02$\times$ faster than B+Tree), update (up to 1.5$\times$ faster on read-write workloads), stability (up to 82.59$\times$ lower coefficient of variation), and space usage (up to 1.38$\times$ smaller). These results demonstrate that LMG effectively breaks the multi-dimensional performance trade-offs inherent in state-of-the-art approaches, offering a balanced and versatile framework.
Similar Papers
HIRE: A Hybrid Learned Index for Robust and Efficient Performance under Mixed Workloads
Databases
Makes computer searches faster and more reliable.
Evaluating Learned Indexes in LSM-tree Systems: Benchmarks,Insights and Design Choices
Databases
Helps computers find data faster in big lists.
Dynamic Indexing Through Learned Indices with Worst-case Guarantees
Computational Geometry
Makes searching data much faster, even with changes.