HIRE: A Hybrid Learned Index for Robust and Efficient Performance under Mixed Workloads
By: Xinyi Zhang , Liang Liang , Anastasia Ailamaki and more
Potential Business Impact:
Makes computer searches faster and more reliable.
Indexes are critical for efficient data retrieval and updates in modern databases. Recent advances in machine learning have led to the development of learned indexes, which model the cumulative distribution function of data to predict search positions and accelerate query processing. While learned indexes substantially outperform traditional structures for point lookups, they often suffer from high tail latency, suboptimal range query performance, and inconsistent effectiveness across diverse workloads. To address these challenges, this paper proposes HIRE, a hybrid in-memory index structure designed to deliver efficient performance consistently. HIRE combines the structural and performance robustness of traditional indexes with the predictive power of model-based prediction to reduce search overhead while maintaining worst-case stability. Specifically, it employs (1) hybrid leaf nodes adaptive to varying data distributions and workloads, (2) model-accelerated internal nodes augmented by log-based updates for efficient updates, (3) a nonblocking, cost-driven recalibration mechanism for dynamic data, and (4) an inter-level optimized bulk-loading algorithm accounting for leaf and internal-node errors. Experimental results on multiple real-world datasets demonstrate that HIRE outperforms both state-of-the-art learned indexes and traditional structures in range-query throughput, tail latency, and overall stability. Compared to state-of-the-art learned indexes and traditional indexes, HIRE achieves up to 41.7$\times$ higher throughput under mixed workloads, reduces tail latency by up to 98% across varying scenarios.
Similar Papers
Learned Adaptive Indexing
Databases
Learns to find information faster as you ask.
Evaluating Learned Indexes in LSM-tree Systems: Benchmarks,Insights and Design Choices
Databases
Helps computers find data faster in big lists.
On the Costs and Benefits of Learned Indexing for Dynamic High-Dimensional Data: Extended Version
Information Retrieval
Makes computer searches faster as data grows.