Lower Bounds for the Algorithmic Complexity of Learned Indexes
By: Luis Alberto Croquevielle, Roman Sokolovskii, Thomas Heinis
Potential Business Impact:
Makes computer searches faster by learning data patterns.
Learned index structures aim to accelerate queries by training machine learning models to approximate the rank function associated with a database attribute. While effective in practice, their theoretical limitations are not fully understood. We present a general framework for proving lower bounds on query time for learned indexes, expressed in terms of their space overhead and parameterized by the model class used for approximation. Our formulation captures a broad family of learned indexes, including most existing designs, as piecewise model-based predictors. We solve the problem of lower-bounding query time in two steps. First, we use probabilistic tools to control the effect of sampling when the database attribute is drawn from a probability distribution. Second, we analyze the approximation-theoretic problem of how to optimally represent a cumulative distribution function with approximators from a given model class. Within this framework, we derive lower bounds under a range of modeling and distributional assumptions, paying particular attention to the case of piecewise linear and piecewise constant model classes, which are common in practical implementations. Our analysis shows how tools from approximation theory, such as quantization and Kolmogorov widths, can be leveraged to formalize the space-time tradeoffs inherent to learned index structures. The resulting bounds illuminate core limitations of these methods.
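To make the setting concrete, the following is a minimal Python sketch of a piecewise linear learned index of the general kind the abstract describes. It is an illustration under stated assumptions, not the construction analyzed in the paper: the class name `PiecewiseLinearIndex`, the equal-size segmentation, and the endpoint-interpolation fit are all hypothetical choices made here for brevity. Each segment stores one linear model that predicts the rank of a key, plus the worst prediction error observed on the stored keys; a query is answered by predicting a position and then running a binary search inside the error window.

```python
import bisect
import math


class PiecewiseLinearIndex:
    """Toy learned index (illustrative only): equal-size segments,
    one linear rank model per segment, plus a bounded local search."""

    def __init__(self, keys, num_segments):
        # Assumption: keys are numeric; duplicates are dropped so that
        # rank(q) = number of stored keys strictly smaller than q.
        self.keys = sorted(set(keys))
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        seg_len = max(1, -(-n // num_segments))  # ceil(n / num_segments)
        self.starts = []  # smallest key of each segment
        self.models = []  # (slope, intercept, err, lo, hi) per segment
        for lo in range(0, n, seg_len):
            hi = min(lo + seg_len, n) - 1
            k_lo, k_hi = self.keys[lo], self.keys[hi]
            # Fit by interpolating the segment endpoints (not least squares).
            slope = (hi - lo) / (k_hi - k_lo) if k_hi > k_lo else 0.0
            intercept = lo - slope * k_lo
            worst = max(
                abs(slope * self.keys[i] + intercept - i)
                for i in range(lo, hi + 1)
            )
            # +2 absorbs rounding and queries falling between stored keys.
            err = math.ceil(worst) + 2
            self.starts.append(k_lo)
            self.models.append((slope, intercept, err, lo, hi))

    def rank(self, q):
        """Return the number of stored keys strictly smaller than q."""
        s = max(0, bisect.bisect_right(self.starts, q) - 1)
        slope, intercept, err, lo, hi = self.models[s]
        # Predict a position, clamped to the range this segment can return.
        pos = min(max(round(slope * q + intercept), lo), hi + 1)
        # Binary search only inside the error window around the prediction.
        w_lo = max(lo, pos - err)
        w_hi = min(hi + 1, pos + err)
        return bisect.bisect_left(self.keys, q, w_lo, w_hi)

    def max_error(self):
        """Largest per-segment search radius (a proxy for query cost)."""
        return max(m[2] for m in self.models)


if __name__ == "__main__":
    data = [i * i for i in range(1000)]  # non-uniform keys
    coarse = PiecewiseLinearIndex(data, num_segments=4)
    fine = PiecewiseLinearIndex(data, num_segments=64)
    print(coarse.rank(250_000), coarse.max_error(), fine.max_error())
```

In this sketch, the number of segments plays the role of space overhead and the error window plays the role of query time: storing more segments shrinks the window that the final binary search must cover. The lower bounds described in the abstract concern how small that window can be made for a given amount of space and a given model class.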
Similar Papers
Dynamic Indexing Through Learned Indices with Worst-case Guarantees
Computational Geometry
Makes searching data much faster, even with changes.
Dimension lower bounds for linear approaches to function approximation
Machine Learning (CS)
Finds how much data computers need to learn.
Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions
Machine Learning (CS)
Makes hard computer learning tasks easy.