CoLSE: A Lightweight and Robust Hybrid Learned Model for Single-Table Cardinality Estimation using Joint CDF
By: Lankadinee Rathuwadu, Guanli Liu, Christopher Leckie and more
Cardinality estimation (CE), the task of predicting the result size of queries, is a critical component of query optimization. Accurate estimates are essential for generating efficient query execution plans. Recently, machine learning techniques have been applied to CE, broadly categorized into query-driven and data-driven approaches. Data-driven methods learn the joint distribution of the data, while query-driven methods construct regression models that map query features to cardinalities. Ideally, a CE technique should balance three key factors: accuracy, efficiency, and memory footprint. However, existing state-of-the-art models often fail to achieve this balance. To address this, we propose CoLSE, a hybrid learned approach for single-table cardinality estimation. CoLSE directly models the joint probability over queried intervals using a novel algorithm based on copula theory and integrates a lightweight neural network to correct residual estimation errors. Experimental results show that CoLSE achieves a favorable trade-off among accuracy, training time, inference latency, and model size, outperforming existing state-of-the-art methods.
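To make the idea of modeling the joint probability over queried intervals more concrete, here is a minimal two-column sketch in Python. It is not CoLSE's algorithm, which the abstract does not spell out. It assumes empirical marginal CDFs, a Clayton copula with a hand-picked `theta` as the dependence model, half-open intervals `(lo, hi]`, and a scalar `correction` that merely stands in for the output of a learned residual model. All function names are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(sorted_values):
    """Return F(x) = fraction of values <= x for a pre-sorted column."""
    n = len(sorted_values)
    return lambda x: bisect_right(sorted_values, x) / n


def clayton_copula(u, v, theta=2.0):
    """Clayton copula C(u, v), theta > 0 (an assumed family, not the paper's)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def range_selectivity(cdf_a, cdf_b, lo_a, hi_a, lo_b, hi_b, copula):
    """P(lo_a < A <= hi_a, lo_b < B <= hi_b) via inclusion-exclusion on the joint CDF."""
    ua_lo, ua_hi = cdf_a(lo_a), cdf_a(hi_a)
    ub_lo, ub_hi = cdf_b(lo_b), cdf_b(hi_b)
    p = (copula(ua_hi, ub_hi) - copula(ua_lo, ub_hi)
         - copula(ua_hi, ub_lo) + copula(ua_lo, ub_lo))
    return max(p, 0.0)  # guard against tiny negative values from rounding


def estimate_cardinality(n_rows, selectivity, correction=0.0):
    """Scale selectivity to a row count; `correction` is a placeholder residual."""
    return max(0.0, n_rows * (selectivity + correction))


# Toy table with two columns holding the values 1..10.
col_a = list(range(1, 11))
col_b = list(range(1, 11))
cdf_a, cdf_b = empirical_cdf(col_a), empirical_cdf(col_b)

# Query: 0 < A <= 5 AND 0 < B <= 5
sel = range_selectivity(cdf_a, cdf_b, 0, 5, 0, 5, clayton_copula)
print(estimate_cardinality(len(col_a), sel))
```

The joint CDF is written as a copula applied to the marginal CDFs, so a range query needs only four copula evaluations in two dimensions. In a hybrid design of the kind the abstract describes, a small regression model would supply the residual term in place of the fixed `correction` argument.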
Similar Papers
A Lightweight Learned Cardinality Estimation Model
Databases
Makes computer databases guess answers faster and better.
CUBE: A Cardinality Estimator Based on Neural CDF
Databases
Makes computer searches faster and more reliable.
Forgetting by Pruning: Data Deletion in Join Cardinality Estimation
Databases
Cleans computer data without slowing it down.