Algorithmic Complexity Attacks on All Learned Cardinality Estimators: A Data-centric Approach
By: Yingze Li, Xianglong Liu, Dong Wang, and more
Potential Business Impact:
Shows how tiny data changes can make database size-guessing programs badly wrong.
Learned cardinality estimators show promise for query cardinality prediction, yet they universally exhibit fragility to training data drifts, posing risks for real-world deployment. This work is the first to theoretically investigate how minimal data-level drifts can maximally degrade the accuracy of learned estimators. We propose data-centric algorithmic complexity attacks against learned estimators in a black-box setting and prove that finding the optimal attack strategy is NP-Hard. To address this, we design a polynomial-time approximation algorithm with a $(1-\kappa)$ approximation ratio. Extensive experiments demonstrate our attack's effectiveness: on the STATS-CEB and IMDB-JOB benchmarks, modifying just 0.8\% of training tuples increases the 90th percentile Q-error by three orders of magnitude and raises end-to-end query processing time by up to 20$\times$. Our work not only reveals critical vulnerabilities in deployed learned estimators but also provides the first unified worst-case theoretical analysis of their fragility under data updates. Additionally, we identify two countermeasures that mitigate such black-box attacks, offering insights for developing robust learned database optimizers.
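The headline result above is stated in terms of the 90th percentile Q-error. As a rough illustration only, the sketch below assumes the standard Q-error definition, max(estimate/true, true/estimate), and uses hypothetical pre- and post-attack estimates (not values from the paper) to show how that percentile would be computed over a workload.

```python
# Minimal sketch: 90th percentile Q-error of a query workload.
# Assumes the standard Q-error definition max(est/true, true/est);
# all cardinality values below are hypothetical illustrations.
import numpy as np

def q_error(estimated: np.ndarray, true: np.ndarray) -> np.ndarray:
    """Per-query Q-error: symmetric ratio between estimate and truth."""
    est = np.maximum(estimated, 1.0)  # clamp to avoid division by zero
    tru = np.maximum(true, 1.0)
    return np.maximum(est / tru, tru / est)

# Hypothetical true cardinalities and a learned estimator's predictions
# before and after a small fraction of training tuples is modified.
true_card  = np.array([1_000, 50_000, 250, 8_000, 120_000], dtype=float)
est_before = np.array([1_200, 42_000, 300, 9_500, 100_000], dtype=float)
est_after  = np.array([90_000, 400, 30_000, 150, 2_000_000], dtype=float)

p90_before = np.percentile(q_error(est_before, true_card), 90)
p90_after  = np.percentile(q_error(est_after, true_card), 90)
print(f"P90 Q-error before attack: {p90_before:.1f}")
print(f"P90 Q-error after attack:  {p90_after:.1f}")
```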
Similar Papers
TiCard: Deployable EXPLAIN-only Residual Learning for Cardinality Estimation
Artificial Intelligence
Makes computer databases find information faster.
Data-Agnostic Cardinality Learning from Imperfect Workloads
Databases
Helps computers guess data size without seeing data.
Is it Bigger than a Breadbox: Efficient Cardinality Estimation for Real World Workloads
Databases
Makes computer searches faster and smarter.