Score: 1

Forgetting by Pruning: Data Deletion in Join Cardinality Estimation

Published: November 25, 2025 | arXiv ID: 2511.20293v1

By: Chaowei He , Yuanjun Liu , Qingzhi Ma and more

Potential Business Impact:

Cleans computer data without slowing it down.

Business Areas:
Semantic Search Internet Services

Machine unlearning in learned cardinality estimation (CE) systems presents unique challenges due to the complex distributional dependencies in multi-table relational data. Specifically, data deletion, a core component of machine unlearning, faces three critical challenges in learned CE models: attribute-level sensitivity, inter-table propagation and domain disappearance leading to severe overestimation in multi-way joins. We propose Cardinality Estimation Pruning (CEP), the first unlearning framework specifically designed for multi-table learned CE systems. CEP introduces Distribution Sensitivity Pruning, which constructs semi-join deletion results and computes sensitivity scores to guide parameter pruning, and Domain Pruning, which removes support for value domains entirely eliminated by deletion. We evaluate CEP on state-of-the-art architectures NeuroCard and FACE across IMDB and TPC-H datasets. Results demonstrate CEP consistently achieves the lowest Q-error in multi-table scenarios, particularly under high deletion ratios, often outperforming full retraining. Furthermore, CEP significantly reduces convergence iterations, incurring negligible computational overhead of 0.3%-2.5% of fine-tuning time.

Country of Origin
🇨🇳 China

Repos / Data Links

Page Count
11 pages

Category
Computer Science:
Databases