PDAC: Efficient Coreset Selection for Continual Learning via Probability Density Awareness

Published: November 12, 2025 | arXiv ID: 2511.09487v1

By: Junqi Gao, Zhichang Guo, Dazhi Zhang and more

Potential Business Impact:

Makes computer learning remember better, faster.

Business Areas:

Machine Learning, Artificial Intelligence, Data and Analytics, Software

Rehearsal-based Continual Learning (CL) maintains a limited memory buffer of replay samples for knowledge retention, which makes these approaches heavily reliant on the quality of the stored samples. Current rehearsal-based CL methods typically construct the memory buffer by selecting a representative subset (referred to as a coreset), aiming to approximate the training efficacy of the full dataset with minimal storage overhead. However, mainstream Coreset Selection (CS) methods generally formulate CS as a bi-level optimization problem that requires numerous inner and outer iterations to solve, incurring substantial computational cost and limiting their practical efficiency. In this paper, we aim to provide a more efficient selection logic and scheme for coreset construction. To this end, we first analyze the Mean Squared Error (MSE) between the buffer-trained model and the Bayes-optimal model from the perspective of localized error decomposition, to investigate how much samples from different regions contribute to MSE suppression. Further theoretical and experimental analyses demonstrate that samples with high probability density play a dominant role in error suppression. Inspired by this, we propose the Probability Density-Aware Coreset (PDAC) method. PDAC leverages the Projected Gaussian Mixture (PGM) model to estimate each sample's joint density, enabling efficient density-prioritized buffer selection. Finally, we introduce the streaming Expectation Maximization (EM) algorithm to enhance the adaptability of the PGM parameters to streaming data, yielding Streaming PDAC (SPDAC) for streaming scenarios. Extensive comparative experiments show that our methods outperform other baselines across various CL settings while ensuring favorable efficiency.
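
The idea of density-prioritized buffer selection described in the abstract can be illustrated with a rough sketch. This is not the authors' PDAC implementation: the abstract does not spell out the PGM model, so the code below swaps in a single diagonal Gaussian per class as a much simpler density estimate, and it ranks samples by class-conditional density rather than a full joint density. The function names (fit_diag_gaussian, log_density, select_buffer) and the per_class parameter are hypothetical and chosen only for illustration.

```python
import math
import random


def fit_diag_gaussian(points):
    """Estimate per-dimension mean and variance (one diagonal Gaussian).

    Assumption: this stands in for the paper's PGM model, which is not
    specified in the abstract.
    """
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    var = [
        sum((p[d] - mean[d]) ** 2 for p in points) / n + 1e-6  # floor avoids zero variance
        for d in range(dim)
    ]
    return mean, var


def log_density(point, mean, var):
    """Log of the diagonal-Gaussian density evaluated at `point`."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(point, mean, var)
    )


def select_buffer(features, labels, per_class):
    """Return sorted indices of the `per_class` highest-density samples per label."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    chosen = []
    for idxs in by_class.values():
        mean, var = fit_diag_gaussian([features[i] for i in idxs])
        # Rank samples of this class from highest to lowest estimated density.
        ranked = sorted(
            idxs,
            key=lambda i: log_density(features[i], mean, var),
            reverse=True,
        )
        chosen.extend(ranked[:per_class])
    return sorted(chosen)


if __name__ == "__main__":
    rng = random.Random(0)
    # 50 samples near the origin plus one far-away outlier at index 50.
    features = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)] + [[8.0, 8.0]]
    labels = [0] * len(features)
    buffer_idx = select_buffer(features, labels, per_class=10)
    print(buffer_idx)
```

In this toy run the outlier at index 50 lies in a low-density region, so it is not chosen for the buffer. A single ranking pass like this avoids the inner and outer optimization loops that the abstract attributes to bi-level coreset selection, which is the efficiency argument the paper makes; how closely a simplified estimator tracks the paper's results is not something this sketch can show.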

Stable Coresets via Posterior Sampling: Aligning Induced and Full Loss Landscapes

Machine Learning (CS)

Trains computers faster and better with less data.

21 Nov 2025 2

87%

FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection

Machine Learning (Stat)

Makes AI learn faster, using less power.

22 Nov 2025 1

87%

Escaping Stability-Plasticity Dilemma in Online Continual Learning for Motion Forecasting via Synergetic Memory Rehearsal

Machine Learning (CS)

Keeps AI remembering old things while learning new.

27 Aug 2025 1

Country of Origin

🇨🇳 China

Page Count

17 pages
