CREAM: Continual Retrieval on Dynamic Streaming Corpora with Adaptive Soft Memory
By: HuiJeong Son, Hyeongu Kang, Sunho Kim, and more
Potential Business Impact:
Helps computers learn new topics without needing labeled answers.
Information retrieval (IR) over dynamic data streams is an emerging challenge: shifts in the data distribution degrade the performance of AI-powered IR systems. To mitigate this, memory-based continual learning has been widely adopted for IR. However, existing methods rely on a fixed set of queries with ground-truth relevant documents, which limits generalization to unseen queries and documents and makes them impractical for real-world applications. To enable effective learning on unseen topics of a new corpus without ground-truth labels, we propose CREAM, a self-supervised framework for memory-based continual retrieval. CREAM captures the evolving semantics of streaming queries and documents in a dynamically structured soft memory and leverages it to adapt to both seen and unseen topics in an unsupervised setting. We realize this through three key techniques: fine-grained similarity estimation, regularized cluster prototyping, and stratified coreset sampling. Experiments on two benchmark datasets demonstrate that CREAM exhibits superior adaptability and retrieval accuracy, outperforming the strongest label-free method by 27.79% in Success@5 and 44.5% in Recall@10 on average, and achieving performance comparable to, or even exceeding, that of supervised methods.
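The abstract does not detail the three techniques, but the general idea of stratified coreset sampling, keeping a small, cluster-balanced memory from a large stream, can be sketched generically. The function below is a hypothetical illustration (the names, proportional allocation rule, and centroid-distance criterion are assumptions, not the paper's actual procedure): memory slots are allocated to clusters proportionally to their size, and within each cluster the points nearest the centroid are retained.

```python
import numpy as np

def stratified_coreset(embeddings, cluster_labels, budget):
    """Select a size-`budget` memory coreset from `embeddings`.

    Stratified: each cluster gets slots proportional to its size
    (at least one). Coreset proxy: within each cluster, keep the
    points closest to the cluster centroid.
    """
    clusters = np.unique(cluster_labels)
    n = len(cluster_labels)
    chosen = []
    for c in clusters:
        idx = np.where(cluster_labels == c)[0]
        # Proportional slot allocation, at least one per non-empty cluster.
        k = max(1, round(budget * len(idx) / n))
        centroid = embeddings[idx].mean(axis=0)
        dists = np.linalg.norm(embeddings[idx] - centroid, axis=1)
        # Keep the k points nearest the centroid as this stratum's coreset.
        chosen.extend(idx[np.argsort(dists)[:k]])
    return np.array(chosen[:budget])
```

In a streaming setting such a routine would be re-run as new documents arrive, so the memory stays balanced across both old and newly emerging topic clusters.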
Similar Papers
CRAM: Large-scale Video Continual Learning with Bootstrapped Compression
CV and Pattern Recognition
Stores many videos using less computer memory.
Retrieval-Augmented Memory for Online Learning
Machine Learning (CS)
Helps computers learn from changing information.
Forget Forgetting: Continual Learning in a World of Abundant Memory
Machine Learning (CS)
Teaches computers new things without forgetting old ones.