Graph-based Nearest Neighbors with Dynamic Updates via Random Walks
By: Nina Mishra, Yonatan Naamad, Tal Wagner and more
Approximate nearest neighbor search (ANN) is a common way to retrieve relevant search results, especially now in the context of large language models and retrieval-augmented generation. One of the most widely used algorithms for ANN, Hierarchical Navigable Small World (HNSW), is based on constructing a multi-layer graph over the dataset. While this algorithm supports insertion of new data, it does not support deletion of existing data. Moreover, deletion algorithms described by prior work come at the cost of increased query latency, decreased recall, or prolonged deletion time. In this paper, we propose a new theoretical framework for graph-based ANN based on random walks. We use this framework to analyze a randomized deletion approach that preserves the graph's hitting time statistics relative to the graph before the point was deleted. We then turn this theoretical framework into a deterministic deletion algorithm, and show through an extensive collection of experiments that it provides a better tradeoff between query latency, recall, deletion time, and memory usage.
Similar Papers
B+ANN: A Fast Billion-Scale Disk-based Nearest-Neighbor Index
Databases
Finds information faster using smarter computer memory.
Accelerating High-Dimensional Nearest Neighbor Search with Dynamic Query Preference
Databases
Finds information faster when some things are searched more.
Approximate Nearest Neighbor Search of Large Scale Vectors on Distributed Storage
Databases
Finds similar items in huge online lists faster.