Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets
By: Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, and more
Potential Business Impact:
Finds similar items in large collections faster using compact sketches and a smarter index.
Sparse embeddings of data form an attractive class due to their inherent interpretability: every dimension is tied to a term in some vocabulary, making it easy to visually decipher the latent space. Sparsity, however, poses unique challenges for Approximate Nearest Neighbor Search (ANNS), which finds, from a collection of vectors, the k vectors closest to a query. To encourage research on this underexplored topic, sparse ANNS featured prominently in the BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on large benchmark datasets by throughput and accuracy. In this work, we introduce a set of novel data structures and algorithmic methods whose combination leads to an elegant, effective, and highly efficient solution to sparse ANNS. Our contributions include a theoretically grounded sketching algorithm that reduces the effective dimensionality of sparse vectors while preserving the ranking induced by inner products; a geometric organization of the inverted index; and a blending of local and global information that improves both the efficiency and the efficacy of ANNS. Empirically, our final algorithm, dubbed Seismic, reaches sub-millisecond per-query latency with high accuracy on a large-scale benchmark dataset using a single CPU.
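The sketching idea in the abstract, compressing sparse vectors while keeping inner-product comparisons roughly order-preserving, can be illustrated with a classical count sketch. The snippet below is a minimal stand-in, not Seismic's actual sketching algorithm: the vocabulary size d, sketch width m, and all function names are illustrative assumptions. Shared random bucket and sign assignments map each of the d dimensions into one of m buckets, so the sketched inner product matches the exact one in expectation.

```python
import numpy as np

# Illustrative sizes only: a ~30k-term vocabulary and a sketch width m << d.
def make_count_sketch(d, m, seed=0):
    """Draw shared hash functions mapping R^d down to R^m."""
    rng = np.random.default_rng(seed)
    buckets = rng.integers(0, m, size=d)     # each dimension lands in one bucket
    signs = rng.choice((-1.0, 1.0), size=d)  # random +/-1 sign per dimension
    return buckets, signs

def sketch(indices, values, buckets, signs, m):
    """Sketch a sparse vector given as parallel (indices, values) arrays."""
    out = np.zeros(m)
    for i, v in zip(indices, values):
        out[buckets[i]] += signs[i] * v
    return out

# Two sparse vectors over the same d-dimensional vocabulary.
d, m = 30_000, 256
buckets, signs = make_count_sketch(d, m, seed=42)
x_idx, x_val = np.array([3, 17, 2048]), np.array([0.5, 1.2, 0.7])
y_idx, y_val = np.array([17, 9000, 2048]), np.array([0.9, 0.3, 1.1])

sx = sketch(x_idx, x_val, buckets, signs, m)
sy = sketch(y_idx, y_val, buckets, signs, m)

exact = 1.2 * 0.9 + 0.7 * 1.1  # overlap on dimensions 17 and 2048
print(f"exact={exact:.3f} sketched={sx @ sy:.3f}")  # equal in expectation
```

Because bucket collisions contribute only zero-mean noise, the relative order of inner products, which is all that nearest-neighbor ranking needs, tends to survive even for m well below d; the paper's sketch is designed specifically around this rank-preservation goal.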
Similar Papers
Approximate Nearest Neighbor Search of Large Scale Vectors on Distributed Storage
Databases
Finds similar items in huge online lists faster.
On Storage Neural Network Augmented Approximate Nearest Neighbor Search
Machine Learning (CS)
Finds similar data faster from slow storage.
Scalable Disk-Based Approximate Nearest Neighbor Search with Page-Aligned Graph
Machine Learning (CS)
Finds information faster on huge computer files.