Filtered Approximate Nearest Neighbor Search: A Unified Benchmark and Systematic Experimental Study [Experiment, Analysis & Benchmark]
By: Jiayang Shi, Yuzheng Cai, Weiguo Zheng
Potential Business Impact:
Finds best matches faster, even with rules.
For a given dataset $\mathcal{D}$ and structured label $f$, the goal of Filtered Approximate Nearest Neighbor Search (FANNS) algorithms is to find top-$k$ points closest to a query that satisfy label constraints, while ensuring both recall and QPS (Queries Per Second). In recent years, many FANNS algorithms have been proposed. However, the lack of a systematic investigation makes it difficult to understand their relative strengths and weaknesses. Additionally, we found that: (1) FANNS algorithms have coupled, dataset-dependent parameters, leading to biased comparisons. (2) Key impact factors are rarely analyzed systematically, leaving unclear when each algorithm performs well. (3) Disparate datasets, workloads, and biased experiment designs make cross-algorithm comparisons unreliable. Thus, a comprehensive survey and benchmark for FANNS is crucial to achieve the following goals: designing a fair evaluation and clarifying the classification of algorithms, conducting in-depth analysis of their performance, and establishing a unified benchmark. First, we propose a taxonomy (dividing methods into \textit{filter-then-search}, \textit{search-then-filter}, \textit{hybrid-search}) and a systematic evaluation framework, integrating unified parameter tuning and standardized filtering across algorithms to reduce implementation-induced performance variations and reflect core trade-offs. Then, we conduct a comprehensive empirical study to analyze how query difficulty and dataset properties impact performance, evaluating robustness under pressures like filter selectivity, Recall@k, and scalability to clarify each method's strengths. Finally, we establish a standardized benchmark with real-world datasets and open-source related resources to ensure reproducible future research.
Similar Papers
Survey of Filtered Approximate Nearest Neighbor Search over the Vector-Scalar Hybrid Data
Databases
Helps computers find similar items faster.
Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study
Databases
Finds similar items with specific rules.
Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study
Databases
Finds similar items with specific rules.