Score: 3

Breaking the Curse of Dimensionality: On the Stability of Modern Vector Retrieval

Published: December 13, 2025 | arXiv ID: 2512.12458v1

By: Vihan Lakshman, Blaise Munyampirwa, Julian Shun, and more

BigTech Affiliations: Massachusetts Institute of Technology, Google

Potential Business Impact:

Finds the best matches quickly and reliably, even across very large, high-dimensional datasets.

Business Areas:
Semantic Search, Internet Services

Modern vector databases enable efficient retrieval over high-dimensional neural embeddings, powering applications from web search to retrieval-augmented generation. However, classical theory predicts such tasks should suffer from the curse of dimensionality, where distances between points become nearly indistinguishable, thereby crippling efficient nearest-neighbor search. We revisit this paradox through the lens of stability, the property that small perturbations to a query do not radically alter its nearest neighbors. Building on foundational results, we extend stability theory to three key retrieval settings widely used in practice: (i) multi-vector search, where we prove that the popular Chamfer distance metric preserves single-vector stability, while average pooling aggregation may destroy it; (ii) filtered vector search, where we show that sufficiently large penalties for mismatched filters can induce stability even when the underlying search is unstable; and (iii) sparse vector search, where we formalize and prove novel sufficient stability conditions. Across synthetic and real datasets, our experimental results match our theoretical predictions, offering concrete guidance for model and system design to avoid the curse of dimensionality.
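To make the multi-vector setting concrete, the sketch below contrasts the one-sided Chamfer distance (each query vector matched to its nearest document vector) with average-pooling aggregation (both sets collapsed to mean vectors before a single-vector distance). This is a minimal NumPy illustration with assumed toy data and function names, not the authors' implementation; the abstract's claim is that the former preserves single-vector stability while the latter may destroy it.

```python
# Illustrative sketch (not the paper's code): one-sided Chamfer distance
# between a multi-vector query and a multi-vector document, contrasted with
# average-pooling both sets into single vectors first.
import numpy as np

def chamfer_distance(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Sum over query vectors of the distance to their nearest document vector."""
    # Pairwise Euclidean distances, shape (num_query_vecs, num_doc_vecs).
    diffs = query_vecs[:, None, :] - doc_vecs[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return float(dists.min(axis=1).sum())

def average_pool_distance(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Collapse each set to its mean vector, then take a single-vector distance."""
    return float(np.linalg.norm(query_vecs.mean(axis=0) - doc_vecs.mean(axis=0)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.normal(size=(4, 128))   # e.g. 4 query token embeddings of dim 128
    doc = rng.normal(size=(10, 128))    # e.g. 10 document token embeddings of dim 128
    print("Chamfer distance:      ", chamfer_distance(query, doc))
    print("Average-pool distance: ", average_pool_distance(query, doc))
```

The design difference is visible in the code: Chamfer keeps each query vector's individual nearest match, while average pooling discards per-vector structure before any distance is computed, which is the aggregation step the abstract identifies as a potential source of instability.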

Country of Origin
🇺🇸 United States

Repos / Data Links

Page Count
27 pages

Category
Computer Science: Information Retrieval