SINDI: an Efficient Index for Approximate Maximum Inner Product Search on Sparse Vectors

Published: September 10, 2025 | arXiv ID: 2509.08395v1

By: Ruoxuan Li, Xiaoyao Zhong, Jiabao Jin and more

Potential Business Impact:

Finds answers faster by organizing information better.

Business Areas:

RISC Hardware

Sparse vector Maximum Inner Product Search (MIPS) is crucial in multi-path retrieval for Retrieval-Augmented Generation (RAG). Recent inverted index-based and graph-based algorithms have achieved high search accuracy with practical efficiency. However, their performance in production environments is often limited by redundant distance computations and frequent random memory accesses. Furthermore, the compressed storage format of sparse vectors hinders the use of SIMD acceleration. In this paper, we propose the sparse inverted non-redundant distance index (SINDI), which incorporates three key optimizations: (i) Efficient Inner Product Computation: SINDI leverages SIMD acceleration and eliminates redundant identifier lookups, enabling batched inner product computation; (ii) Memory-Friendly Design: SINDI replaces random memory accesses to original vectors with sequential accesses to inverted lists, substantially reducing memory-bound latency; (iii) Vector Pruning: SINDI retains only the high-magnitude non-zero entries of vectors, improving query throughput while maintaining accuracy. We evaluate SINDI on multiple real-world datasets. Experimental results show that SINDI achieves state-of-the-art performance across datasets of varying scales, languages, and models. On the MsMarco dataset, when Recall@50 exceeds 99%, SINDI delivers single-thread queries-per-second (QPS) improvements of 4.2 to 26.4 times over SEISMIC and PyANNs. Notably, SINDI has been integrated into Ant Group's open-source vector search library, VSAG.
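The general idea behind the three optimizations in the abstract can be illustrated with a small sketch. This is not the SINDI implementation or the VSAG API: it is a minimal pure-Python toy, assuming an inverted index that stores each entry's value next to its document id, so scoring scans posting lists sequentially instead of fetching the original vectors. The function names (`prune`, `build_index`, `search`) and the `keep` parameter are hypothetical, and the plain loops stand in for the SIMD-batched accumulation described in the paper.

```python
import heapq
from collections import defaultdict


def prune(vec, keep):
    """Keep only the `keep` entries with the largest absolute value.

    Assumed pruning rule for illustration; the paper's exact rule may differ.
    """
    return dict(heapq.nlargest(keep, vec.items(), key=lambda kv: abs(kv[1])))


def build_index(docs, keep=None):
    """Build dimension -> (doc_ids, values) posting lists.

    Values are stored in the posting list itself, so a query never has to
    look up the original sparse vector.
    """
    index = defaultdict(lambda: ([], []))
    for doc_id, vec in enumerate(docs):
        entries = prune(vec, keep) if keep is not None else vec
        for dim, val in entries.items():
            ids, vals = index[dim]
            ids.append(doc_id)
            vals.append(val)
    return dict(index)


def search(index, n_docs, query, k):
    """Return the top-k (doc_id, score) pairs by accumulated inner product."""
    scores = [0.0] * n_docs
    for dim, q_val in query.items():
        if dim not in index:
            continue
        ids, vals = index[dim]
        # Sequential scan of one posting list; a real engine would
        # vectorize this multiply-accumulate step.
        for doc_id, val in zip(ids, vals):
            scores[doc_id] += q_val * val
    top = heapq.nlargest(k, range(n_docs), key=lambda i: scores[i])
    return [(i, scores[i]) for i in top]


# Toy data: each document is a {dimension: value} sparse vector.
docs = [
    {0: 1.0, 2: 2.0},
    {1: 3.0, 2: 0.5},
    {0: 0.1, 1: 0.2, 3: 4.0},
]
query = {2: 1.0, 1: 1.0}

full_index = build_index(docs)
pruned_index = build_index(docs, keep=1)
```

Calling `search(full_index, len(docs), query, k=2)` ranks document 1 first and document 0 second. With `pruned_index`, each document keeps only its single largest entry, so the index is smaller and the scores become approximate; on this toy data the top result is unchanged.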

SINDI: an Efficient Index for Approximate Maximum Inner Product Search on Sparse Vectors

Databases

Finds information faster by organizing it better.

10 Sep 2025 3

85%

Maximum Inner Product is Query-Scaled Nearest Neighbor

Databases

Finds similar items faster in online stores.

10 Mar 2025 3

84%

Sparse identification of nonlinear dynamics with high accuracy and reliability under noisy conditions for applications to industrial systems

Systems and Control

Predicts complex engine behavior accurately, even with noise.

7 Mar 2025 0

Country of Origin

🇨🇳 🇭🇰 China, Hong Kong

Page Count

13 pages