MA-DPR: Manifold-aware Distance Metrics for Dense Passage Retrieval
By: Yifan Liu, Qianfeng Wen, Mark Zhao and more
Potential Business Impact:
Finds answers even when words don't match.
Dense Passage Retrieval (DPR) typically relies on Euclidean or cosine distance to measure query-passage relevance in embedding space, which is effective when embeddings lie on a linear manifold. However, our experiments across DPR benchmarks suggest that embeddings often lie on lower-dimensional, non-linear manifolds, especially in out-of-distribution (OOD) settings, where cosine and Euclidean distance fail to capture semantic similarity. To address this limitation, we propose a manifold-aware distance metric for DPR (MA-DPR) that models the intrinsic manifold structure of passages using a nearest neighbor graph and measures query-passage distance as the length of the shortest path between them in this graph. We show that MA-DPR outperforms Euclidean and cosine distances by up to 26% on OOD passage retrieval, with comparable in-distribution performance across various embedding models and only a minimal increase in query inference time. Empirical evidence suggests that manifold-aware distance allows DPR to leverage context from related neighboring passages, making it effective even in the absence of direct semantic overlap. MA-DPR can be applied to a wide range of dense embedding and retrieval tasks across many domains.
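The graph-based distance described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the graph is built, how edges are weighted, or how the query is attached to it. The sketch assumes an undirected k-nearest-neighbor graph with Euclidean edge weights, a query connected to its k nearest passages, and Dijkstra's algorithm for shortest paths. The function names, the parameter `k`, and the toy 2-D "horseshoe" data are all hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(passages, k):
    """Undirected k-NN graph over passages (assumed construction)."""
    graph = {i: {} for i in range(len(passages))}
    for i, p in enumerate(passages):
        nearest = sorted(
            (euclidean(p, q), j) for j, q in enumerate(passages) if j != i
        )[:k]
        for d, j in nearest:
            graph[i][j] = d  # edge weight = Euclidean distance (assumption)
            graph[j][i] = d
    return graph


def manifold_distances(query, passages, graph, k):
    """Shortest-path distance from the query to every passage.

    The query is attached to its k nearest passages (assumption), then
    Dijkstra's algorithm propagates distances through the passage graph.
    """
    seeds = sorted((euclidean(query, p), i) for i, p in enumerate(passages))[:k]
    dist = [math.inf] * len(passages)
    heap = []
    for d, i in seeds:
        dist[i] = d
        heapq.heappush(heap, (d, i))
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# Toy data: passages on a horseshoe (left arm, bottom, right arm).
passages = (
    [(0.0, float(y)) for y in range(4, -1, -1)]   # left arm, top to bottom
    + [(1.0, 0.0), (2.0, 0.0)]                    # bottom
    + [(3.0, float(y)) for y in range(0, 5)]      # right arm, bottom to top
)
query = (0.2, 4.0)  # sits near the top of the left arm

graph = build_knn_graph(passages, k=2)
dist = manifold_distances(query, passages, graph, k=2)

left_bottom, right_top = 4, 11  # indices of (0, 0) and (3, 4)
print("Euclidean prefers right_top:",
      euclidean(query, passages[right_top]) < euclidean(query, passages[left_bottom]))
print("Graph distance prefers left_bottom:", dist[left_bottom] < dist[right_top])
```

In this toy setup the top of the right arm is close to the query in a straight line but far along the horseshoe, so the two metrics rank the passages differently. A real system would likely replace the brute-force neighbor search with an approximate nearest-neighbor index and precompute the passage graph offline.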
Similar Papers
Improving Dense Passage Retrieval with Multiple Positive Passages
Information Retrieval
Finds better answers by using more examples.
Dense Passage Retrieval in Conversational Search
Information Retrieval
Finds answers in conversations better.
MPAD: A New Dimension-Reduction Method for Preserving Nearest Neighbors in High-Dimensional Vector Search
Information Retrieval
Makes computer searches faster and more accurate.