SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization
By: Shravan Chaudhari, Rahul Thomas Jacob, Mononito Goswami and more
Retrieving code units (e.g., files, classes, functions) that are semantically relevant to a given user query, bug report, or feature request from large codebases is a fundamental challenge for LLM-based coding agents. Agentic approaches typically employ sparse retrieval methods like BM25 or dense embedding strategies to identify relevant units. While embedding-based approaches can outperform BM25 by large margins, they often lack exploration of the codebase and underutilize its underlying graph structure. To address this, we propose SpIDER (Spatially Informed Dense Embedding Retrieval), an enhanced dense retrieval approach that incorporates LLM-based reasoning over auxiliary context obtained through graph-based exploration of the codebase. Empirical results show that SpIDER consistently improves dense retrieval performance across several programming languages.
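To make the general idea concrete, the following is a minimal sketch of dense retrieval followed by one-hop graph expansion and a reranking hook. It is not the authors' implementation: the abstract does not specify the embedding model, the graph construction, or the LLM prompt, so every name here (`dense_top_k`, `expand_with_neighbors`, `retrieve`, `passthrough_rerank`), the toy vectors, and the toy call graph are assumptions made for illustration. The LLM reasoning step is replaced by a placeholder callback that returns the dense ranking unchanged.

```python
import math
from typing import Callable, Dict, List, Set

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_top_k(query_vec: Vector, unit_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Rank code units by similarity to the query embedding and keep the top k."""
    ranked = sorted(unit_vecs, key=lambda u: cosine(query_vec, unit_vecs[u]), reverse=True)
    return ranked[:k]


def expand_with_neighbors(seeds: List[str], graph: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Collect one-hop graph neighbors of each seed as auxiliary context."""
    return {u: sorted(graph.get(u, set())) for u in seeds}


def passthrough_rerank(seeds: List[str], context: Dict[str, List[str]]) -> List[str]:
    """Placeholder for an LLM reasoning step (assumption): keeps the dense order."""
    return seeds


def retrieve(
    query_vec: Vector,
    unit_vecs: Dict[str, Vector],
    graph: Dict[str, Set[str]],
    k: int,
    rerank: Callable[[List[str], Dict[str, List[str]]], List[str]],
) -> List[str]:
    """Dense retrieval, then graph expansion, then a pluggable reranker."""
    seeds = dense_top_k(query_vec, unit_vecs, k)
    context = expand_with_neighbors(seeds, graph)
    return rerank(seeds, context)


# Toy data (hypothetical): 2-D "embeddings" and a tiny call graph.
unit_vecs = {
    "parser.py::parse": [1.0, 0.0],
    "lexer.py::tokenize": [0.8, 0.6],
    "cli.py::main": [0.0, 1.0],
}
graph = {
    "parser.py::parse": {"lexer.py::tokenize"},
    "cli.py::main": {"parser.py::parse"},
}
query_vec = [1.0, 0.1]

result = retrieve(query_vec, unit_vecs, graph, k=2, rerank=passthrough_rerank)
print(result)
```

In a real system the `rerank` callback would be where a language model inspects each candidate together with its graph neighbors; swapping in such a function does not require changing `retrieve`.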