Score: 1

BD-Index: Scalable Biharmonic Distance Queries on Large Graphs via Divide-and-Conquer Indexing

Published: December 2, 2025 | arXiv ID: 2512.02929v1

By: Yueyang Pan, Meihao Liao, Rong-Hua Li

Potential Business Impact:

Finds important connections in networks faster.

Business Areas:
Big Data Data and Analytics

Biharmonic distance (\bd) is a powerful graph distance metric with many applications, including identifying critical links in road networks and mitigating over-squashing problem in \gnn. However, computing \bd\ is extremely difficult, especially on large graphs. In this paper, we focus on the problem of \emph{single-pair} \bd\ query. Existing methods mainly rely on random walk-based approaches, which work well on some graphs but become inefficient when the random walk cannot mix rapidly.To overcome this issue, we first show that the biharmonic distance between two nodes $s,t$, denoted by $b(s,t)$, can be interpreted as the distance between two random walk distributions starting from $s$ and $t$. To estimate these distributions, the required random walk length is large when the underlying graph can be easily cut into smaller pieces. Inspired by this observation, we present novel formulas of \bd to represent $b(s,t)$ by independent random walks within two node sets $\mathcal{V}_s$, $\mathcal{V}_t$ separated by a small \emph{cut set} $\mathcal{V}_{cut}$, where $\mathcal{V}_s\cup\mathcal{V}_t\cup\mathcal{V}_{cut}=\mathcal{V}$ is the set of graph nodes. Building upon this idea, we propose \bindex, a novel index structure which follows a divide-and-conquer strategy. The graph is first cut into pieces so that each part can be processed easily. Then, all the required random walk probabilities can be deterministically computed in a bottom-top manner. When a query comes, only a small part of the index needs to be accessed. We prove that \bindex\ requires $O(n\cdot h)$ space, can be built in $O(n\cdot h\cdot (h+d_{max}))$ time, and answers each query in $O(n\cdot h)$ time, where $h$ is the height of a hierarchy partition tree and $d_{max}$ is the maximum degree, which are both usually much smaller than $n$.

Country of Origin
🇨🇳 China

Page Count
15 pages

Category
Computer Science:
Data Structures and Algorithms