Scalable and Provable Kemeny Constant Computation on Static and Dynamic Graphs: A 2-Forest Sampling Approach
By: Cheng Li , Meihao Liao , Rong-Hua Li and more
Potential Business Impact:
Finds important connections in networks faster.
Kemeny constant, defined as the expected hitting time of random walks from a source node to a randomly chosen target node, is a fundamental metric in graph data management with many real-world applications. However, computing it exactly on large graphs is highly challenging, as it requires inverting large graph matrices. Existing solutions mainly rely on approximate random-walk-based methods, which still need large sample sizes and lack strong theoretical guarantees. In this paper, we propose a new approach for approximating the Kemeny constant via 2-forest sampling. We first derive an unbiased estimator expressed through spanning trees by introducing a path mapping technique that establishes a direct correspondence between spanning trees and certain classes of 2-forests. Compared to random walk-based estimators, 2-forest-based estimators yield leads to a better theoretical bound. We further design efficient algorithms to sample and traverse spanning trees, leveraging data structures such as the Binary Indexed Tree (BIT) for optimization. Our theoretical analysis shows that the Kemeny constant can be approximated with relative error $ε$ in $O\left(\frac{Δ^2\bar{d}^2}{ε^2}(τ+ n\min(\log n, Δ))\right)$ time, where $τ$ is the tree-sampling time, $\bar{d}$ is the average degree, and $Δ$ is the graph diameter. This complexity is near-linear in practice. Moreover, existing methods largely target static graphs and lack efficient mechanisms for dynamic updates. To address this, we propose two sample maintenance strategies that partially update samples while preserving accuracy on dynamic graphs. Extensive experiments on 10 large real-world datasets demonstrate that our method consistently outperforms state-of-the-art approaches in both efficiency and accuracy on static and dynamic graphs.
Similar Papers
The Derivative of Kemeny's Constant as a Centrality Measure in Undirected Graphs
Numerical Analysis
Finds important roads in networks.
Using random spanning trees in survivable networks design
Discrete Mathematics
Creates strong computer networks with fewer wires.
Weighted $k$-Path and Other Problems in Almost $O^*(2^k)$ Deterministic Time via Dynamic Representative Sets
Data Structures and Algorithms
Finds shortest paths with many stops faster.