Dirichlet Meets Horvitz and Thompson: Estimating Homophily in Large Networks via Sampling
By: Hamed Ajorlou, Gonzalo Mateos, Luana Ruiz
Assessing homophily in large-scale networks is central to understanding structural regularities in graphs, and thus informs the choice of models (such as graph neural networks) adopted to learn from network data. Evaluating smoothness metrics requires access to the entire network topology and node features, which may be impractical in large-scale, dynamic, resource-limited, or privacy-constrained settings. In this work, we propose a sampling-based framework to estimate homophily via the Dirichlet energy (Laplacian-based total variation) of graph signals, leveraging the Horvitz-Thompson (HT) estimator for unbiased inference from partial graph observations. The Dirichlet energy is a so-termed total (of squared nodal feature deviations) over graph edges; hence, it is estimable under general network sampling designs for which edge-inclusion probabilities can be analytically derived and used as weights in the proposed HT estimator. We establish that the Dirichlet energy can be consistently estimated from sampled graphs, and empirically study other heterophily measures as well. Experiments on several heterophilic benchmark datasets demonstrate the effectiveness of the proposed HT estimators in reliably capturing homophilic structure (or lack thereof) from sampled network measurements.
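To make the idea of inverse-probability weighting concrete, here is a minimal sketch of an HT-style estimate of the Dirichlet energy. It is an illustration, not the authors' implementation. It assumes one particular sampling design (induced-subgraph sampling, where each node is kept independently with probability p, so an edge is observed with probability p squared) and scalar node features. The function names `dirichlet_energy` and `ht_dirichlet_energy` are hypothetical.

```python
import random


def dirichlet_energy(edges, x):
    """Total of squared feature differences over all undirected edges."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)


def ht_dirichlet_energy(edges, x, p, rng):
    """HT-style estimate from one induced-subgraph sample.

    Assumed design (not taken from the paper): each node is kept
    independently with probability p, and an edge is observed only when
    both endpoints are kept, so its inclusion probability is p * p.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    nodes = sorted({i for edge in edges for i in edge})
    sampled = {i for i in nodes if rng.random() < p}
    pi_edge = p * p  # edge-inclusion probability under the assumed design
    observed = [(i, j) for i, j in edges if i in sampled and j in sampled]
    # Weight the observed total by the inverse inclusion probability.
    return sum((x[i] - x[j]) ** 2 for i, j in observed) / pi_edge


# Toy example: a ring graph with random scalar features.
rng = random.Random(0)
n = 100
edges = [(i, (i + 1) % n) for i in range(n)]
x = [rng.random() for _ in range(n)]

true_energy = dirichlet_energy(edges, x)
estimates = [ht_dirichlet_energy(edges, x, 0.5, rng) for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)
print(true_energy, mean_estimate)
```

A single estimate can be far from the true value, but the average over repeated samples should approach it, which is what unbiasedness means here. Other sampling designs would need their own edge-inclusion probabilities in place of `pi_edge`.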
Similar Papers
Measuring Over-smoothing beyond Dirichlet energy
Machine Learning (CS)
Finds when AI models get too confused.
Analysis of Dirichlet Energies as Over-smoothing Measures
Machine Learning (CS)
Makes computer learning models better at understanding connections.
Homophily in Complex Networks: Measures, Models, and Applications
Social and Information Networks
Shows how similar people group together online.