Score: 3

Optimal Graph Clustering without Edge Density Signals

Published: October 24, 2025 | arXiv ID: 2510.21669v1

By: Maximilien Dreveton, Elaine Siyu Liu, Matthias Grossglauser and more

BigTech Affiliations: Stanford University

Potential Business Impact:

Finds hidden groups in messy data better.

Business Areas:
A/B Testing, Data and Analytics

This paper establishes the theoretical limits of graph clustering under the Popularity-Adjusted Block Model (PABM), addressing limitations of existing models. In contrast to the Stochastic Block Model (SBM), which assumes uniform vertex degrees, and to the Degree-Corrected Block Model (DCBM), which applies uniform degree corrections across clusters, PABM introduces separate popularity parameters for intra- and inter-cluster connections. Our main contribution is the characterization of the optimal error rate for clustering under PABM, which provides novel insights on clustering hardness: we demonstrate that unlike SBM and DCBM, cluster recovery remains possible in PABM even when traditional edge-density signals vanish, provided intra- and inter-cluster popularity coefficients differ. This highlights a dimension of degree heterogeneity captured by PABM but overlooked by DCBM: local differences in connectivity patterns can enhance cluster separability independently of global edge densities. Finally, because PABM exhibits a richer structure, its expected adjacency matrix has rank between $k$ and $k^2$, where $k$ is the number of clusters. As a result, spectral embeddings based on the top $k$ eigenvectors may fail to capture important structural information. Our numerical experiments on both synthetic and real datasets confirm that spectral clustering algorithms incorporating $k^2$ eigenvectors outperform traditional spectral approaches.
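The rank claim in the abstract can be checked on a toy example. The sketch below is illustrative only and is not taken from the paper: it assumes the usual PABM parameterization from the literature, in which the edge probability between vertices $i$ and $j$ is $P_{ij} = \lambda_{i, z_j} \lambda_{j, z_i}$, with $z_i$ the cluster of vertex $i$ and $\lambda_{i,r}$ the popularity of vertex $i$ toward cluster $r$. The parameter values, the helper names `expected_adjacency` and `matrix_rank`, and the choice to keep the diagonal entries are all assumptions made for the example. It compares the rank of a generic PABM matrix with that of a DCBM-style special case for $k = 2$, using exact rational arithmetic.

```python
from fractions import Fraction


def expected_adjacency(lam, z):
    """Build P with P[i][j] = lam[i][z[j]] * lam[j][z[i]] (assumed PABM form)."""
    n = len(z)
    return [[lam[i][z[j]] * lam[j][z[i]] for j in range(n)] for i in range(n)]


def matrix_rank(rows):
    """Exact rank by Gaussian elimination over Fractions."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


# Toy setup (hypothetical values): 8 vertices, k = 2 clusters of 4 vertices each.
z = [0, 0, 0, 0, 1, 1, 1, 1]
k = 2

# Generic PABM: each vertex has its own popularity toward each cluster.
raw = [(9, 1), (5, 3), (7, 2), (3, 6), (2, 8), (4, 5), (1, 9), (6, 3)]
lam_pabm = [[Fraction(a, 10), Fraction(b, 10)] for a, b in raw]

# DCBM-style special case: lam[i][r] = theta[i] * c[z[i]][r] with c symmetric,
# so that P[i][j] = theta[i] * theta[j] * c[z[i]][z[j]] ** 2.
theta = [Fraction(t, 10) for t in (9, 5, 7, 3, 2, 4, 1, 6)]
c = [[Fraction(1), Fraction(1, 2)], [Fraction(1, 2), Fraction(1)]]
lam_dcbm = [[theta[i] * c[z[i]][r] for r in range(k)] for i in range(len(z))]

P_pabm = expected_adjacency(lam_pabm, z)
P_dcbm = expected_adjacency(lam_dcbm, z)

rank_pabm = matrix_rank(P_pabm)
rank_dcbm = matrix_rank(P_dcbm)
print("PABM rank:", rank_pabm)
print("DCBM-style rank:", rank_dcbm)
```

With these hand-picked parameters, the PABM matrix reaches rank $k^2 = 4$ while the DCBM-style matrix stays at rank $k = 2$. This is consistent with the abstract's point that an embedding built from only the top $k$ eigenvectors can miss part of the PABM structure.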

Country of Origin
🇨🇭 🇺🇸 Switzerland, United States

Repos / Data Links

Page Count
36 pages

Category
Computer Science:
Machine Learning (CS)