Testing Correlation in Graphs by Counting Bounded Degree Motifs
By: Dong Huang, Pengkun Yang
Potential Business Impact:
Finds hidden connections between two sets of data.
Correlation analysis is a fundamental step for extracting meaningful insights from complex datasets. In this paper, we investigate the problem of detecting correlation between two Erd\H{o}s-R\'enyi graphs $G(n,p)$, formulated as a hypothesis testing problem: under the null hypothesis, the two graphs are independent, while under the alternative hypothesis, they are correlated. We develop a polynomial-time test by counting bounded degree motifs and prove its effectiveness for any constant correlation coefficient $\rho$ when the edge connecting probability satisfies $p\ge n^{-2/3}$. Our results overcome the limitation requiring $\rho \ge \sqrt{\alpha}$, where $\alpha\approx 0.338$ is the Otter's constant, extending it to any constant $\rho$. Methodologically, bounded degree motifs -- ubiquitous in real networks -- make the proposed statistic both natural and scalable. We also validate our method on synthetic and real co-citation networks, further confirming that this simple motif family effectively captures correlation signals and exhibits strong empirical performance.
Similar Papers
Detecting Correlation between Multiple Unlabeled Gaussian Networks
Statistics Theory
Finds hidden patterns in connected data.
Detectability Thresholds for Network Attacks on Static Graphs and Temporal Networks: Information-Theoretic Limits and Nearly-Optimal Tests
Information Theory
Finds hidden attacks in computer networks.
Approximately Counting and Sampling Hamiltonian Motifs in Sublinear Time
Data Structures and Algorithms
Find patterns in big networks faster.