A multivariate extension of Azadkia-Chatterjee's rank coefficient
By: Wenjie Huang, Zonghan Li, Yuhao Wang
The Azadkia-Chatterjee coefficient is a rank-based measure of dependence between a random variable $Y \in \mathbb{R}$ and a random vector ${\boldsymbol Z} \in \mathbb{R}^{d_Z}$. This paper proposes a multivariate extension that measures dependence between random vectors ${\boldsymbol Y} \in \mathbb{R}^{d_Y}$ and ${\boldsymbol Z} \in \mathbb{R}^{d_Z}$, based on $n$ i.i.d. samples. The proposed coefficient converges almost surely to a limit with the following properties: i) it lies in $[0, 1]$; ii) it equals zero if and only if ${\boldsymbol Y}$ and ${\boldsymbol Z}$ are independent; and iii) it equals one if and only if ${\boldsymbol Y}$ is almost surely a function of ${\boldsymbol Z}$. Remarkably, the only assumption required for this convergence is that ${\boldsymbol Y}$ is not almost surely constant. We further prove that, under the same mild condition, the coefficient is asymptotically normal when ${\boldsymbol Y}$ and ${\boldsymbol Z}$ are independent, and we propose a merge-sort-based algorithm that computes the coefficient in $O(n (\log n)^{d_Y})$ time. Finally, we show that the coefficient can be used to measure conditional dependence between ${\boldsymbol Y}$ and ${\boldsymbol Z}$ given a third random vector ${\boldsymbol X}$, and we prove that the measure is monotonic with respect to the deviation from independence under certain model restrictions.
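For context, the following is a minimal Python sketch of the original scalar-$Y$ Azadkia-Chatterjee estimator that this paper generalizes, using its nearest-neighbor and rank construction. The function name, the naive $O(n^2)$ rank counting, and the toy data are illustrative assumptions of this sketch; it is not the multivariate, merge-sort-based estimator proposed in the paper.

import numpy as np
from scipy.spatial import cKDTree

def azadkia_chatterjee_T(Y, Z):
    """Nearest-neighbor/rank estimator T_n(Y, Z) of Azadkia and Chatterjee
    for scalar Y and vector Z (no conditioning set)."""
    Y = np.asarray(Y, dtype=float).ravel()
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    n = Y.shape[0]

    # R_i = #{j : Y_j <= Y_i}, L_i = #{j : Y_j >= Y_i}; naive O(n^2) counting here,
    # which a merge-sort-style computation can replace to reach near-linear time.
    R = np.array([np.sum(Y <= y) for y in Y])
    L = np.array([np.sum(Y >= y) for y in Y])

    # M(i): index of the nearest neighbor of Z_i among the other sample points.
    _, nn = cKDTree(Z).query(Z, k=2)   # the closest point is the point itself
    M = nn[:, 1]

    num = np.sum(n * np.minimum(R, R[M]) - L.astype(float) ** 2)
    den = np.sum(L * (n - L).astype(float))
    return num / den

# Toy check: near 1 when Y is a function of Z, near 0 under independence.
rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 2))
print(azadkia_chatterjee_T(np.sin(Z[:, 0]) + Z[:, 1] ** 2, Z))   # close to 1
print(azadkia_chatterjee_T(rng.normal(size=2000), Z))            # close to 0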
Similar Papers
Quantifying and testing dependence to categorical variables
Statistics Theory
Quantifies and tests dependence of a variable on categorical variables.
On continuity of Chatterjee's rank correlation and related dependence measures
Statistics Theory
Studies continuity properties of Chatterjee's rank correlation and related dependence measures.