High-dimensional Gaussian and bootstrap approximations for robust means
By: Anders Bredahl Kock, David Preinerstorfer
Potential Business Impact:
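Keeps statistical conclusions about averages reliable when the data are heavy-tailed or partly corrupted, even with far more variables than observations.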
Recent years have witnessed much progress on Gaussian and bootstrap approximations to the distribution of max-type statistics of sums of independent random vectors with dimension $d$ large relative to the sample size $n$. However, for any number of moments $m>2$ that the summands may possess, there exist distributions such that these approximations break down if $d$ grows faster than $n^{\frac{m}{2}-1}$. In this paper, we establish Gaussian and bootstrap approximations to the distributions of winsorized and trimmed means that allow $d$ to grow at an exponential rate in $n$ as long as $m>2$ moments exist. The approximations remain valid under some amount of adversarial contamination. Our implementations of the winsorized and trimmed means are fully data-driven and do not depend on any unknown population quantities. As a consequence, the performance of the approximation guarantees "adapts" to $m$.
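To make the objects in the abstract concrete, the following is a minimal sketch of a coordinate-wise winsorized mean together with a Gaussian multiplier bootstrap for a max-type statistic. It is an illustration under stated assumptions, not the authors' procedure: the paper's winsorization is fully data-driven, whereas here the winsorization fraction `eps` is a fixed, hypothetical tuning parameter, and the function names `winsorize`, `winsorized_mean`, and `bootstrap_critical_value` are invented for this example.

```python
import numpy as np


def winsorize(x, eps):
    """Clip each column of an (n, d) array to its own order statistics.

    eps is a hypothetical, user-chosen fraction in [0, 0.5); the paper
    instead chooses the winsorization points from the data.
    """
    if not 0 <= eps < 0.5:
        raise ValueError("eps must lie in [0, 0.5)")
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    k = int(np.floor(eps * n))          # observations clipped per tail
    xs = np.sort(x, axis=0)
    lo, hi = xs[k], xs[n - 1 - k]       # per-coordinate clipping points
    return np.clip(x, lo, hi)


def winsorized_mean(x, eps):
    """Coordinate-wise mean of the winsorized observations."""
    return winsorize(x, eps).mean(axis=0)


def bootstrap_critical_value(x, eps, alpha=0.05, n_boot=1000, seed=0):
    """(1 - alpha)-quantile of a Gaussian multiplier bootstrap max statistic.

    Each bootstrap draw computes
        max_j | n^{-1/2} * sum_i e_i * (w_ij - mean_j(w)) |
    with e_i i.i.d. standard normal and w the winsorized data.
    """
    w = winsorize(x, eps)
    n = w.shape[0]
    centered = w - w.mean(axis=0)
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((n_boot, n))             # multipliers
    stats = np.abs(e @ centered).max(axis=1) / np.sqrt(n)
    return np.quantile(stats, 1 - alpha)


if __name__ == "__main__":
    # Heavy-tailed example with d much larger than n (illustrative sizes).
    n, d = 200, 1000
    data = np.random.default_rng(42).standard_t(df=3, size=(n, d))
    center = winsorized_mean(data, eps=0.05)
    crit = bootstrap_critical_value(data, eps=0.05, n_boot=500)
    half_width = crit / np.sqrt(n)
    print("simultaneous band half-width:", half_width)
    print("largest |winsorized mean|:", np.abs(center).max())
```

In this sketch, `center[j] ± crit / sqrt(n)` gives a simultaneous band over all `d` coordinates. Whether such a band has the coverage the paper proves depends on the data-driven choices made there, which this fixed-`eps` version does not reproduce.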
Similar Papers
Gaussian Multiplier Bootstrap Procedure for the $κ$th Largest Coordinate of High-Dimensional Statistics
Statistics Theory
Helps understand complex data when there's a lot of it.
Robust and Computationally Efficient Trimmed L-Moments Estimation for Parametric Distributions
Methodology
Finds patterns in messy data, ignoring bad numbers.
The Berry-Esseen Bound for High-dimensional Self-normalized Sums
Probability
Makes math work better for huge amounts of data.