Mosaic inference on panel data
By: Asher Spector, Rina Foygel Barber, Emmanuel Candès
Potential Business Impact:
Checks if data groups are truly separate.
Analysis of panel data via linear regression is widespread across disciplines. To perform statistical inference, such analyses typically assume that clusters of observations are jointly independent. For example, one might assume that observations in New York are independent of observations in New Jersey. Are such assumptions plausible? Might there be hidden dependencies between nearby clusters? This paper introduces a mosaic permutation test that can (i) test the cluster-independence assumption and (ii) produce confidence intervals for linear models without requiring full cluster independence. The key idea behind our method is to apply a permutation test to carefully constructed residual estimates that obey the same invariances as the true errors. As a result, our method yields finite-sample valid inferences under a mild "local exchangeability" condition. This condition differs from the typical cluster-independence assumption, and neither assumption implies the other. Furthermore, our method is asymptotically valid under cluster-independence (with no exchangeability assumptions). Together, these results show our method is valid under assumptions that are arguably weaker than the assumptions underlying many classical methods. In experiments on well-studied datasets from the literature, we find that many existing methods produce variance estimates that are up to five times too small, whereas mosaic methods produce reliable results. We implement our methods in the Python package mosaicperm.
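To make the general idea of a permutation test on residuals concrete, here is a minimal sketch in Python. It is not the mosaic construction from the paper and does not use the mosaicperm API. It assumes we already have one residual series per cluster, observed over the same time periods, and that the residuals of one cluster are exchangeable across time under the null of independence. The function name permutation_pvalue and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def permutation_pvalue(resid_a, resid_b, n_perm=999, seed=0):
    """Permutation p-value for dependence between two residual series.

    Illustrative only: a plain permutation test, NOT the mosaic test.
    Assumes resid_b is exchangeable across time under the null.
    """
    rng = np.random.default_rng(seed)
    resid_a = np.asarray(resid_a, dtype=float)
    resid_b = np.asarray(resid_b, dtype=float)

    # Test statistic: absolute sample correlation between the two series.
    observed = abs(np.corrcoef(resid_a, resid_b)[0, 1])

    # Recompute the statistic after shuffling the time order of one series.
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(resid_b)
        if abs(np.corrcoef(resid_a, shuffled)[0, 1]) >= observed:
            exceed += 1

    # Adding 1 to numerator and denominator keeps the p-value valid
    # in finite samples (it can never be exactly zero).
    return (1 + exceed) / (1 + n_perm)

# Simulated residuals for two clusters over 200 time periods.
rng = np.random.default_rng(1)
cluster_a = rng.normal(size=200)
independent_b = rng.normal(size=200)
dependent_b = cluster_a + 0.1 * rng.normal(size=200)

print(permutation_pvalue(cluster_a, independent_b))
print(permutation_pvalue(cluster_a, dependent_b))
```

The difficulty the abstract points to is that residuals estimated by ordinary regression generally do not inherit the invariances of the true errors, so shuffling them as above is not automatically valid. The paper's contribution is constructing residual estimates for which it is.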