Evaluating Variance Estimates with Relative Efficiency
By: Kedar Karhadkar, Jack Klys, Daniel Ting, and more
Potential Business Impact:
Checks if online tests are fair and trustworthy.
Experimentation platforms in industry must often deal with customer trust issues. Platforms must prove the validity of their claims as well as catch issues that arise. Since confidence intervals are a central quantity estimated by experimentation platforms, their validity is of particular concern. To ensure confidence intervals are reliable, we must be able to diagnose when our variance estimates are biased or noisy, and when the resulting intervals may be incorrect. A common method for this is A/A testing, in which both the control and test arms receive the same treatment. One can then check whether the empirical false positive rate (FPR) deviates substantially from the target FPR over many tests. However, this approach reduces each A/A test to a simple binary random variable, yielding an inefficient estimate of the FPR because it throws away information about the magnitude of each experiment's result. We show how to empirically evaluate the effectiveness of statistics that monitor the variance estimates that partly dictate a platform's statistical reliability. We also show that statistics other than the empirical FPR are more effective at detecting issues. In particular, we propose a $t^2$-statistic that is more sample efficient.
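The sketch below is an illustrative simulation, not the authors' exact procedure, of why a $t^2$-style check can be more sample efficient than monitoring the empirical FPR. It assumes that each A/A test produces a t-statistic that is approximately N(0, 1) when the variance estimate is correct, and that an over-estimated variance (by a hypothetical factor `c`) shrinks those t-statistics to N(0, 1/c). The binary check counts rejections and tests them against the target rate; the magnitude-based check tests whether the mean of $t^2$ deviates from its null value of 1. All constants (`alpha`, `n_tests`, `c`, `n_sims`) are assumptions chosen for the example.

```python
# Illustrative sketch: compare how often the empirical-FPR check and a
# mean-t^2 check detect a miscalibrated variance estimator across a fixed
# number of simulated A/A tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05     # target FPR of each A/A test (assumed)
n_tests = 200    # number of A/A tests available for monitoring (assumed)
c = 1.3          # hypothetical variance over-estimation factor (shrinks |t|)
n_sims = 2000    # Monte Carlo replications

z_crit = stats.norm.ppf(1 - alpha / 2)

def detects_fpr(t):
    """Binomial test on the empirical FPR against the target alpha."""
    rejections = int(np.sum(np.abs(t) > z_crit))
    return stats.binomtest(rejections, len(t), alpha).pvalue < 0.05

def detects_t2(t):
    """z-test on the mean of t^2 against its null value of 1.

    If the variance estimate is correct, t ~ N(0, 1), so E[t^2] = 1 and
    Var(t^2) = 2; the z-statistic uses this null variance.
    """
    z = (np.mean(t**2) - 1.0) / np.sqrt(2.0 / len(t))
    return 2 * stats.norm.sf(abs(z)) < 0.05

power_fpr = power_t2 = 0
for _ in range(n_sims):
    # Miscalibrated t-statistics: over-estimated variance deflates them.
    t = rng.normal(0.0, 1.0 / np.sqrt(c), size=n_tests)
    power_fpr += detects_fpr(t)
    power_t2 += detects_t2(t)

print(f"detection rate, empirical-FPR check: {power_fpr / n_sims:.2f}")
print(f"detection rate, mean-t^2 check:      {power_t2 / n_sims:.2f}")
```

Under these assumptions, the magnitude-based check flags the miscalibration far more often than the binary FPR check for the same number of A/A tests, which is the sample-efficiency argument the abstract makes.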
Similar Papers
$t$-Testing the Waters: Empirically Validating Assumptions for Reliable A/B-Testing
Methodology
Checks if online tests give true results.
Beyond Normality: Reliable A/B Testing with Non-Gaussian Data
Machine Learning (Stat)
Fixes online tests to make better choices.
A new approach to reliability assessment based on Exploratory factor analysis
Methodology
Makes science tests more trustworthy and accurate.