Active Hypothesis Testing under Computational Budgets with Applications to GWAS and LLM
By: Qi Kuang, Bowen Gang, Yin Xia
Potential Business Impact:
Tests more ideas faster with less computer power.
In large-scale hypothesis testing, computing exact $p$-values or $e$-values is often resource-intensive, creating a need for budget-aware inferential methods. We propose a general framework for active hypothesis testing that leverages inexpensive auxiliary statistics to allocate a global computational budget. For each hypothesis, our data-adaptive procedure probabilistically decides whether to compute the exact test statistic or a transformed proxy, guaranteeing a valid $p$-value or $e$-value while satisfying the budget constraint in expectation. We establish theoretical guarantees for our constructions, showing that the procedure achieves optimality for $e$-values and for $p$-values under independence, and admissibility for $p$-values under general dependence. Empirical results from simulations and two real-world applications, a large-scale genome-wide association study (GWAS) and a clinical prediction task leveraging large language models (LLMs), demonstrate that our framework improves statistical efficiency under fixed resource limits.
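To make the idea of probabilistically choosing between a cheap proxy and an exact statistic concrete, here is a minimal Python sketch of one simple randomized rule for e-values. It is an illustrative assumption, not the construction proposed in the paper: the function names, the threshold parameter gamma, the query probability (1 - gamma/proxy)_+, and the Gaussian toy model are all hypothetical choices. Under this rule the returned value has null expectation at most 1 whenever the exact e-value does, regardless of how the proxy relates to it, and the expected number of exact computations is the sum of the query probabilities, which can be compared against a budget.

```python
import math
import random


def active_e_value(proxy, exact_fn, gamma, rng):
    """Return (e_value, queried) for a single hypothesis.

    proxy    : cheap non-negative auxiliary statistic (hypothetical input)
    exact_fn : zero-argument callable returning the expensive exact e-value
    gamma    : fixed constant in (0, 1), chosen before seeing any data
    rng      : random.Random instance supplying independent randomness

    Illustrative rule (an assumption, not the paper's method):
    query with probability (1 - gamma / proxy)_+ and return
    (1 - gamma) * exact; otherwise return the proxy itself.
    Validity: E[output] <= E[min(proxy, gamma)] + (1 - gamma) * E[exact] <= 1.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if proxy < 0.0:
        raise ValueError("proxy must be non-negative")
    query_prob = max(0.0, 1.0 - gamma / proxy) if proxy > 0.0 else 0.0
    if rng.random() < query_prob:
        return (1.0 - gamma) * exact_fn(), True
    return proxy, False


def simulate_null(n=50000, gamma=0.5, mu=1.0, seed=0):
    """Monte Carlo check under a toy null: z ~ N(0, 1).

    The exact e-value is the likelihood ratio of N(mu, 1) against N(0, 1);
    the proxy applies the same formula to a noisy copy of z.
    Returns (average active e-value, fraction of exact computations).
    """
    rng = random.Random(seed)
    total, queries = 0.0, 0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        noisy = z + 0.5 * rng.gauss(0.0, 1.0)
        proxy = math.exp(mu * noisy - mu * mu / 2.0)
        e_val, queried = active_e_value(
            proxy, lambda: math.exp(mu * z - mu * mu / 2.0), gamma, rng
        )
        total += e_val
        queries += int(queried)
    return total / n, queries / n


if __name__ == "__main__":
    mean_e, frac = simulate_null()
    print(f"average active e-value under the null: {mean_e:.3f}")
    print(f"fraction of exact computations: {frac:.3f}")
```

In this sketch a larger gamma lowers the query probability and so the expected cost, at the price of a smaller multiplier on the exact e-value. Gamma is held fixed here because tuning it on the observed proxies would need a separate validity argument.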
Similar Papers
Active multiple testing with proxy p-values and e-values
Methodology
Tests ideas faster using smart guesses.
Active Nonparametric Two-Sample Testing by Betting on Heterogeneous Data Sources
Statistics Theory
Finds differences in data faster, even with messy sources.
An Efficient Framework for Robust Sample Size Determination
Methodology
Finds best study size for reliable results.