Feature Selection and Junta Testing are Statistically Equivalent
By: Lorenzo Beretta, Nathaniel Harms, Caleb Koch
Potential Business Impact:
Finds important clues in data faster.
For a function $f \colon \{0,1\}^n \to \{0,1\}$, the junta testing problem asks whether $f$ depends on only $k$ of its variables. If $f$ does depend on only $k$ variables, the feature selection problem asks to identify them. We prove that these two tasks are statistically equivalent. Specifically, we show that the ``brute-force'' algorithm, which checks every set of $k$ variables for consistency with the sample, is simultaneously sample-optimal for both problems, and the optimal sample size is \[ \Theta\left(\frac 1 \varepsilon \left( \sqrt{2^k \log {n \choose k}} + \log {n \choose k}\right)\right). \]
Similar Papers
New Statistical and Computational Results for Learning Junta Distributions
Machine Learning (CS)
Finds hidden patterns in data using few clues.
Testing Juntas and Junta Subclasses with Relative Error
Computational Complexity
Finds computer programs that are almost right.
Optimality and computational barriers in variable selection under dependence
Statistics Theory
Finds the best way to pick important data.