A Nonparametric Statistics Approach to Feature Selection in Deep Neural Networks with Theoretical Guarantees
By: Junye Du, Zhenghao Li, Zhutong Gu, and more
This paper tackles the problem of feature selection in a highly challenging setting: $\mathbb{E}(y \mid \boldsymbol{x}) = G(\boldsymbol{x}_{\mathcal{S}_0})$, where $\mathcal{S}_0$ is the set of relevant features and $G$ is an unknown, potentially nonlinear function subject to mild smoothness conditions. Our approach begins with feature selection in deep neural networks, then generalizes the results to Hölder smooth functions by exploiting the strong approximation capabilities of neural networks. Unlike conventional optimization-based deep learning methods, we reformulate neural networks as index models and estimate $\mathcal{S}_0$ using the second-order Stein's formula. This gradient-descent-free strategy guarantees feature selection consistency with a sample size requirement of $n = \Omega(p^2)$, where $p$ is the feature dimension. To handle high-dimensional scenarios, we further introduce a screening-and-selection mechanism that achieves nonlinear selection consistency when $n = \Omega(s \log p)$, with $s$ representing the sparsity level. Additionally, we refit a neural network on the selected features for prediction and establish performance guarantees under a relaxed sparsity assumption. Extensive simulations and real-data analyses demonstrate the strong performance of our method even in the presence of complex feature interactions.
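For intuition on the gradient-descent-free estimator, recall the second-order Stein identity for a standard Gaussian design $\boldsymbol{x} \sim N(0, I_p)$: $\mathbb{E}\big[y(\boldsymbol{x}\boldsymbol{x}^\top - I_p)\big] = \mathbb{E}\big[\nabla^2_{\boldsymbol{x}} G(\boldsymbol{x}_{\mathcal{S}_0})\big]$, whose rows and columns vanish outside $\mathcal{S}_0$, so the row norms of the empirical Stein matrix rank feature relevance. The sketch below is a minimal illustration of that idea under the Gaussian-design assumption, not the paper's full screening-and-selection procedure; the function name stein_feature_scores and the top-$k$ ranking rule are hypothetical choices for illustration.

import numpy as np

def stein_feature_scores(X, y):
    # Illustrative second-order Stein estimator (assumes x ~ N(0, I_p)):
    # M_hat = (1/n) * sum_i y_i * (x_i x_i^T - I_p).
    # Rows of M_hat corresponding to irrelevant features are (near) zero,
    # so each feature's row norm serves as a relevance score.
    n, p = X.shape
    M_hat = (X * y[:, None]).T @ X / n - y.mean() * np.eye(p)
    return np.linalg.norm(M_hat, axis=1)

# Toy check: y depends only on features 0 and 1 through a nonlinear interaction.
rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.standard_normal((n, p))
y = np.sin(X[:, 0]) * X[:, 1] + 0.1 * rng.standard_normal(n)
scores = stein_feature_scores(X, y)
print(np.argsort(scores)[::-1][:2])  # expected: features 0 and 1 ranked highest

In this sketch the estimate requires no gradient descent: it is a single moment computation over the sample, which is the sense in which the abstract's strategy is optimization-free.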
Similar Papers
Provable FDR Control for Deep Feature Selection: Deep MLPs and Beyond
Machine Learning (Stat)
Helps AI pick the most important information.
Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit
Machine Learning (Stat)
Teaches computers to learn hidden patterns faster.
Statistically guided deep learning
Statistics Theory
Makes computer learning more accurate and faster.