Asymptotic Theory and Phase Transitions for Variable Importance in Quantile Regression Forests
By: Tomoshige Nakamura, Hiroshi Shiraishi
Potential Business Impact:
Fixes computer predictions that guess wrong.
Quantile Regression Forests (QRF) are widely used for non-parametric conditional quantile estimation, yet statistical inference for variable importance measures remains challenging due to the non-smoothness of the loss function and the complex bias-variance trade-off. In this paper, we develop an asymptotic theory for variable importance defined as the difference in pinball loss risks. First, we establish the asymptotic normality of the QRF estimator by handling the non-differentiable pinball loss via Knight's identity. Second, we uncover a "phase transition" phenomenon governed by the subsampling rate $\beta$ (where the subsample size satisfies $s \asymp n^{\beta}$). We prove that in the bias-dominated regime ($\beta \ge 1/2$), which corresponds to the large subsample sizes typically favored in practice to maximize predictive accuracy, standard inference breaks down because the estimator converges to a deterministic bias constant rather than to a zero-mean normal distribution. Finally, we derive the explicit analytic form of this asymptotic bias and discuss the theoretical feasibility of restoring valid inference via analytic bias correction. Our results highlight a fundamental trade-off between predictive performance and inferential validity, providing a theoretical foundation for understanding the intrinsic limitations of random forest inference in high-dimensional settings.
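As a rough illustration of the tools named in the abstract, the sketch below writes out the pinball loss $\rho_\tau(u) = u(\tau - 1\{u \le 0\})$ and numerically checks one common form of Knight's identity, $\rho_\tau(u - v) - \rho_\tau(u) = -v\,\psi_\tau(u) + \int_0^v (1\{u \le s\} - 1\{u \le 0\})\,ds$ with $\psi_\tau(u) = \tau - 1\{u \le 0\}$. The function names, the convention at $u = 0$, and the closed form used for the integral are assumptions made for this example; they are not taken from the paper.

```python
import random


def pinball_loss(u, tau):
    """Pinball (check) loss rho_tau(u) = u * (tau - 1{u <= 0})."""
    return u * (tau - (1.0 if u <= 0 else 0.0))


def knight_integral(u, v):
    """Closed form of the integral over s from 0 to v of 1{u <= s} - 1{u <= 0}."""
    if v >= 0:
        # The integrand is 1 on [u, v] when u > 0, and 0 otherwise.
        return max(v - u, 0.0) if u > 0 else 0.0
    # For v < 0 the integrand is -1 on [v, u) when u <= 0; the reversed
    # integration limits flip the sign, giving a non-negative value.
    return max(u - v, 0.0) if u <= 0 else 0.0


def knight_rhs(u, v, tau):
    """Right-hand side of Knight's identity (linear term plus integral term)."""
    psi = tau - (1.0 if u <= 0 else 0.0)
    return -v * psi + knight_integral(u, v)


def max_identity_gap(n_trials=10000, seed=0):
    """Largest absolute gap between the two sides over random (u, v, tau)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_trials):
        u = rng.uniform(-5.0, 5.0)
        v = rng.uniform(-5.0, 5.0)
        tau = rng.uniform(0.05, 0.95)
        lhs = pinball_loss(u - v, tau) - pinball_loss(u, tau)
        worst = max(worst, abs(lhs - knight_rhs(u, v, tau)))
    return worst


if __name__ == "__main__":
    print("max |LHS - RHS| =", max_identity_gap())
```

The identity splits the non-smooth loss difference into a term linear in $v$ and a non-negative remainder, which is the kind of decomposition that makes asymptotic expansions tractable without differentiating the loss.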
Similar Papers
Generalized random forest for extreme quantile regression
Methodology
Predicts rare weather events more accurately.
Asymptotic confidence bands for centered purely random forests
Statistics Theory
Makes computer predictions more accurate and reliable.
On the Effect of Regularization on Nonparametric Mean-Variance Regression
Machine Learning (Stat)
Makes AI better at guessing how sure it is.