Statistical Inference for Gradient Boosting Regression
By: Haimo Fang, Kevin Tan, Giles Hooker
Potential Business Impact:
Makes computer predictions more trustworthy by saying how uncertain they are.
Gradient boosting is widely popular due to its flexibility and predictive accuracy. However, statistical inference and uncertainty quantification for gradient boosting remain challenging and under-explored. We propose a unified framework for statistical inference in gradient boosting regression. Our framework integrates dropout or parallel training with a recently proposed regularization procedure that allows for a central limit theorem (CLT) for boosting. With these enhancements, we surprisingly find that increasing the dropout rate and the number of trees grown in parallel at each iteration substantially enhances signal recovery and overall performance. Our resulting algorithms enjoy similar CLTs, which we use to construct built-in confidence intervals, prediction intervals, and rigorous hypothesis tests for assessing variable importance. Numerical experiments demonstrate that our algorithms perform well, interpolate between regularized boosting and random forests, and confirm the validity of their built-in statistical inference procedures.
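To make the dropout idea concrete, below is a minimal Python sketch of DART-style dropout boosting for regression, paired with naive bootstrap intervals as a rough stand-in for the paper's built-in CLT-based confidence intervals. The function names, hyperparameters, and synthetic data are illustrative assumptions, not the authors' algorithm or implementation.

# Illustrative sketch only: DART-style dropout boosting for regression, with
# naive bootstrap intervals standing in for the paper's built-in CLT inference.
# All names and hyperparameters here are assumptions for demonstration.
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_dropout_boosting(X, y, n_iter=100, dropout=0.3, lr=0.1, max_depth=3, rng=None):
    """Boosting where each new tree fits residuals computed from a random
    subset of the existing trees (each tree dropped with prob. `dropout`)."""
    rng = np.random.default_rng(rng)
    trees = []
    for _ in range(n_iter):
        if trees:
            keep = rng.random(len(trees)) > dropout
            pred = sum(lr * t.predict(X) for t, k in zip(trees, keep) if k)
        else:
            pred = np.zeros_like(y, dtype=float)
        resid = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth,
                                     random_state=int(rng.integers(2**31 - 1)))
        tree.fit(X, resid)
        trees.append(tree)
    return trees


def predict(trees, X, lr=0.1):
    """Sum of shrunken tree predictions over the full ensemble."""
    return sum(lr * t.predict(X) for t in trees)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(500, 5))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=500)

    # Point predictions from one dropout-boosted ensemble.
    trees = fit_dropout_boosting(X, y, rng=0)
    y_hat = predict(trees, X)

    # Naive bootstrap interval: refit on resampled data, take quantiles.
    # (The paper instead derives intervals directly from a CLT for boosting.)
    boot_preds = []
    for b in range(10):
        idx = rng.integers(0, len(y), size=len(y))
        boot_trees = fit_dropout_boosting(X[idx], y[idx], rng=b)
        boot_preds.append(predict(boot_trees, X[:5]))
    lo, hi = np.percentile(boot_preds, [2.5, 97.5], axis=0)
    print("predictions:", np.round(y_hat[:5], 3))
    print("bootstrap 95% CI lower:", np.round(lo, 3))
    print("bootstrap 95% CI upper:", np.round(hi, 3))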
Similar Papers
Gradient Boosted Mixed Models: Flexible Joint Estimation of Mean and Variance Components for Clustered Data
Machine Learning (Stat)
Makes computer predictions better for groups of data.
Inductive inference of gradient-boosted decision trees on graphs for insurance fraud detection
Machine Learning (CS)
Finds fake insurance claims faster.
Dynamic Regularized CBDT: Variance-Calibrated Causal Boosting for Interpretable Heterogeneous Treatment Effects
Machine Learning (CS)
Finds best treatments for patients using smart computer rules.