Tree-based methods for estimating heterogeneous model performance and model combining

Published: June 2, 2025 | arXiv ID: 2506.01905v1

By: Ruotao Zhang, Constantine Gatsonis, Jon Steingrimsson

Potential Business Impact:

Finds groups where a computer model works poorly.

Business Areas:
A/B Testing, Data and Analytics

Model performance is frequently reported only for the overall population under consideration. However, due to heterogeneity, overall performance measures often do not accurately represent model performance within specific subgroups. We develop tree-based methods for the data-driven identification of subgroups with differential model performance, where splitting decisions are made to maximize heterogeneity in performance between subgroups. We extend these methods to tree ensembles, including both random forests and gradient boosting. Lastly, we illustrate how these ensembles can be used for model combination. We evaluate the methods through simulations and apply them to lung cancer screening data.
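
As a rough illustration of the splitting idea in the abstract, the sketch below searches one covariate for the cut point that separates observations into two groups whose average model loss differs the most. It is not the authors' algorithm: the criterion (squared difference in mean per-observation loss between the two child nodes), the function name `best_performance_split`, the `min_leaf` parameter, and the toy data are all assumptions made for this example.

```python
from statistics import mean


def best_performance_split(x, loss, min_leaf=2):
    """Return (threshold, gap) for the split on x that maximizes the
    squared difference in mean loss between the two child nodes.

    x    : covariate values, one per observation
    loss : per-observation loss of an already-fitted model
           (e.g. squared error); hypothetical input for this sketch
    """
    # Sort observations by the covariate so every candidate split
    # is a prefix/suffix partition of the sorted list.
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ls = [loss[i] for i in order]

    best_threshold, best_gap = None, 0.0
    for k in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[k - 1] == xs[k]:
            continue  # cannot split between tied covariate values
        # Heterogeneity measure assumed here: squared gap in mean loss.
        gap = (mean(ls[:k]) - mean(ls[k:])) ** 2
        if gap > best_gap:
            best_threshold = (xs[k - 1] + xs[k]) / 2
            best_gap = gap
    return best_threshold, best_gap


# Toy data: the model's loss is low for x <= 4 and high for x >= 5.
x = [1, 2, 3, 4, 5, 6, 7, 8]
loss = [0.1, 0.2, 0.1, 0.2, 0.9, 1.0, 0.8, 0.9]
threshold, gap = best_performance_split(x, loss)
```

Applying such a search recursively within each child node would grow a tree whose leaves are subgroups with differing performance. The ensemble extensions mentioned in the abstract would aggregate many such trees, but their details are not given here.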

Page Count
52 pages

Category
Statistics:
Methodology