
A comparison of variable selection methods and predictive models for postoperative bowel surgery complications

Published: July 30, 2025 | arXiv ID: 2507.22771v1

By: Özge Şahin, Annemiek Kwast, Annemieke Witteveen, and more

Potential Business Impact:

Helps doctors better estimate patients' risk of complications after bowel surgery.

Business Areas:
A/B Testing; Data and Analytics

Accurate prediction of postoperative complications can support personalized perioperative care. However, in surgical settings, data collection is often constrained, and which variables to prioritize remains an open question. We analyzed 767 elective bowel surgeries performed under an Enhanced Recovery After Surgery (ERAS) protocol at Medisch Spectrum Twente (Netherlands) between March 2020 and December 2023. Although hundreds of variables were available, most had substantial missingness or near-constant values and were therefore excluded. After data preprocessing, 34 perioperative predictors were retained for further analysis. Surgeries from 2020 to 2022 ($n=580$) formed the development set, and 2023 cases ($n=187$) provided temporal validation. We modeled two binary endpoints: any postoperative complication and serious postoperative complications (Clavien-Dindo grade $\ge$ IIIa). We compared weighted logistic regression, stratified random forests, and Naive Bayes under class imbalance (serious complication rate $\approx 11\%$; any complication rate $\approx 35\%$). Probabilistic performance was assessed using class-specific Brier scores. We advocate reporting probabilistic risk estimates so that postoperative monitoring can be guided by predictive uncertainty. Random forests yielded better calibration for both outcomes. Variable selection modestly improved weighted logistic regression and Naive Bayes but had minimal effect on random forests. Despite the single-center data, our findings underscore the value of careful preprocessing and ensemble methods in perioperative risk modeling.
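The abstract does not spell out how the class-specific Brier scores are computed. A minimal sketch, assuming the common definition in which the Brier score is computed separately over patients with the complication (BS+, mean of (1 - p)^2) and without it (BS-, mean of p^2). The function names and the toy cohort below are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence, Tuple


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Overall Brier score: mean squared error between predicted risk and the 0/1 outcome."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def class_specific_brier(y_true: Sequence[int], p_pred: Sequence[float]) -> Tuple[float, float]:
    """Return (BS_plus, BS_minus), assuming the usual class-wise definition.

    BS_plus:  Brier score over patients WITH the complication (y == 1), i.e. mean (1 - p)^2.
    BS_minus: Brier score over patients WITHOUT it (y == 0), i.e. mean p^2.
    """
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have equal length")
    pos = [(1.0 - p) ** 2 for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p ** 2 for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    return sum(pos) / len(pos), sum(neg) / len(neg)


# Hypothetical toy cohort: 100 patients, 11 with a serious complication (~11% prevalence),
# scored by an uninformative model that predicts the prevalence (0.11) for everyone.
y = [1] * 11 + [0] * 89
p = [0.11] * 100

bs_plus, bs_minus = class_specific_brier(y, p)
print(f"{brier_score(y, p):.4f} {bs_plus:.4f} {bs_minus:.4f}")  # -> 0.0979 0.7921 0.0121
```

In this toy case the pooled Brier score of about 0.098 looks respectable, yet BS+ of about 0.79 shows the model carries almost no useful risk signal for the patients who actually develop a serious complication. Reporting the two class-specific scores side by side exposes the kind of failure a single pooled score can hide when the positive class is rare.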

Page Count
31 pages

Category
Statistics: Applications