Robustness and Scalability Of Machine Learning for Imbalanced Clinical Data in Emergency and Critical Care
By: Yusuf Brima, Marcellin Atemkeng
Potential Business Impact:
Helps doctors predict patient danger faster.
Emergency and intensive care environments require predictive models that are both accurate and computationally efficient, yet clinical data in these settings are often severely imbalanced. Such skewness undermines model reliability, particularly for rare but clinically crucial outcomes, making robustness and scalability essential for real-world usage. In this paper, we systematically evaluate the robustness and scalability of classical machine learning models on imbalanced tabular data from MIMIC-IV-ED and eICU. Class imbalance was quantified using complementary metrics, and we compared the performance of tree-based methods, the state-of-the-art TabNet deep learning model, and a custom lightweight residual network. TabResNet was designed as a computationally efficient alternative to TabNet, replacing its complex attention mechanisms with a streamlined residual architecture to maintain representational capacity for real-time clinical use. All models were optimized via a Bayesian hyperparameter search and assessed on predictive performance, robustness to increasing imbalance, and computational scalability. Our results, on seven clinically vital predictive tasks, show that tree-based methods, particularly XGBoost, consistently achieved the most stable performance across imbalance levels and scaled efficiently with sample size. Deep tabular models degraded more sharply under imbalance and incurred higher computational costs, while TabResNet provided a lighter alternative to TabNet but did not surpass ensemble benchmarks. These findings indicate that in emergency and critical care, robustness to imbalance and computational scalability could outweigh architectural complexity. Tree-based ensemble methods currently offer the most practical and clinically feasible choice, equipping practitioners with a framework for selecting models suited to high-stakes, time-sensitive environments.
Similar Papers
Tree Boosting Methods for Balanced andImbalanced Classification and their Robustness Over Time in Risk Assessment
Machine Learning (CS)
Helps computers find rare things in messy data.
How Ensemble Learning Balances Accuracy and Overfitting: A Bias-Variance Perspective on Tabular Data
Machine Learning (CS)
Makes computer predictions more accurate without mistakes.
Enhancing mortality prediction in cardiac arrest ICU patients through meta-modeling of structured clinical data from MIMIC-IV
Machine Learning (CS)
Helps doctors predict if sick patients will die.