A Multi-Criteria Automated MLOps Pipeline for Cost-Effective Cloud-Based Classifier Retraining in Response to Data Distribution Shifts
By: Emmanuel K. Katalay, David O. Dimandja, Jordan F. Masakuna
Potential Business Impact:
Automates fixing computer brains when data changes.
The performance of machine learning (ML) models often deteriorates when the underlying data distribution changes over time, a phenomenon known as data distribution drift. When this happens, ML models need to be retrained and redeployed. In practice, ML Operations (MLOps) workflows are often manual: a human decides when to trigger model retraining and redeployment. In this work, we present an automated MLOps pipeline that retrains neural network classifiers in response to significant data distribution changes. Our pipeline employs multi-criteria statistical techniques to detect distribution shifts and triggers model updates only when necessary, ensuring computational efficiency and resource optimization. We demonstrate the effectiveness of our framework through experiments on several benchmark anomaly detection datasets, showing significant improvements in model accuracy and robustness compared to traditional retraining strategies. Our work provides a foundation for deploying more reliable and adaptive ML systems in dynamic real-world settings, where data distribution changes are common.
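To make the retraining trigger concrete, here is a minimal sketch of how a multi-criteria drift gate might work. The specific criteria (a per-feature Kolmogorov–Smirnov test combined with a population stability index), the thresholds, and the majority-vote rule are illustrative assumptions, not the authors' exact method.

```python
# Illustrative sketch of a multi-criteria, drift-gated retraining trigger.
# The tests (per-feature KS test + population stability index), thresholds,
# and majority-vote rule are assumptions for illustration only; the paper's
# actual criteria may differ.
import numpy as np
from scipy.stats import ks_2samp

def psi(reference, current, bins=10):
    """Population stability index between two 1-D samples."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    ref_pct = np.clip(ref_pct, 1e-6, None)  # avoid log(0)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

def should_retrain(X_ref, X_new, ks_alpha=0.01, psi_limit=0.2, min_votes=0.5):
    """Flag retraining only if enough features drift under BOTH criteria."""
    n_features = X_ref.shape[1]
    votes = 0
    for j in range(n_features):
        ks_drift = ks_2samp(X_ref[:, j], X_new[:, j]).pvalue < ks_alpha
        psi_drift = psi(X_ref[:, j], X_new[:, j]) > psi_limit
        votes += ks_drift and psi_drift  # a feature must drift on both tests
    return votes / n_features >= min_votes

# Usage: retrain only when the gate fires, saving compute otherwise.
rng = np.random.default_rng(0)
X_ref = rng.normal(0.0, 1.0, size=(2000, 5))   # training-time distribution
X_new = rng.normal(0.8, 1.3, size=(2000, 5))   # shifted production batch
if should_retrain(X_ref, X_new):
    print("Drift detected on a majority of features: trigger retraining.")
```

Requiring agreement between two independent statistics before a feature can vote reflects the multi-criteria idea: no single noisy test can trigger a costly retraining run on its own.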
Similar Papers
MLOps Monitoring at Scale for Digital Platforms
Econometrics
Keeps computer predictions accurate without constant work.
An Empirical Evaluation of Modern MLOps Frameworks
Software Engineering
Helps pick the best AI tools for jobs.
DNN-Powered MLOps Pipeline Optimization for Large Language Models: A Framework for Automated Deployment and Resource Management
Distributed, Parallel, and Cluster Computing
Makes big computer brains work faster and cheaper.