Artificial Intelligence for Cost-Aware Resource Prediction in Big Data Pipelines
By: Harshit Goyal
Potential Business Impact:
Saves money by predicting computing resource needs.
Efficient resource allocation is a key challenge in modern cloud computing. Over-provisioning leads to unnecessary costs, while under-provisioning risks performance degradation and SLA violations. This work presents an artificial intelligence approach to predicting resource utilization in big data pipelines using Random Forest regression. We preprocess the Google Borg cluster traces to clean and transform the data and to extract relevant features (CPU, memory, usage distributions). The model achieves high predictive accuracy (R² = 0.99, MAE = 0.0048, RMSE = 0.137), capturing non-linear relationships between workload characteristics and resource utilization. Error analysis shows strong performance on small-to-medium jobs, with higher variance on rare large-scale jobs. These results demonstrate the potential of AI-driven prediction for cost-aware autoscaling in cloud environments, reducing unnecessary provisioning while safeguarding service quality.
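To make the pipeline concrete, here is a minimal sketch of how such a Random Forest predictor could be trained and evaluated with scikit-learn on preprocessed Borg-trace data. The file name, feature columns, and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: Random Forest regression for resource-usage prediction.
# Assumes a preprocessed Borg-trace CSV; column names below are illustrative.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load preprocessed trace data (assumed file and schema).
df = pd.read_csv("borg_traces_preprocessed.csv")
feature_cols = ["cpu_request", "memory_request", "usage_p50", "usage_p95", "job_priority"]
target_col = "cpu_usage"

# Hold out a test split for error analysis.
X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols], df[target_col], test_size=0.2, random_state=42
)

# Fit the Random Forest regressor (hyperparameters are placeholders).
model = RandomForestRegressor(n_estimators=200, random_state=42, n_jobs=-1)
model.fit(X_train, y_train)

# Report the same metrics used in the abstract: R², MAE, RMSE.
pred = model.predict(X_test)
print("R^2 :", r2_score(y_test, pred))
print("MAE :", mean_absolute_error(y_test, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, pred)))
```

In a cost-aware autoscaling setting, predictions like these would feed a provisioning policy that sizes resources to forecast demand rather than to static peak estimates.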
Similar Papers
Intelligent Resource Allocation Optimization for Cloud Computing via Machine Learning
Distributed, Parallel, and Cluster Computing
Makes cloud computing work smarter and cheaper.
Machine Learning-Driven Predictive Resource Management in Complex Science Workflows
Distributed, Parallel, and Cluster Computing
Predicts computer needs for science experiments.
A Hybrid Proactive And Predictive Framework For Edge Cloud Resource Management
Artificial Intelligence
AI predicts problems, saving money and keeping apps fast.