Modeling Anomaly Detection in Cloud Services: Analysis of the Properties that Impact Latency and Resource Consumption
By: Gabriel Job Antunes Grabher, Fumio Machida, Thomas Ropars
Potential Business Impact:
Finds the best way to fix slow computer services.
Detecting and resolving performance anomalies in Cloud services is crucial for maintaining desired performance objectives. Scaling actions triggered by an anomaly detector help achieve target latency at the cost of extra resource consumption. However, performance anomaly detectors make mistakes. This paper studies which characteristics of performance anomaly detection matter most for optimizing the trade-off between performance and cost. Using Stochastic Reward Nets, we model a Cloud service monitored by a performance anomaly detector. Using our model, we study the impact of three detector characteristics, namely precision, recall, and inspection frequency, on the average latency and resource consumption of the monitored service. Our results show that achieving both high precision and high recall is not always necessary. If detection can be run frequently, high precision is enough to obtain a good performance-to-cost trade-off, but if the detector is run infrequently, recall becomes the most important characteristic.
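To make the three detector characteristics concrete, the following is a minimal toy simulation in Python. It is not the Stochastic Reward Net model from the paper, and it is not meant to reproduce the paper's results. Every name and parameter here (`false_alarm_probability`, `simulate`, `anomaly_prob`, `prevalence`, `scale_duration`) is a hypothetical choice made for illustration. The sketch assumes discrete time steps, a scaling action that clears the anomaly immediately and holds extra resources for a fixed number of steps, and a fixed assumed anomaly prevalence used only to convert precision into a per-inspection false-alarm probability.

```python
import random


def false_alarm_probability(precision, recall, prevalence):
    """False-alarm probability per inspection of a healthy service.

    Derived from precision = TP / (TP + FP), with
    TP = recall * prevalence and FP = fpr * (1 - prevalence).
    Requires precision > 0 and prevalence < 1.
    """
    return recall * prevalence * (1.0 - precision) / (precision * (1.0 - prevalence))


def simulate(precision, recall, interval, steps=100_000,
             anomaly_prob=0.01, prevalence=0.1, scale_duration=20, seed=0):
    """Return (fraction of slow steps, fraction of steps holding extra resources).

    Assumptions of this toy model (not taken from the paper):
    - an anomaly starts with probability anomaly_prob per healthy step;
    - the detector inspects the service every `interval` steps;
    - any alarm triggers scaling, which clears the anomaly and holds
      extra resources for scale_duration steps.
    """
    rng = random.Random(seed)
    fpr = min(1.0, false_alarm_probability(precision, recall, prevalence))
    anomalous = False
    extra_until = 0  # first step at which extra resources are released
    slow_steps = 0
    extra_steps = 0
    for t in range(steps):
        if not anomalous and rng.random() < anomaly_prob:
            anomalous = True
        if t % interval == 0:
            # True positive with probability recall, false alarm with probability fpr.
            alarm = rng.random() < (recall if anomalous else fpr)
            if alarm:
                anomalous = False
                extra_until = t + scale_duration
        if anomalous:
            slow_steps += 1  # proxy for degraded latency
        if t < extra_until:
            extra_steps += 1  # proxy for extra resource consumption
    return slow_steps / steps, extra_steps / steps


if __name__ == "__main__":
    for interval in (1, 50):
        for precision, recall in ((0.95, 0.5), (0.5, 0.95)):
            slow, extra = simulate(precision, recall, interval)
            print(f"interval={interval:3d} precision={precision} recall={recall} "
                  f"slow={slow:.4f} extra={extra:.4f}")
```

Sweeping `interval`, `precision`, and `recall` in such a sketch gives a feel for how missed detections (low recall) translate into time spent slow, while false alarms (low precision) translate into wasted resources. The numbers depend entirely on the assumed parameters above.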
Similar Papers
Artificial Intelligence-Based Multiscale Temporal Modeling for Anomaly Detection in Cloud Services
Machine Learning (CS)
Finds computer problems before they happen.
Contrastive Learning-Based Dependency Modeling for Anomaly Detection in Cloud Services
Machine Learning (CS)
Finds computer problems before they happen.