Monitoring and Observability of Machine Learning Systems: Current Practices and Gaps
By: Joran Leest, Ilias Gerostathopoulos, Patricia Lago, and more
Production machine learning (ML) systems fail silently -- not with crashes, but through wrong decisions. While observability is recognized as critical for ML operations, there is little empirical evidence of what practitioners actually capture. This study presents empirical results on ML observability in practice, drawn from seven focus group sessions across several domains. We catalog the information practitioners systematically capture across ML systems and their environment, and we map how they use it to validate models, detect and diagnose faults, and explain observed degradations. Finally, we identify gaps in current practice and outline implications for tooling design and research to establish ML observability practices.
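The abstract's core observation is that ML systems degrade through wrong decisions rather than crashes, so detecting failure means comparing captured signals against a baseline rather than waiting for an error. As a minimal illustrative sketch (not from the paper), one common signal practitioners capture is the live prediction-score distribution, compared against a training-time baseline via the Population Stability Index; the bin count and alert threshold below are illustrative assumptions.

```python
# Hedged sketch: flag "silent" degradation by comparing the live
# prediction-score distribution against a training-time baseline.
# PSI > 0.2 is a commonly used (but illustrative) alarm threshold.
import math

def psi(baseline, live, bins=10):
    """Population Stability Index between two score samples in [0, 1]."""
    edges = [i / bins for i in range(bins + 1)]

    def frac(sample, lo, hi):
        # Count scores falling in [lo, hi); include 1.0 in the last bin.
        n = sum(1 for x in sample if lo <= x < hi or (hi == 1.0 and x == 1.0))
        return max(n / len(sample), 1e-6)  # floor avoids log(0)

    total = 0.0
    for lo, hi in zip(edges, edges[1:]):
        b, l = frac(baseline, lo, hi), frac(live, lo, hi)
        total += (l - b) * math.log(l / b)
    return total

# Synthetic data for illustration only.
baseline = [i / 100 for i in range(100)]                  # uniform scores
drifted = [min(1.0, 0.5 + i / 200) for i in range(100)]   # shifted live scores

print(round(psi(baseline, baseline), 4))  # 0.0 -> stable, no alarm
print(psi(baseline, drifted) > 0.2)       # True -> drift alarm
```

The same pattern generalizes to the other information types the study catalogs: capture a signal continuously, store a reference snapshot, and alert on divergence.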