A Two-Staged LLM-Based Framework for CI/CD Failure Detection and Remediation with Industrial Validation
By: Weiyuan Xu, Juntao Luo, Tao Huang and more
Potential Business Impact:
Fixes computer code errors automatically.
Continuous Integration and Continuous Deployment (CI/CD) pipelines are pivotal to modern software engineering, yet diagnosing and resolving their failures remains a complex and labor-intensive challenge. In this paper, we present LogSage, the first end-to-end LLM-powered framework that performs root cause analysis and solution generation from failed CI/CD pipeline logs. In the root cause analysis stage, LogSage employs a log preprocessing pipeline tailored for LLMs, which extracts critical error logs and eliminates noise to improve the precision of LLM-driven root cause analysis. In the solution generation stage, LogSage uses retrieval-augmented generation (RAG) to integrate historical resolution strategies and uses tool-calling to deliver actionable, automated fixes. We evaluated the root cause analysis stage on a newly curated open-source dataset, achieving 98% precision and a 12% improvement over naively designed LLM-based log analysis baselines, while attaining near-perfect recall. The end-to-end system was validated in a large-scale, production-quality industrial CI/CD environment, processing more than 3,000 executions daily and accumulating more than 1.07 million executions in its first year of deployment, with end-to-end precision exceeding 88%. These two forms of evaluation confirm that LogSage provides a scalable and practical solution for managing CI/CD pipeline failures in real-world DevOps workflows.
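To make the preprocessing idea concrete, the following is a minimal sketch of one way to pull error-bearing lines and a little surrounding context out of a noisy CI/CD log before passing it to an LLM. It is not LogSage's actual pipeline, which the abstract does not describe in detail: the keyword pattern, the function name `extract_error_context`, and the `context` and `max_lines` parameters are all assumptions made for illustration.

```python
import re

# Hypothetical keyword pattern (an assumption for illustration);
# the abstract does not state which rules LogSage actually uses.
ERROR_PATTERN = re.compile(
    r"\b(error|failed|failure|exception|fatal|traceback)\b",
    re.IGNORECASE,
)


def extract_error_context(log_text, context=2, max_lines=200):
    """Return lines matching ERROR_PATTERN plus `context` lines around each.

    Keeps at most the last `max_lines` selected lines, so the result
    stays within a bounded size for an LLM prompt.
    """
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            keep.update(range(start, end))
    # Preserve original order; prefer the tail of the log when truncating.
    selected = sorted(keep)[-max_lines:]
    return [lines[i] for i in selected]


sample_log = "\n".join([
    "step 1 ok",
    "step 2 ok",
    "step 3 ok",
    "compiling module",
    "ERROR: undefined symbol foo",
    "linking",
    "step 7 ok",
    "step 8 ok",
])

snippet = extract_error_context(sample_log, context=1)
```

With `context=1`, the sample log is reduced to the error line and its two neighbours. A real system would likely need richer signals than keywords (for example, exit codes or stage boundaries), so this should be read only as an illustration of noise reduction.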