TAMO: Fine-Grained Root Cause Analysis via Tool-Assisted LLM Agent with Multi-Modality Observation Data in Cloud-Native Systems
By: Qi Wang, Xiao Zhang, Mingyi Li and more
Potential Business Impact:
Fixes computer problems automatically by understanding clues.
With the development of distributed systems, microservices and cloud-native technologies have become central to modern enterprise software development. Despite their significant advantages, these technologies also increase system complexity and operational challenges. Traditional root cause analysis (RCA) relies heavily on manual intervention and struggles to achieve automated fault response. In recent years, large language models (LLMs) have made breakthroughs in contextual inference and domain knowledge integration, providing new solutions for Artificial Intelligence for IT Operations (AIOps). However, existing LLM-based approaches face three key challenges: text input constraints, dynamic service dependency hallucinations, and context window limitations. To address these issues, we propose TAMO, a tool-assisted LLM agent with multi-modality observation data for fine-grained RCA. It unifies multi-modal observational data into time-aligned representations to extract consistent features, and it employs specialized root cause localization and fault classification tools to perceive the contextual environment. This approach overcomes the limitations of LLMs in handling raw observational data and service dependencies that change in real time, and it guides the LLM to generate repair strategies aligned with the system context by structuring key information into a prompt. Experimental results on public datasets characterized by heterogeneity and common fault types show that TAMO performs well in root cause analysis, demonstrating its effectiveness.
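To make the idea of time-aligned multi-modal data more concrete, here is a minimal Python sketch. It is not TAMO's implementation: the function names (`align_to_windows`, `build_prompt`), the input tuple layouts, the 60-second window, and the chosen per-window features are all assumptions made for illustration. It buckets metrics, logs, and traces into shared fixed-width windows and then formats the result as compact text that could be placed in a prompt.

```python
from collections import defaultdict


def align_to_windows(metrics, logs, traces, window_s=60):
    """Bucket three modalities into shared fixed-width time windows.

    Hypothetical input layouts (assumed for this sketch):
      metrics: list of (timestamp_s, value)
      logs:    list of (timestamp_s, level)
      traces:  list of (timestamp_s, latency_ms)
    """
    buckets = defaultdict(
        lambda: {"metric_vals": [], "error_logs": 0, "latencies": []}
    )
    for ts, value in metrics:
        buckets[int(ts // window_s)]["metric_vals"].append(value)
    for ts, level in logs:
        bucket = buckets[int(ts // window_s)]
        if level == "ERROR":
            bucket["error_logs"] += 1
    for ts, latency in traces:
        buckets[int(ts // window_s)]["latencies"].append(latency)

    rows = []
    for w in sorted(buckets):
        b = buckets[w]
        vals, lats = b["metric_vals"], b["latencies"]
        rows.append({
            "window_start": w * window_s,
            "metric_mean": sum(vals) / len(vals) if vals else None,
            "error_logs": b["error_logs"],
            "latency_max": max(lats) if lats else None,
        })
    return rows


def build_prompt(service, rows):
    """Format aligned windows as short text lines for an LLM prompt."""
    lines = [f"Service: {service}", "Time-aligned observations:"]
    for r in rows:
        lines.append(
            f"- t={r['window_start']}s metric_mean={r['metric_mean']} "
            f"error_logs={r['error_logs']} latency_max={r['latency_max']}"
        )
    return "\n".join(lines)
```

Summarizing each window into a few features, rather than passing raw logs and traces, is one plausible way to stay within an LLM's context window; the actual features and tools used by TAMO are described in the paper itself.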
Similar Papers
MicroRCA-Agent: Microservice Root Cause Analysis Method Based on Large Language Model Agents
Artificial Intelligence
Finds computer problems faster by reading logs.
Reasoning Language Models for Root Cause Analysis in 5G Wireless Networks
Artificial Intelligence
Fixes phone network problems faster using smart AI.
GALA: Can Graph-Augmented Large Language Model Agentic Workflows Elevate Root Cause Analysis?
Artificial Intelligence
Finds computer problems and tells you how to fix them.