
Agentic Memory Enhanced Recursive Reasoning for Root Cause Localization in Microservices

Published: January 6, 2026 | arXiv ID: 2601.02732v1

By: Lingzhe Zhang, Tong Jia, Yunpeng Zhai, and more

BigTech Affiliations: Alibaba

Potential Business Impact:

Finds computer problems faster by learning from past fixes.

Business Areas:

Semantic Search, Internet Services

As contemporary microservice systems grow more popular and more complex, often comprising hundreds or even thousands of fine-grained, interdependent subsystems, they experience failures more frequently. Ensuring system reliability therefore demands accurate root cause localization. Many traditional graph-based and deep learning approaches have been explored for this task, but they often rely heavily on pre-defined schemas that struggle to adapt to evolving operational contexts. Consequently, a number of LLM-based methods have recently been proposed. However, these methods still face two major limitations: shallow, symptom-centric reasoning that undermines accuracy, and a lack of cross-alert reuse that leads to redundant reasoning and high latency. In this paper, we conduct a comprehensive study of how Site Reliability Engineers (SREs) localize the root causes of failures, drawing insights from professionals across multiple organizations. Our investigation reveals that expert root cause analysis exhibits three key characteristics: recursiveness, multi-dimensional expansion, and cross-modal reasoning. Motivated by these findings, we introduce AMER-RCL, an agentic memory enhanced recursive reasoning framework for root cause localization in microservices. AMER-RCL employs the Recursive Reasoning RCL engine, a multi-agent framework that reasons recursively about each alert to progressively refine candidate causes, together with Agentic Memory, which incrementally accumulates and reuses reasoning from prior alerts within a time window to reduce redundant exploration and lower inference latency. Experimental results demonstrate that AMER-RCL consistently outperforms state-of-the-art methods in both localization accuracy and inference efficiency.
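The abstract pairs two ideas: recursive refinement of candidate causes per alert, and a time-windowed memory that lets later alerts reuse earlier conclusions. The Python sketch below is only a toy illustration of how those two ideas can interact, not the authors' implementation. Everything in it is a hypothetical assumption: the names `AgenticMemory` and `localize`, the acyclic call graph, the per-service anomaly scores, the 0.5 threshold, and the 300-second window. A fixed score threshold stands in for the paper's LLM agents, and the memory caches per-service root-cause sets rather than natural-language reasoning traces.

```python
from dataclasses import dataclass, field


@dataclass
class AgenticMemory:
    """Hypothetical cross-alert cache: service -> (timestamp, root-cause set)."""
    window: float                                # seconds a cached conclusion stays reusable
    entries: dict = field(default_factory=dict)
    hits: int = 0                                # how many times stored reasoning was reused

    def lookup(self, service, now):
        entry = self.entries.get(service)
        if entry is not None and now - entry[0] <= self.window:
            self.hits += 1
            return entry[1]
        return None                              # missing or expired: reason afresh

    def store(self, service, now, causes):
        self.entries[service] = (now, causes)


def localize(service, graph, scores, memory, now, threshold=0.5):
    """Recursively follow anomalous callees until no deeper explanation exists.

    Assumes `graph` is acyclic (caller -> list of callees). The fixed anomaly-score
    threshold is a stand-in (assumption) for the LLM agents that judge each hop.
    """
    cached = memory.lookup(service, now)
    if cached is not None:
        return cached
    suspects = [d for d in graph.get(service, []) if scores.get(d, 0.0) >= threshold]
    if not suspects:
        causes = frozenset({service})            # nothing downstream explains the symptom
    else:
        # Refine: the causes of this service are the causes of its anomalous callees.
        causes = frozenset().union(
            *(localize(d, graph, scores, memory, now, threshold) for d in suspects))
    memory.store(service, now, causes)
    return causes


# Hypothetical topology and anomaly scores for illustration only.
graph = {
    "frontend": ["checkout", "catalog"],
    "checkout": ["payment", "db"],
    "catalog": ["db"],
    "payment": [],
    "db": [],
}
scores = {"checkout": 0.9, "catalog": 0.8, "db": 0.95, "payment": 0.1}
memory = AgenticMemory(window=300.0)

first = localize("frontend", graph, scores, memory, now=0.0)
print(sorted(first), memory.hits)    # ['db'] 1  (second path into "db" reused memory)
second = localize("catalog", graph, scores, memory, now=60.0)
print(sorted(second), memory.hits)   # ['db'] 2  (cached conclusion, still inside the window)
```

Keying the cache by service means an alert that reaches an already-analyzed service within the window skips re-exploration, which is the kind of redundant reasoning the abstract says Agentic Memory is meant to avoid. Once an entry ages past the window it is recomputed, so stale conclusions do not persist indefinitely.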

Adaptive Root Cause Localization for Microservice Systems with Multi-Agent Recursion-of-Thought

Software Engineering

Finds computer problems faster by thinking like people.

28 Aug 2025


The Multi-Agent Fault Localization System Based on Monte Carlo Tree Search Approach

Software Engineering

Finds computer problems faster and more accurately.

30 Jul 2025


Root Cause Analysis for Microservice Systems via Cascaded Conditional Learning with Hypergraphs

Machine Learning (CS)

Finds computer problems faster by seeing how they spread.

14 Nov 2025


Country of Origin

🇨🇳 China

Page Count

12 pages
