GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search
By: Heng Zhang, Yuling Shi, Xiaodong Gu and more
Potential Business Impact:
Finds why AI teams fail and fixes them.
Multi-agent systems powered by Large Language Models excel at complex tasks through coordinated collaboration, yet they face high failure rates in multi-turn deep search scenarios. Existing temporal attribution methods struggle to accurately diagnose root causes, particularly when errors propagate across multiple agents. Attempts to automate failure attribution by analyzing action sequences remain ineffective due to their inability to account for information dependencies that span agents. This paper identifies two core challenges: (i) distinguishing symptoms from root causes in multi-agent error propagation, and (ii) tracing information dependencies beyond temporal order. To address these issues, we introduce GraphTracer, a framework that redefines failure attribution through information flow analysis. GraphTracer constructs Information Dependency Graphs (IDGs) to explicitly capture how agents reference and build on prior outputs. It localizes root causes by tracing through these dependency structures instead of relying on temporal sequences. GraphTracer also uses graph-aware synthetic data generation to target critical nodes, creating realistic failure scenarios. Evaluations on the Who&When benchmark and integration into production systems demonstrate that GraphTracer-8B achieves up to 18.18% higher attribution accuracy compared to state-of-the-art models and enables 4.8% to 14.2% performance improvements in deployed multi-agent frameworks, establishing a robust solution for multi-agent system debugging.
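The idea of tracing a failure through information dependencies rather than through temporal order can be illustrated with a small sketch. This is not the paper's implementation: the graph encoding, the function names `ancestors` and `trace_root_causes`, and the assumption that faulty steps are already flagged are all hypothetical and chosen only for illustration.

```python
def ancestors(deps, step):
    """Return every step that `step` depends on, directly or indirectly.

    `deps` maps a step id to the list of step ids whose output it references
    (a hypothetical encoding of an information dependency graph).
    """
    seen = set()
    stack = list(deps.get(step, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps.get(node, []))
    return seen


def trace_root_causes(deps, faulty, failed_step):
    """Return flagged steps upstream of `failed_step` with no flagged ancestor.

    `faulty` is assumed to be given; how steps get flagged is out of scope here.
    """
    upstream = ancestors(deps, failed_step) | {failed_step}
    candidates = upstream & faulty
    # A candidate is a root cause if nothing it depends on is also flagged;
    # otherwise it is treated as a symptom of an earlier error.
    return sorted(s for s in candidates if not (ancestors(deps, s) & faulty))


# Steps are numbered in temporal order; step 3 is flagged but feeds nothing.
deps = {1: [], 2: [1], 3: [1], 4: [2], 5: [4]}
faulty = {2, 3, 5}
print(trace_root_causes(deps, faulty, failed_step=5))
```

In this toy trace, a purely temporal scan that blames the most recent flagged step before the failure would land on step 3, whereas following the dependency edges from step 5 reaches step 2 and never visits step 3.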
Similar Papers
AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?
Computation and Language
Fixes AI mistakes in complex robot teams.
AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?
Computation and Language
Finds why AI "brains" make mistakes.
Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference
Artificial Intelligence
Finds why robot teams fail, makes them better.