DynaCausal: Dynamic Causality-Aware Root Cause Analysis for Distributed Microservices
By: Songhan Zhang, Aoyang Fang, Yifan Yang and more
Potential Business Impact:
Finds the real cause of computer problems faster.
Cloud-native microservices enable rapid iteration and scalable deployment but also create complex, fast-evolving dependencies that challenge reliable diagnosis. Existing root cause analysis (RCA) approaches, even with multi-modal fusion of logs, traces, and metrics, remain limited in capturing dynamic behaviors and shifting service relationships. Three critical challenges persist: (i) inadequate modeling of cascading fault propagation, (ii) vulnerability to noise interference and concept drift in normal service behavior, and (iii) over-reliance on service deviation intensity that obscures true root causes. To address these challenges, we propose DynaCausal, a dynamic causality-aware framework for RCA in distributed microservice systems. DynaCausal unifies multi-modal dynamic signals to capture time-varying spatio-temporal dependencies through interaction-aware representation learning. It further introduces a dynamic contrastive mechanism to disentangle true fault indicators from contextual noise and adopts a causal-prioritized pairwise ranking objective to explicitly optimize causal attribution. Comprehensive evaluations on public benchmarks demonstrate that DynaCausal consistently surpasses state-of-the-art methods, attaining an average AC@1 of 0.63 with absolute gains from 0.25 to 0.46, and delivering both accurate and interpretable diagnoses in highly dynamic microservice environments.
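The AC@1 figure quoted above is a top-k accuracy: the share of fault cases in which the true root-cause service appears among the first k candidates a method ranks. The sketch below illustrates that standard definition only. The function name `ac_at_k`, the service names, and the sample rankings are hypothetical and are not taken from the paper or its benchmarks.

```python
from typing import List


def ac_at_k(ranked: List[List[str]], truth: List[str], k: int = 1) -> float:
    """Fraction of fault cases whose true root cause is in the top-k ranking."""
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        return 0.0
    # A case counts as a hit when its labeled root cause is among the first k candidates.
    hits = sum(1 for candidates, cause in zip(ranked, truth) if cause in candidates[:k])
    return hits / len(truth)


# Hypothetical ranked suspects for three fault cases (made-up service names).
rankings = [
    ["cart", "checkout", "db"],
    ["db", "cart", "checkout"],
    ["checkout", "db", "cart"],
]
root_causes = ["cart", "cart", "cart"]

print(ac_at_k(rankings, root_causes, k=1))  # only the first case ranks "cart" first
print(ac_at_k(rankings, root_causes, k=3))  # every case has "cart" in its top three
```

Under this definition, an average AC@1 of 0.63 would mean the top-ranked service is the labeled root cause in roughly 63% of evaluated fault cases.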
Similar Papers
Research on fault diagnosis and root cause analysis based on full stack observability
Distributed, Parallel, and Cluster Computing
Finds computer problems faster and explains why.
MicroRCA-Agent: Microservice Root Cause Analysis Method Based on Large Language Model Agents
Artificial Intelligence
Finds computer problems faster by reading logs.
GALA: Can Graph-Augmented Large Language Model Agentic Workflows Elevate Root Cause Analysis?
Artificial Intelligence
Finds computer problems and tells you how to fix them.