CrossTrace: Efficient Cross-Thread and Cross-Service Span Correlation in Distributed Tracing for Microservices
By: Linh-An Phan, MingXue Wang, Guangyu Wu and more
Potential Business Impact:
Helps fix computer programs faster without changing code.
Distributed tracing has become an essential technique for debugging and troubleshooting modern microservice-based applications, enabling software engineers to detect performance bottlenecks, identify failures, and gain insights into system behavior. However, implementing distributed tracing in large-scale applications remains challenging due to the need for extensive instrumentation. To reduce this burden, zero-code instrumentation solutions, such as those based on eBPF, have emerged, allowing span data to be collected without modifying application code. Despite this promise, span correlation, the process of establishing causal relationships between spans, remains a critical challenge in zero-code approaches. Existing solutions often rely on thread affinity, compromise system security by requiring the kernel integrity mode to be disabled, or incur significant computational overhead due to complex inference algorithms. This paper presents CrossTrace, a practical and efficient distributed tracing solution designed to support the debugging of microservice applications without requiring source code modifications. CrossTrace employs a greedy algorithm to infer intra-service span relationships from delay patterns, eliminating reliance on thread identifiers. For inter-service correlation, CrossTrace embeds span identifiers into TCP packet headers via eBPF, enabling secure and efficient correlation without compromising system security policies. Evaluation results show that CrossTrace can correlate thousands of spans within seconds with over 90% accuracy, making it suitable for production deployment and valuable for microservice observability and diagnosis.
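To make the idea of delay-based intra-service correlation concrete, here is a minimal Python sketch. It is not the CrossTrace algorithm, which the abstract does not spell out. It assumes a simple heuristic: an outgoing span is attributed to the incoming span whose time window contains it and whose start is closest to it. The `Span` type, the `correlate_greedy` function, and the containment rule are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical span record: an identifier plus start/end timestamps in seconds."""
    span_id: str
    start: float
    end: float


def correlate_greedy(incoming: List[Span], outgoing: List[Span]) -> Dict[str, Optional[str]]:
    """Assign each outgoing span a parent incoming span without using thread IDs.

    Assumed heuristic (not taken from the paper): a valid parent must fully
    contain the child in time, and among valid parents we greedily pick the
    one with the smallest delay between parent start and child start.
    """
    parents: Dict[str, Optional[str]] = {}
    for child in sorted(outgoing, key=lambda s: s.start):
        candidates = [
            p for p in incoming
            if p.start <= child.start and child.end <= p.end
        ]
        best = min(candidates, key=lambda p: child.start - p.start, default=None)
        parents[child.span_id] = best.span_id if best is not None else None
    return parents


if __name__ == "__main__":
    incoming = [Span("in-a", 0.000, 0.100), Span("in-b", 0.050, 0.200)]
    outgoing = [
        Span("out-1", 0.010, 0.040),  # only in-a contains it
        Span("out-2", 0.060, 0.090),  # both contain it; in-b started more recently
        Span("out-3", 0.300, 0.310),  # no incoming span contains it
    ]
    print(correlate_greedy(incoming, outgoing))
```

Each outgoing span is decided locally and never revisited, which is what makes the sketch greedy. A real system would need to handle clock noise, concurrent requests with overlapping windows, and asynchronous work that outlives its parent request.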