Offloading tracing for real-time systems using a scalable cloud infrastructure
By: David Jannis Schmidt, Grigory Fridman, Florian von Zabiensky
Potential Business Impact:
Lets developers trace and analyze many real-time systems at once by moving trace processing to the cloud.
Real-time embedded systems require precise timing and fault detection to ensure correct behavior. Traditional tracing tools often rely on local desktops with limited processing and storage capabilities, which hampers large-scale analysis. This paper presents a scalable, cloud-based architecture for software tracing in real-time systems based on microservices and edge computing. Our approach shifts the trace processing workload from the developer's machine to the cloud, using a dedicated tracing component that captures trace data and forwards it to a scalable backend via WebSockets and Apache Kafka. This enables long-term monitoring and collaborative analysis of target executions, e.g., to detect and investigate sporadic errors. We demonstrate how this architecture supports scalable analysis of parallel tracing sessions and lays the foundation for future integration of rule-based testing and runtime verification. The evaluation results show that the architecture handles many parallel tracing sessions efficiently: as system load increases, per-session throughput decreases slightly while overall throughput increases. Although the design includes a dedicated tracer for analysis during development, the approach is not limited to such setups. Target systems with network connectivity can stream reduced trace data directly, enabling runtime monitoring in the field.
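As a rough illustration of how parallel tracing sessions might be spread across a scalable backend, the sketch below routes trace events to partitions by session key, in the spirit of keyed topics in Apache Kafka. It is not the authors' implementation: the `TraceEvent` fields, the function names, and the CRC32 hash (used here as a simple stand-in for a real partitioner) are all assumptions made for the example, and no actual WebSocket or Kafka client is involved.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    session_id: str    # hypothetical identifier of one tracing session
    timestamp_ns: int  # hypothetical timestamp taken on the target
    payload: bytes     # raw trace bytes as captured by the tracer


def partition_for(session_id: str, num_partitions: int) -> int:
    """Map a session to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(session_id.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Append each event to its session's partition, keeping arrival order."""
    partitions = defaultdict(list)
    for event in events:
        index = partition_for(event.session_id, num_partitions)
        partitions[index].append(event)
    return partitions


if __name__ == "__main__":
    stream = [
        TraceEvent("session-a", 1, b"task_switch"),
        TraceEvent("session-b", 1, b"isr_enter"),
        TraceEvent("session-a", 2, b"isr_exit"),
    ]
    for index, batch in sorted(route(stream, num_partitions=4).items()):
        print(index, [(e.session_id, e.timestamp_ns) for e in batch])
```

Because every event of a session maps to the same partition, the order within a session is preserved, while different sessions can be consumed by separate workers in parallel.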
Similar Papers
Tracing and Metrics Design Patterns for Monitoring Cloud-native Applications
Software Engineering
Helps fix computer problems faster.
CrossTrace: Efficient Cross-Thread and Cross-Service Span Correlation in Distributed Tracing for Microservices
Networking and Internet Architecture
Helps fix computer programs faster without changing code.
A Distributed Framework for Causal Modeling of Performance Variability in GPU Traces
Distributed, Parallel, and Cluster Computing
Analyzes computer speed problems faster.