LEFT-RS: A Lock-Free Fault-Tolerant Resource Sharing Protocol for Multicore Real-Time Systems
By: Nan Chen , Xiaotian Dai , Tong Cheng and more
Emerging real-time applications have driven the transition to multicore embedded systems, where tasks must share resources due to functional demands and limited availability. These resources, whether local or global, are protected within critical sections to prevent race conditions, with locking protocols ensuring both exclusive access and timing requirements. However, transient faults occurring within critical sections can disrupt execution and propagate errors across multiple tasks. Conventional locking protocols fail to address such faults, and integrating traditional fault tolerance techniques often increases blocking. Recent approaches improve fault recovery through parallel replica execution; however, challenges remain due to sequential accessing, coordination overhead, and susceptibility to common-mode faults. In this paper, we propose a Lock-frEe Fault-Tolerant Resource Sharing (LEFT-RS) protocol for multicore real-time systems. LEFT-RS allows tasks to concurrently access and read global resources while entering their critical sections in parallel. Each task can complete its access earlier upon successful execution if other tasks experience faults, thereby improving the efficiency of resource usage. Our design also limits the overhead and enhances fault resilience. We present a comprehensive worst-case response time analysis to ensure timing guarantees. Extensive evaluation results demonstrate that our method significantly outperforms existing approaches, achieving up to an 84.5% improvement in schedulability on average.
Similar Papers
Fair Kernel-Lock-Free Claim/Release Protocol for Shared Object Access in Cooperatively Scheduled Runtimes
Distributed, Parallel, and Cluster Computing
Lets programs share things fairly and fast.
Recoverable Lock-Free Locks
Distributed, Parallel, and Cluster Computing
Makes computer programs safer and able to fix themselves.
FTI-TMR: A Fault Tolerance and Isolation Algorithm for Interconnected Multicore Systems
Distributed, Parallel, and Cluster Computing
Keeps computers working even when parts break.