
Hard Shell, Reliable Core: Improving Resilience in Replicated Systems with Selective Hybridization

Published: August 13, 2025 | arXiv ID: 2508.10141v1

By: Laura Lawniczak, Tobias Distler

Potential Business Impact:

Makes computer systems safer by choosing what to protect.

Hybrid fault models are known to be an effective means for enhancing the robustness of consensus-based replicated systems. However, existing hybridization approaches offer limited flexibility in how crash-tolerant and Byzantine fault-tolerant system parts can be composed, incur a significant diversification overhead, or both. In this paper, we address these issues with ShellFT, a framework that leverages the concept of micro replication to allow system designers to freely choose the parts of the replication logic that need to be resilient against Byzantine faults. As a key benefit, such selective hybridization makes it possible to develop hybrid solutions that are tailored to the specific characteristics and requirements of individual use cases. To illustrate this flexibility, we present three custom ShellFT protocols and analyze the complexity of their implementations. Our evaluation shows that, compared with traditional hybridization approaches, ShellFT reduces diversification costs by more than 70%.
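To make the idea of selective hybridization more concrete, here is a minimal Python sketch under stated assumptions. It is not ShellFT's API or cost model. The names `FaultModel`, `Component`, `replicas_needed`, `diversified_loc`, and `diversified_fraction`, the component split, and the line counts are all hypothetical. The sketch tags each part of the replication logic with a fault model and applies the textbook consensus bounds of 2f+1 replicas for crash tolerance and 3f+1 for Byzantine tolerance. It also assumes that only the parts marked Byzantine need diverse implementations.

```python
from dataclasses import dataclass
from enum import Enum


class FaultModel(Enum):
    CRASH = "crash"          # component may stop, but never sends wrong results
    BYZANTINE = "byzantine"  # component may behave arbitrarily


@dataclass(frozen=True)
class Component:
    name: str
    fault_model: FaultModel
    loc: int  # hypothetical size metric (lines of code) used as a cost proxy


def replicas_needed(model: FaultModel, f: int) -> int:
    """Textbook lower bounds for tolerating f faults in consensus."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1 if model is FaultModel.CRASH else 3 * f + 1


def diversified_loc(components: list[Component]) -> int:
    """Assumption: only Byzantine-tolerant parts require diverse implementations."""
    return sum(c.loc for c in components if c.fault_model is FaultModel.BYZANTINE)


def diversified_fraction(components: list[Component]) -> float:
    """Share of the total code that must be diversified under the assumption above."""
    total = sum(c.loc for c in components)
    return diversified_loc(components) / total if total else 0.0


# Hypothetical split of replication logic; names and sizes are illustrative only.
design = [
    Component("agreement", FaultModel.BYZANTINE, 1200),
    Component("checkpointing", FaultModel.CRASH, 800),
    Component("execution", FaultModel.CRASH, 2000),
]

if __name__ == "__main__":
    for c in design:
        print(c.name, c.fault_model.value, replicas_needed(c.fault_model, f=1))
    print(f"share of code to diversify: {diversified_fraction(design):.0%}")
```

In this toy model, the share of code that must be diversified shrinks as fewer parts are marked Byzantine. The more-than-70% reduction stated in the abstract comes from the authors' evaluation, not from this sketch.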

FTHP-MPI: Towards Providing Replication-based Fault Tolerance in a Fault-Intolerant Native MPI Library

Distributed, Parallel, and Cluster Computing

Keeps supercomputers running when parts break.

14 Apr 2025

FTI-TMR: A Fault Tolerance and Isolation Algorithm for Interconnected Multicore Systems

Distributed, Parallel, and Cluster Computing

Keeps computers working even when parts break.

19 Oct 2025

Page Count

24 pages
