Functional Reasoning for Distributed Systems with Failures
By: Haobin Ni, Robbert van Renesse, Greg Morrisett
Potential Business Impact:
Makes distributed computer systems more trustworthy by formally verifying that they stay safe despite failures.
Distributed system theory literature often argues for correctness using an informal, Hoare-like style of reasoning. While these arguments are intuitive, they have not all been foolproof, and whether they directly correspond to formal proofs is in question. We formally ground this kind of reasoning and connect it to standard formal approaches through language design and meta-analysis, which leads to a functional style of compositional formal reasoning for a class of distributed systems, including cases involving Byzantine faults. At the core of our approach are twin languages, Sync and Async, which formalize the insight from distributed system theory that, under certain conditions, an asynchronous system can be reduced to a synchronous system for more straightforward reasoning. Sync describes a distributed system as a single, synchronous, data-parallel program. It restricts programs syntactically and has a functional denotational semantics suitable for Hoare-style formal reasoning. Async models a distributed system as a collection of interacting monadic programs, one for each non-faulty node in the system. It has a standard trace-based operational semantics, modeling asynchrony with interleaving. Sync compiles to Async and can then be extracted to yield executable code. We prove that any safety property proven for a Sync program in its denotational semantics is preserved in the operational semantics of its compiled Async programs. We implement the twin languages in Rocq and verify the safety properties of two fault-tolerant consensus protocols: BOSCO and SeqPaxos.
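The synchronous-versus-asynchronous distinction in the abstract can be illustrated with a toy sketch in Python. This is not the paper's Sync or Async language, and it models no failures. The function names `sync_round` and `async_run`, and the one-round "broadcast and take the maximum" protocol, are hypothetical choices made only for illustration. The first function treats the whole system as one data-parallel step; the second simulates per-node message handling with messages delivered in an arbitrary interleaved order.

```python
import random


def sync_round(values):
    """Synchronous, data-parallel view: one step over the whole system.

    Every node broadcasts its value and then takes the max of all values.
    """
    return [max(values) for _ in values]


def async_run(values, seed):
    """Asynchronous view: per-node state, messages delivered in arbitrary order.

    The seed selects one particular interleaving of message deliveries.
    """
    rng = random.Random(seed)
    n = len(values)
    # Every node broadcasts once; each message is (sender, receiver, payload).
    in_flight = [(s, r, values[s]) for s in range(n) for r in range(n)]
    received = [[] for _ in range(n)]
    decided = [None] * n
    while in_flight:
        # Interleaving: any pending message may be delivered next.
        _, r, payload = in_flight.pop(rng.randrange(len(in_flight)))
        received[r].append(payload)
        # A node decides only after hearing from all n nodes
        # (a simplification that holds only because this toy has no failures).
        if len(received[r]) == n:
            decided[r] = max(received[r])
    return decided
```

For this toy protocol, every interleaving yields the same decisions as the single synchronous step, so a property checked on `sync_round` (for example, that all nodes decide the same value) also holds for each `async_run` execution. The paper establishes a far more general, machine-checked version of this kind of preservation result, including for Byzantine faults, which this sketch does not attempt to capture.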
Similar Papers
Timetide: A programming model for logically synchronous distributed systems
Programming Languages
Lets computers work together without perfect timing.
Mechanized Metatheory of Forward Reasoning for End-to-End Linearizability Proofs
Programming Languages
Proves computer programs work correctly together.
Towards the Coordination and Verification of Heterogeneous Systems with Data and Time
Software Engineering
Checks if complex systems work together correctly.