Self-stabilizing Uniform Reliable Broadcast
Paper in proceedings, 2021
We study a well-known communication abstraction called Uniform Reliable Broadcast (URB). URB is central to the design and implementation of fault-tolerant distributed systems, since many non-trivial fault-tolerant distributed applications require communication with provable guarantees on message delivery. Our study focuses on fault-tolerant implementations for time-free message-passing systems that are prone to node failures. Moreover, we aim to design an even more robust communication abstraction. We do so through the lens of self-stabilization, a very strong notion of fault tolerance. In addition to node and communication failures, self-stabilizing algorithms can recover after the occurrence of arbitrary transient faults; these faults represent any violation of the assumptions under which the system was designed to operate (as long as the algorithm code stays intact). We propose the first self-stabilizing URB algorithm for asynchronous (time-free) message-passing systems that are prone to node failures. The algorithm recovers from transient faults within O(bufferUnitSize) asynchronous cycles, where bufferUnitSize is a predefined constant. The communication costs are similar to those of non-self-stabilizing URB. The main differences are that our proposal uses repeated gossiping of O(1)-bit messages and operates in bounded space (which is a prerequisite for self-stabilization). Moreover, each node stores up to bufferUnitSize · n records of size O(ν + n log n) bits, where n is the number of nodes and ν is the number of bits needed to encode a single URB instance.
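To make the URB abstraction concrete, the following is a minimal sketch of the textbook majority-acknowledgement delivery rule for non-self-stabilizing URB, not the algorithm proposed in the paper. It assumes that a strict majority of nodes never crash. The names `UrbNode`, `urb_broadcast`, `on_receive`, `gossip`, and `run_round` are hypothetical, and the synchronous rounds serve only to make the demonstration deterministic; the setting studied in the paper is asynchronous.

```python
from collections import defaultdict


class UrbNode:
    """Textbook majority-ack URB delivery rule (illustrative, not the paper's algorithm)."""

    def __init__(self, node_id: int, n: int) -> None:
        self.node_id = node_id
        self.n = n
        self.acks = defaultdict(set)  # message id -> ids of nodes known to hold it
        self.delivered = []           # message ids in delivery order

    def urb_broadcast(self, msg_id: str) -> None:
        # The sender holds its own message, so it counts as one acknowledgement.
        self.on_receive(self.node_id, msg_id)

    def on_receive(self, sender: int, msg_id: str) -> None:
        # Receiving msg_id from `sender` shows that `sender` holds it; now so do we.
        self.acks[msg_id].update({sender, self.node_id})
        # Deliver once a strict majority of the n nodes is known to hold the message.
        if msg_id not in self.delivered and 2 * len(self.acks[msg_id]) > self.n:
            self.delivered.append(msg_id)

    def gossip(self) -> list:
        # Message ids this node keeps re-sending to every other node.
        return list(self.acks)


def run_round(nodes, crashed=frozenset()):
    """One gossip round among non-crashed nodes (synchronous only for the demo)."""
    outgoing = [(p.node_id, m) for p in nodes
                if p.node_id not in crashed for m in p.gossip()]
    for sender, msg_id in outgoing:
        for q in nodes:
            if q.node_id not in crashed and q.node_id != sender:
                q.on_receive(sender, msg_id)
```

With five nodes of which two have crashed, a broadcast by node 0 is delivered by the three surviving nodes once each has learned that a majority holds the message. Note that this sketch keeps unbounded state (`acks` and `delivered` only grow), which is exactly what a self-stabilizing design has to avoid, since bounded space is a prerequisite for self-stabilization.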