Self-Stabilizing Indulgent Zero-degrading Binary Consensus

Guerraoui proposed an indulgent solution for the binary consensus problem. Namely, he showed that an arbitrary behavior of the failure detector never violates safety requirements even if it compromises liveness. Consensus implementations are often used in a repeated manner. Dutta and Guerraoui proposed a zero-degrading solution, i.e., during system runs in which the failure detector behaves perfectly, a node failure during one consensus instance has no impact on the performance of future instances. Our study, which focuses on indulgent zero-degrading binary consensus, aims at the design of an even more robust communication abstraction. We do so through the lens of self-stabilization, a very strong notion of fault-tolerance. In addition to node and communication failures, self-stabilizing algorithms can recover after the occurrence of arbitrary transient faults; these faults represent any violation of the assumptions according to which the system was designed to operate (as long as the algorithm code stays intact). This work proposes the first, to the best of our knowledge, self-stabilizing algorithm for indulgent zero-degrading binary consensus for time-free message-passing systems prone to detectable process failures. The proposed algorithm has an O(1) stabilization time (in terms of asynchronous cycles) from arbitrary transient faults. Since the proposed solution uses an Ω failure detector, we also present the first, to the best of our knowledge, self-stabilizing asynchronous Ω failure detector, which is a variation on the one by Mostéfaoui, Mourgaya, and Raynal.


Introduction
We propose a self-stabilizing implementation of binary consensus objects for time-free (aka asynchronous) message-passing systems whose nodes may fail-stop. We also show a self-stabilizing asynchronous construction of the eventual leader failure detector, Ω.

Background and motivation
With the information revolution, everything became connected, e.g., banking services, online reservations, e-commerce, IoT, and automated driving systems, to name a few. All of these applications are distributed, use message-passing systems, and require fault-tolerant implementations. Designing and verifying these systems is notoriously difficult since the system designers have to cope with their asynchronous nature and the presence of failures. The combined presence of failures and asynchrony creates uncertainties (from the perspective of individual processes) with respect to the application state. Indeed, Fischer, Lynch, and Paterson [21] showed that, in the presence of at least one (undetectable) process crash, there is no deterministic algorithm for determining the state of an asynchronous message-passing system in a way that can be validly agreed on by all non-faulty processes.
This work is motivated by applications whose state is replicated over several processes in a way that emulates a finite-state machine. In order to maintain consistent replicas, each process has to apply the same sequence of state-transitions according to different sources of (user) input. To this end, one can divide the problem into two: (i) propagate the user input to all replicas, and (ii) let each replica perform the same sequence of state-transitions. The former challenge can be rather simply addressed via uniform reliable broadcast [37,27], whereas the latter one is often considered to be at the problem core since it requires all processes to agree on a common value, i.e., the order in which all replicas apply their state transitions. In other words, the input must be totally ordered before delivering it to the emulated automaton.
It was observed that the agreement problem of item (ii) can be generalized. Namely, the consensus problem requires each process to propose a value, and all non-faulty processes to agree on a single decision, which must be one of the proposed values. The problem of fault-tolerant consensus was studied extensively in the context of time-free message-passing systems. The goal of our work is to broaden the set of failures that such solutions can tolerate.

Problem definition and scope
Definition 1.1 states the consensus problem. When the set, V, of values that can be proposed includes just two values, the problem is called binary consensus. Otherwise, it is called multivalued consensus. Existing solutions for multivalued consensus often use binary consensus algorithms. Figure 1 depicts the relation to other problems in the area, which were mentioned earlier. Let Alg be an algorithm that solves consensus. Alg has to satisfy safety (i.e., validity, integrity, and agreement) and liveness (i.e., termination).
• Validity. Suppose that v is decided. Then, propose(v) was invoked by some process.
• Integrity. Suppose a process decides. It does so at most once.
• Agreement. No two processes decide different values.
• Termination. Every non-faulty process eventually decides.
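The safety properties above can be checked mechanically on a finite record of a run. The following Python sketch is illustrative only and is not part of Alg; the function name check_safety and the representation of proposals and decisions are assumptions made for this example. Termination, being a liveness property, cannot be refuted on a finite record and is therefore not checked.

```python
from typing import Dict, Hashable, List, Tuple


def check_safety(proposals: Dict[int, Hashable],
                 decisions: List[Tuple[int, Hashable]]) -> Dict[str, bool]:
    """Check validity, integrity, and agreement on a finite record of a run.

    proposals maps a process identifier to the value it proposed (hypothetical
    representation); decisions lists (process identifier, decided value)
    events in the order in which they occurred.
    """
    proposed_values = set(proposals.values())
    decided_values = {value for _, value in decisions}
    deciders = [process for process, _ in decisions]
    return {
        # Validity: every decided value was proposed by some process.
        "validity": decided_values <= proposed_values,
        # Integrity: no process decides more than once.
        "integrity": len(deciders) == len(set(deciders)),
        # Agreement: no two processes decide different values.
        "agreement": len(decided_values) <= 1,
    }
```

For instance, a record in which process 1 decides twice, or in which two processes decide 0 and 1 respectively, is flagged as violating integrity or agreement.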
As mentioned earlier, consensus cannot be solved in asynchronous message-passing systems that are prone to failures, as weak as even the crash of a single process [21]. Unreliable failure detectors [10] are often used to circumvent such impossibilities. For a given failure detector class, Guerraoui [23] proposed an indulgent solution, namely, he showed that an arbitrary behavior of the failure detector never violates safety requirements even if it compromises liveness. Consensus implementations are often used in a repeated manner. Dutta and Guerraoui [20] proposed a zero-degrading solution, i.e., during system runs in which the failure detector behaves perfectly, a failure during one consensus instance has no impact on the performance of future instances. We study solutions for indulgent zero-degrading binary consensus.

Fault Model
We study a time-free message-passing system that has no guarantees on the communication delay and the algorithm cannot explicitly access the local clock.Our fault model includes (i) detectable fail-stop failures of processes, and (ii) communication failures, such as packet omission, duplication, and reordering.
In addition to the failures captured in our model, we also aim to recover from arbitrary transient faults, i.e., any temporary violation of the assumptions according to which the system and network were designed to operate. Examples include the corruption of control variables, such as the program counter, packet payloads, and indices (e.g., sequence numbers) that are responsible for the correct operation of the studied system, as well as violations of operational assumptions, such as that at least a majority of nodes never fail. Since the occurrence of these failures can be arbitrarily combined, we assume that these transient faults can alter the system state in unpredictable ways. In particular, when modeling the system, we assume that these violations bring the system to an arbitrary state from which a self-stabilizing algorithm should recover the system.

Related Work
The celebrated Paxos algorithm [32] circumvents the impossibility by Fischer, Lynch, and Paterson [21] by assuming that failed computers can be detected by unreliable failure detectors [10]. These detectors can eventually notify the algorithm about the set of computers that were recently up and connected. However, there is no bound on the time that it takes the algorithm to receive a correct version of this notification. It is worth mentioning that Paxos has inspired many veins of research, e.g., [39, and references therein]. We, however, follow the family of abstractions by Raynal [37] due to its clear presentation that is easy to grasp as well as the fact that it can facilitate efficient implementations.
Non-self-stabilizing solutions.
The Ω class includes eventual leader failure detectors. Chandra, Hadzilacos, and Toueg [9] defined this class and showed that it is the weakest for solving consensus in asynchronous message-passing systems while assuming that at most a minority of the nodes may fail. In this work we study the Ω failure detector by Mostéfaoui, Mourgaya, and Raynal [36]. We note the existence of a computationally equivalent Ω failure detector by Aguilera et al. [1], which explicitly accesses timers. Our study focuses on [36] since it is asynchronous.
Guerraoui [23] presented the design criterion of indulgence. Guerraoui and Lynch [24] studied this criterion formally. Raynal [25,26] generalized it and designed indulgent Ω-based consensus algorithms. Dutta and Guerraoui [20] introduced the zero-degradation criterion. The studied algorithm is by Guerraoui and Raynal [25], who presented an indulgent zero-degrading consensus algorithm for message-passing systems in which the majority of the nodes never fail, and Ω failure detectors are available. We have selected this algorithm due to its clear presentation and the fact that it matches the "two rounds" lower bound by Keidar and Rajsbaum [31]. Hurfin et al. [28] showed that zero-degradation can be combined with the versatile use of a family of failure detectors for improving the efficiency of round-based consensus algorithms. Wu et al. [40] presented the notion of round-zero-degradation, which extends zero-degradation, and the notion of look-ahead. They presented algorithms that extend the ones by Hurfin et al. and can reduce the number of required rounds. We note that such extensions are also plausible for our solutions.
Self-stabilizing solutions.
We follow the design criteria of self-stabilization, which Dijkstra [12] proposed. A detailed presentation of self-stabilization was provided by Dolev [13] and Altisen et al. [3].
Blanchard et al. [8] have a self-stabilizing failure detector for partially synchronous systems. They mention the class P of perfect failure detectors. Indeed, there is a self-stabilizing asynchronous failure detector for class P by Beauquier and Kekkonen-Moneta [4] and a self-stabilizing synchronous Ω failure detector by Delporte-Gallet, Devismes, and Fauconnier [11]. We present the first, to the best of our knowledge, self-stabilizing asynchronous Ω failure detector. Hutle and Widder [29] present an impossibility result that connects fault detection, self-stabilization, and time-freedom as well as link capacity and local memory bounds. They explain how randomization can circumvent this impossibility for eventually perfect failure detectors [30]. Biely et al. [6] connect classes of deterministic failure detectors, self-stabilization, and synchrony assumptions. We follow the assumption made by Mostéfaoui, Mourgaya, and Raynal [36] regarding communication patterns, which is another way to circumvent such impossibilities.
The consensus problem was not extensively studied in the context of self-stabilization. The notable exceptions are by Dolev et al. [15] and Blanchard et al. [8], which presented the first practically-self-stabilizing solutions for shared-memory and message-passing systems, respectively. We note that practically-self-stabilizing systems, as defined by Alon et al. [2] and clarified by Salem and Schiller [38], do not satisfy Dijkstra's requirements, i.e., practically-self-stabilizing systems do not guarantee recovery within a finite time after the occurrence of transient faults. Moreover, the message size of Blanchard et al. is polynomial in the number of processes, whereas ours is a constant (that depends on the number of bits it takes to represent a process identifier). The origin of the design criteria of practically-self-stabilizing systems can be traced back to Dolev et al. [15], who provided a practically-self-stabilizing solution for the consensus problem in shared-memory systems, whereas we study message-passing systems. It is worth mentioning that the work of Blanchard et al. has led to the work of Dolev et al. [14], which considers a practically-self-stabilizing emulation of state-machine replication, which has the same task as the state-machine replication in Figure 1. However, Dolev et al.'s solution is based on virtual synchrony by Birman and Joseph [7], whereas the one in Figure 1 considers consensus. We also note that earlier self-stabilizing algorithms for state-machine replication were based on group communication systems and assumed execution fairness [17,18,19].
There are other self-stabilizing algorithms that are the result of transformations of non-self-stabilizing solutions, such as for atomic snapshots [22], uniform reliable broadcast [33], set-constraint delivery broadcast [34], and coded atomic storage [16].

Our contribution
We present a fundamental module for dependable distributed systems: a self-stabilizing algorithm for indulgent zero-degrading binary consensus for time-free message-passing systems that are prone to detectable node fail-stop failures.
The design criteria of indulgence and zero-degradation are essential for facilitating efficient distributed replication systems, and self-stabilization is imperative for significantly advancing the fault-tolerance degree of future replication systems. Indulgence means that the safety properties, e.g., agreement, are never compromised even if the underlying model assumptions are never satisfied. Zero-degrading means that the process failures that occurred before the algorithm starts have no impact on its efficiency, which depends only on the failure pattern that occurs during the system run. To the best of our knowledge, we are the first to provide a solution for binary consensus that is indulgent, zero-degrading, and can tolerate a fault model as broad as ours. Our model includes detectable fail-stop failures, communication failures, such as packet omission, duplication, and reordering, as well as arbitrary transient faults. The latter can model any temporary violation of the assumptions according to which the system was designed to operate (as long as the algorithm code stays intact).
In the absence of transient faults, our solution achieves consensus within an optimal number of communication rounds (without assuming fair execution). After the occurrence of any finite number of arbitrary transient faults, the system recovers within an asymptotically optimal time (while assuming fair execution). Namely, the stabilization time is in O(1) (in terms of asynchronous cycles). As in Guerraoui and Raynal [25], each node uses a bounded amount of memory. Moreover, the communication costs of our algorithm are similar to the non-self-stabilizing one by Guerraoui and Raynal [25]. The main difference is in the period after a node has decided. Then, it has to broadcast the decided value. At that time, the non-self-stabilizing solution in [25] terminates whereas our self-stabilizing solution repeats the broadcast until the consensus object is deactivated by the invoking algorithm. This is along the lines of a well-known impossibility [13, Chapter 2.3] stating that self-stabilizing systems cannot terminate. Also, it is easy to trade the broadcast repetition rate with the speed of recovery from transient faults.
We also propose the first, to the best of our knowledge, self-stabilizing asynchronous Ω failure detector, which is a variation on Mostéfaoui, Mourgaya, and Raynal [36]. We show transient fault recovery within the time it takes all non-crashed processes to exchange messages among themselves. The use of local memory and communication costs are asymptotically the same as the ones of [36]. The key difference is that we deal with the "counting to infinity" scenario, which transient faults can introduce. The proposed self-stabilizing solution uses a trade-off parameter, δ, that can balance between the solution's vulnerability (to elect a crashed node as a leader even in the absence of transient faults) and the time it takes to elect a non-faulty leader (after the occurrence of the last transient fault). Note that $\delta \in \mathbb{Z}^+$ can be a predefined constant.
As an extension, we also discuss how to transform the (non-self-stabilizing) randomized algorithm for binary consensus by Ben-Or [5] to a self-stabilizing one (Section 7).

Organization
We state our system settings in Section 2. Section 3 presents our self-stabilizing asynchronous Ω failure detector. Section 4 includes a brief overview of the earlier algorithm that has led to the proposed solution. Our self-stabilizing algorithm is proposed in Section 5; it considers unbounded counters. The correctness proof appears in Section 6. We sketch an extension and conclude in Section 7.

System settings
We consider a time-free message-passing system that has no guarantees on the communication delay. Moreover, there is no notion of global (or universal) clocks and the algorithm cannot explicitly access the local clock (or timeout mechanisms). The system consists of a set, $P$, of $n$ fail-prone nodes (or processors) with unique identifiers. Any pair of nodes $p_i, p_j \in P$ has access to a bidirectional communication channel, $channel_{j,i}$, that, at any time, has at most $channelCapacity \in \mathbb{N}$ packets in transit from $p_j$ to $p_i$ (this assumption is due to a well-known impossibility [13, Chapter 3.2]).
In the interleaving model [13], the node's program is a sequence of (atomic) steps. Each step starts with an internal computation and finishes with a single communication operation, i.e., a message send or receive. The state, $s_i$, of node $p_i \in P$ includes all of $p_i$'s variables and $channel_{j,i}$. The term system state (or configuration) refers to the tuple $c = (s_1, s_2, \ldots, s_n)$.

Task specification
The set of legal executions (LE) refers to all the executions in which the requirements of the task T hold. In this work, $T_{binCon}$ denotes the task of binary consensus, which Definition 1.1 specifies, and $LE_{binCon}$ denotes the set of executions in which the system fulfills $T_{binCon}$'s requirements. Definition 1.1 considers the propose(v) operation. We refine its definition to propose(s, k, v), which includes the values of s and k that we describe next. Moreover, we specify how the decided value is retrieved. We clarify that it can be either via the returned value of the propose() operation (as in the studied algorithm [25]) or via the returned value of the result(s, k) operation (as in the proposed solution). The proposed solution is tailored for the protocol suite presented in Figure 1. Thus, we consider multivalued consensus objects that use an array, BC[], of n binary consensus objects, such as the one in [37, Chapter 17], where $n = |P|$ is the number of nodes in the system. Moreover, we organize these multivalued consensus objects in an array, CS[], of M elements, where $M \in \mathbb{Z}^+$ is a predefined constant. We note that in case the algorithm that uses CS[] runs out of consensus objects, a global restart procedure can be invoked, such as the one in [22, Section 5]. Thus, it is possible to have bounded sequence numbers for multivalued objects.

The fault model and self-stabilization
A failure occurrence is a step that the environment takes rather than the algorithm.

Benign failures.
When the occurrence of a failure cannot cause the system execution to lose legality, i.e., to leave LE, we refer to that failure as a benign one. The studied consensus algorithms are prone to fail-stop failures, in which nodes stop taking steps. We assume that at most $t < |P|/2$ nodes may fail and that unreliable failure detectors [10] can detect these failures. The studied failure detector constructions consider (undetectable) crash failures. We consider solutions that are oriented towards time-free message-passing systems and thus they are oblivious to the time in which the packets arrive and depart. We assume that any message can reside in a communication channel only for a finite period. Also, the communication channels are prone to packet failures, such as omission, duplication, and reordering. However, if $p_i$ sends a message infinitely often to $p_j$, node $p_j$ receives that message infinitely often. We refer to the latter as the fair communication assumption.
Arbitrary transient faults.
We consider any temporary violation of the assumptions according to which the system was designed to operate. We refer to these violations and deviations as arbitrary transient faults and assume that they can corrupt the system state arbitrarily (while keeping the program code intact). The occurrence of an arbitrary transient fault is rare. Thus, our model assumes that the last arbitrary transient fault occurs before the system execution starts [13]. Also, it leaves the system to start in an arbitrary state.
Dijkstra's self-stabilization criterion.
An algorithm is self-stabilizing with respect to the task of LE, when every (unbounded) execution $R$ of the algorithm reaches within a finite period a suffix $R_{legal} \in LE$ that is legal. Namely, Dijkstra [12] requires $\forall R : \exists R' : R = R' \circ R_{legal} \land R_{legal} \in LE \land |R'| \in \mathbb{Z}^+$, where $\circ$ denotes the concatenation of $R'$ with $R_{legal}$ and $|R'| \in \mathbb{Z}^+$ states that the prefix $R'$ is finite. The complexity measure of self-stabilizing systems, called stabilization time, is the time it takes the system to recover after the occurrence of the last transient fault. Next, we provide the assumptions needed for defining this period.
We do not assume execution fairness in the absence of transient faults.We say that a system execution is fair when every step that is applicable infinitely often is executed infinitely often and fair communication is kept.After the occurrence of the last transient fault, we assume the system execution is temporarily fair until the system reaches a legal execution, as in Georgiou et al. [22].
Since asynchronous systems do not consider the notion of time, we use the term (asynchronous) cycles as an alternative way to measure the period between two system states in a fair execution. The first (asynchronous) cycle (with round-trips) of a fair execution $R = R' \circ R''$ is the shortest prefix $R'$ of $R$, such that each non-failing node executes at least one complete iteration (of the do forever loop) in $R'$. The second cycle in $R$ is the first cycle in $R''$, and so on. We clarify the term complete iteration (of the do forever loop). It is well-known that self-stabilizing algorithms cannot terminate their execution and stop sending messages [13, Chapter 2.3]. Moreover, their code includes a do forever loop. Let $N_i$ be the set of nodes with whom $p_i$ completes a message round trip infinitely often in $R$. Suppose that immediately after the state $c_{begin}$, node $p_i$ takes a step that includes the execution of the first line of the do forever loop, and immediately after system state $c_{end}$, it holds that: (i) $p_i$ has completed the iteration of $c_{begin}$ and (ii) every request message $m$ (and its reply) that $p_i$ has sent to any non-failing node $p_j \in P$ during the iteration (of the do forever loop) has completed its round trip. In this case, we say that $p_i$'s complete iteration starts at $c_{begin}$ and ends at $c_{end}$.

Uniform reliable broadcast
We assume the availability of a self-stabilizing uniform reliable broadcast (URB) [33], which requires that if a node (faulty or not) delivers a message, then all non-failing nodes also deliver this message [27]. The task specifications consider an operation for URB broadcasting of message m and an event of URB delivery of message m. The requirements include URB-validity, i.e., there is no spontaneous creation or alteration of URB messages, URB-integrity, i.e., there is no duplication of URB messages, as well as URB-termination, i.e., if the broadcasting node is non-faulty, or if at least one receiver URB-delivers a message, then all non-failing nodes URB-deliver that message. Note that the URB-termination property considers both faulty and non-faulty receivers. This is the reason why this type of reliable broadcast is named uniform. This work also assumes that the operation for URB broadcasting message m returns a transmission descriptor, txDes, which is the unique message identifier. Moreover, the predicate hasTerminated(txDes) holds whenever the sender knows that all non-failing nodes in the system have delivered m. The implementation of hasTerminated(txDes) can just test that the local buffer does not include any record with the message identifier txDes. The solution in [33] can facilitate the implementation of hasTerminated() since the self-stabilizing algorithm in [33] removes obsolete records of messages that were delivered by all non-faulty receivers.
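The buffer-based test for hasTerminated(txDes) can be illustrated with a minimal Python sketch. It models only the sender-side bookkeeping and is not the URB algorithm of [33]; the class name UrbSenderBuffer and the method remove_obsolete, which stands in for the URB layer discarding a record once all non-faulty receivers have delivered the message, are hypothetical names introduced for this example.

```python
from typing import Dict


class UrbSenderBuffer:
    """Hypothetical sender-side bookkeeping for URB transmission descriptors."""

    def __init__(self) -> None:
        self._next_tx_des = 0
        self._records: Dict[int, object] = {}

    def urb_broadcast(self, m: object) -> int:
        """Record message m and return its transmission descriptor (txDes)."""
        tx_des = self._next_tx_des
        self._next_tx_des += 1
        self._records[tx_des] = m
        return tx_des

    def remove_obsolete(self, tx_des: int) -> None:
        """Assumed hook: the URB layer drops the record of a message that
        all non-faulty receivers have delivered."""
        self._records.pop(tx_des, None)

    def has_terminated(self, tx_des: int) -> bool:
        """hasTerminated(txDes): no local record carries this identifier."""
        return tx_des not in self._records
```

Under this sketch, a descriptor that the buffer has never seen (e.g., one that appears due to a transient fault) is also reported as terminated, since the test only inspects the local buffer.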

Unreliable failure detectors
Chandra and Toueg [10] introduced the concepts of failure patterns and unreliable failure detectors. Chandra, Hadzilacos, and Toueg [9] proposed the class Ω of eventual leader failure detectors. It is known to be the weakest failure detector class to solve consensus. A pedagogical presentation of these failure detectors is given in Raynal [37].
Failure patterns.
Any execution $R := (c[0], a[0], c[1], a[1], \ldots)$ can have any number of failures during its run. $R$'s failure pattern is a function $F : \mathbb{Z}^+ \to 2^{P}$, where $\mathbb{Z}^+$ refers to an index of a system state in $R$, which in some sense represents (progress over) time, and $2^{P}$ is the power-set of $P$, which represents the set of failing nodes in a given system state. $F(\tau)$ denotes the set of failing nodes in system state $c_\tau \in R$. Since we consider fail-stop failures, $F(\tau) \subseteq F(\tau + 1)$ holds for any $\tau \in \mathbb{Z}^+$. Denote by $Faulty(F) \subseteq P$ the set of nodes that eventually fail-stop in the (unbounded) execution $R$, which has the failure pattern $F$. Moreover, $Correct(F) = P \setminus Faulty(F)$. For brevity, we sometimes notate these sets as Correct and Faulty.
Eventual leader failure detectors.
This class allows $p_i \in P$ to access a read-only local variable $leader_i$, such that $\{leader_i\}_{1 \leq i \leq n}$ satisfy the Ω-validity and Ω-eventual leadership requirements, where $leader_i^{\tau}$ denotes $leader_i$'s value in system state $c_\tau \in R$ of system execution $R$. Ω-validity requires that $\forall i : \forall \tau : leader_i^{\tau}$ contains a node identity. Ω-eventual leadership requires that $\exists \ell \in Correct(F), \exists c_\tau \in R : \forall \tau' \geq \tau : \forall i \in Correct(F) : leader_i^{\tau'} = \ell$. These requirements imply that a unique and non-faulty leader is eventually elected; however, they do not specify when this occurs and how many leaders might co-exist during an arbitrarily long (yet finite) anarchy period. Moreover, no processor can detect the ending of this period of anarchy.
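The notions of failure pattern and eventual leadership can be illustrated on a finite prefix of an execution. The Python sketch below is an assumption-laden illustration rather than part of the studied constructions: the function names and the list-based encoding of $F$ and of the $leader_i$ values are hypothetical. Since the requirements quantify over an unbounded execution, a finite prefix can only exhibit a candidate for $\tau$ and $\ell$; it cannot establish that leadership is permanent.

```python
from typing import Dict, List, Optional, Set, Tuple


def is_fail_stop_pattern(pattern: List[Set[int]]) -> bool:
    """True when F(tau) is a subset of F(tau + 1) for every recorded tau."""
    return all(a <= b for a, b in zip(pattern, pattern[1:]))


def faulty(pattern: List[Set[int]]) -> Set[int]:
    """Nodes that fail-stop somewhere in the recorded prefix."""
    return set().union(*pattern)


def correct(nodes: Set[int], pattern: List[Set[int]]) -> Set[int]:
    """Nodes that do not fail in the recorded prefix."""
    return set(nodes) - faulty(pattern)


def stable_leader(trace: List[Dict[int, int]],
                  correct_nodes: Set[int]) -> Optional[Tuple[int, int]]:
    """Return (tau, leader) for the earliest index tau from which every
    correct node outputs the same correct leader until the end of the
    prefix, or None when the prefix ends without such an agreement."""
    if not trace or not correct_nodes:
        return None
    last = {trace[-1][i] for i in correct_nodes}
    if len(last) != 1:
        return None
    leader = last.pop()
    if leader not in correct_nodes:
        return None
    tau = len(trace) - 1
    while tau > 0 and all(trace[tau - 1][i] == leader for i in correct_nodes):
        tau -= 1
    return tau, leader
```

Here, trace[tau][i] plays the role of $leader_i^{\tau}$, and the entries before the returned index correspond to the anarchy period.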

Failure Detectors for the Ω Class
We study a non-self-stabilizing construction of an Ω failure detector (Section 2.4) and propose its self-stabilizing variant.

Non-self-stabilizing Ω failure detector
Algorithm 1 presents the non-self-stabilizing Ω failure detector by Mostéfaoui, Mourgaya, and Raynal [36]; the boxed code lines are irrelevant to [36] since we use them to present our self-stabilizing solution. Note that, in addition to the assumptions described in Section 2.1, Mostéfaoui, Mourgaya, and Raynal make the following operational assumptions (Section 3.2), which are asynchronous by nature.

Operational assumptions
Algorithm 1 follows Assumption 3.1. Let us observe Algorithm 1's communication pattern of queries and responses. Node $p_i$ broadcasts ALIVE() queries repeatedly until the arrival of the corresponding RESPONSE() messages from $(n - t)$ receivers (the maximum number of messages from distinct nodes it can wait for without risking being blocked forever). For the sake of a simple presentation (and without loss of generality), it is assumed that nodes always receive their own responses. We refer to the first $(n - t)$ replies to a query that $p_i$ receives as the winning responses. The others are referred to as the losing responses since, after a crash, the failing nodes cannot reply.
Assumption 3.1 (Eventual Message Pattern) In any execution $R$, there is a system state $c_\tau \in R$, a non-faulty $p_i \in P$, and a set $Q$ of $(t + 1)$ nodes, such that, after $c_\tau$, each node $p_j \in Q$ always receives a winning response from $p_i$ to each of its queries (until $p_j$ possibly crashes). (Note that the time until the system reaches $c_\tau$, the identity of $p_i$, and the set $Q$ need not be explicitly known by the nodes.)
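The distinction between winning and losing responses can be illustrated as follows. This Python sketch is not taken from Algorithm 1; the function name winning_responses and the encoding of one query's replies as a list of responder identifiers in arrival order are assumptions made for this example.

```python
from typing import List, Tuple


def winning_responses(arrival_order: List[int], n: int, t: int) -> Tuple[List[int], bool]:
    """Return the first (n - t) distinct responders to a query, in arrival
    order, and whether that many responses have arrived at all."""
    quorum = n - t
    winners: List[int] = []
    for j in arrival_order:
        if len(winners) == quorum:
            break  # every later reply to this query is a losing response
        if j not in winners:  # ignore duplicated packets
            winners.append(j)
    return winners, len(winners) == quorum
```

With n = 5 and t = 2, a querying node stops waiting after three distinct responders; Assumption 3.1 then asks that, eventually, some non-faulty node is always among these three for every query issued by the nodes in the set Q.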

Variables
The local state includes $r_i$, which is initialized to 0 and is used for indexing $p_i$'s current round of alive queries and responses. Moreover, the array count[] counts the number of suspicions, e.g., $count_i[j]$ counts from zero the number of times $p_i$ suspected $p_j$. Also, the recFrom set, which is initialized to $P$, has the identities of the nodes which responded to the most recent alive query. When the application layer accesses the variable leader, Algorithm 1 returns the identity of the least suspected node (line 6).
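The selection of the least suspected node can be sketched in a few lines of Python. The text does not state how ties between equally suspected nodes are resolved; breaking ties by the smallest identifier is an assumption of this sketch, as is the dictionary encoding of the count[] array.

```python
from typing import Dict


def leader(count: Dict[int, int]) -> int:
    """Return the identity of the least suspected node.

    count maps a node identifier to its suspicion counter. Ties are broken
    by the smallest identifier (an assumption of this sketch).
    """
    return min(count, key=lambda j: (count[j], j))
```

A deterministic tie-breaking rule matters here: once all correct nodes hold the same counter values, they must also derive the same leader from them.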

Algorithm description
Algorithm 1 repeatedly executes a do forever loop (lines 9 to 15), which broadcasts ALIVE(r, •) messages (line 11) and collects their replies, which are the RESPONSE(rJ, •) messages (lines 12 and 20). In this message exchange, every $p_i \in P$ uses a round number, $r_i$, to facilitate asynchronous rounds without any coordination linking the rounds of different nodes. Moreover, there is no limit on the number of steps any node takes to complete an asynchronous round.
The do forever loop.
Each iteration of the do forever loop includes actions (1) to (3). (1) Node $p_i$ broadcasts ALIVE($r_i$, $count_i$) queries (line 11), and waits for $(n - t)$ replies, i.e., RESPONSE(rJ, recFromJ) messages from $p_j \in P$ (line 20), where $r_i$ and rJ are matching round numbers. Moreover, $count_i$ is an array in which, as said before, $count_i[k]$ stores the number of times $p_i$ suspected $p_k \in P$. Also, recFromJ is a set of the identities of the responders to $p_j$'s most recent query (lines 15 and 20). (2) By aggregating into $prevRecFrom_i$ all the arriving recFromJ sets (line 12), $p_i$ can estimate that any $p_j : j \notin prevRecFrom_i$ that does not appear in any of these sets is faulty. Thus, $p_i$ increments $count_i[j]$ (line 14). (3) The iteration of the do forever loop ends with a local update to $p_i$'s $recFrom_i$ (line 15).
Processing of arriving queries.
Upon ALIVE(rJ, countJ) arrival from $p_j$, node $p_i$ merges the arriving data with its own (line 18), and replies with RESPONSE(rJ, $recFrom_i$) (line 20). This reply includes $p_j$'s round number, rJ, which is not linked to $p_i$'s round number, $r_i$.
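Actions (2) and (3), as well as the merge performed upon the arrival of a query, can be sketched as local computations. The Python code below is a hedged illustration and not Algorithm 1 itself: message exchange is left out, the function names are hypothetical, and the merge is assumed to be an entry-wise maximum, which the text does not spell out.

```python
from typing import Dict, Iterable, List, Set


def merge_counts(local: Dict[int, int], arriving: Dict[int, int]) -> None:
    """Merge arriving suspicion counters into the local array
    (assumed here to be an entry-wise maximum)."""
    for k, value in arriving.items():
        local[k] = max(local.get(k, 0), value)


def finish_round(count: Dict[int, int],
                 arriving_rec_from: List[Set[int]],
                 responders: Iterable[int]) -> Set[int]:
    """Actions (2) and (3) of one iteration at p_i; returns the new recFrom_i.

    arriving_rec_from holds the recFromJ sets carried by the winning
    responses, and responders lists the nodes that sent them.
    """
    prev_rec_from: Set[int] = set().union(*arriving_rec_from)
    for j in count:
        if j not in prev_rec_from:
            count[j] += 1  # p_j appears in no arriving set: suspect it
    return set(responders)
```

For example, with five nodes and three responses whose recFromJ sets never mention node 5, only the counter of node 5 is incremented at the end of the round.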

Self-stabilizing Ω failure detector
When including the boxed code lines, Algorithm 1 presents an unbounded self-stabilizing variation of the Ω failure detector in [36]. (As mentioned before, Section 5 in [22] explains how to convert such unbounded self-stabilizing algorithms to bounded ones.) Note that in [36], all non-crashed nodes converge to a constant value that is known to all correct nodes whereas the counters of crashed nodes increase forever, see claims C2 and C3 of Theorem 97 in [37]. Thus, the proposed algorithm includes the following differences from [36].
Algorithm 1 makes sure that any non-failing node does not "hide" a value that is too high in $count_i[x]$ without sharing it with all correct nodes. In the context of self-stabilization, such a value can appear due to a transient fault. To that end, Algorithm 1 includes the field count in the RESPONSE() message (line 20) so that the receiver can merge the arriving data with the local one (line 22).
Algorithm 1 also avoids "counting to infinity" since, in the context of self-stabilization, a transient fault can set the counters to arbitrary values.For example, suppose that the counter values that non-faulty nodes associates with all crashed nodes is zero.Also suppose that the counters associated with any non-faulty node is extremely high, say, M = 2 62 .We must not require the system to count from zero to M before it is guaranteed that a non-crashed leader is elected, because it would take more than 146 years to do (of we assume the speed of one nanosecond per communication round).Thus, the proposed solution limits the difference between the extrema counter values in any local array to be less than δ, where δ is a predefined constant.One can view δ as a trade-off parameter between the solution vulnerability (to elect a crashed node as a leader even in the absence of transient faults) and the time it takes to elect a non-faulty leader (after the occurrence of the last transient fault and after the system has reached c τ that satisfies the eventual message pattern assumption, cf.Assumption 3.1).I.e., on the one hand, if the value of δ is set too low, processors that sporadically slow down might be elected, while on the other hand, for very large values of δ, say, M , the time it takes to recover after the occurrence of the last transient faults can be extremely long. Correctness.
Definitions 3.1 and 3.2 are needed for showing that Algorithm 1 brings the system to a legal execution (Theorem 3.2).

Definition 3.1 (Algorithm 1's consistent system state) Suppose that max counts_i() − min counts_i() ≤ δ holds in c ∈ R for any non-faulty p_i ∈ P. In this case, we say c is consistent.

Definition 3.2 (Complete execution of Algorithm 1) Let R be an execution of Algorithm 1. Let c, c'' ∈ R denote the starting system states of R, and respectively, R'', for some suffix R'' of R. We say that message m is completely delivered in c if the communication channels do not include ALIVE(r, •) nor RESPONSE(r, •) messages. Suppose that R = R' • R'' has a suffix R'', such that for any ALIVE(r, •) or RESPONSE(r, •) message m that is not completely delivered in c'', it holds that m does not appear in c. In this case, we say that R'' is complete with respect to R.

Theorem 3.2 (Convergence) (i) Once every non-failing processor completes at least one iteration of the do forever loop (lines 9 to 16) or receives at least one message (lines 17 or 21), the system reaches a consistent state. (ii) Every infinite execution R = R' • R'' of Algorithm 1 reaches within a finite number of steps a suffix R'', such that R'' is complete with respect to R (Definition 3.2).

Proof. Lines 16, 19, and 23 imply invariant (i). Invariant (ii) is implied by the assumption that any message can reside in a communication channel only for a finite time (Section 2.2).

Theorem 3.3 (Closure) Let R be an execution of Algorithm 1 that starts in a consistent system state. Suppose that R has an eventual message pattern (Assumption 3.1). Algorithm 1 demonstrates in R a construction of the eventual leader failure detector, Ω.
Proof. In the context of Algorithm 1, we say that p_i ∈ P inhibits the increment of count_i[x] in line 14 when x ∉ prevRecFrom holds but count_i[x] < δ + min counts_i() does not. Suppose that, for a given p_x ∈ P, there is p_k ∈ P that, during R, either increments count_k[x] in line 14 or inhibits such increments for a bounded number of times. In this case, we say that count_k[x] is bounded. In all other cases, we say that count_k[x] is unbounded. Given a failure pattern F(), we define PL = {x : ∃i ∈ Correct(F) : count_i[x] is bounded}, and ∀i ∈ Correct(F) : PL_i = {x : count_i[x] is bounded}, where the set of processor identities, PL, stands for "potential leaders". These definitions imply ∀i ∈ Correct(F) : PL_i ⊆ PL. The rest of the proof shows that correct processors share identical sets of potential leaders (PL), which are non-empty (Lemma 3.4) and include only correct processors (Lemmas 3.5 and 3.6). The proof ends by showing that the processors in PL can only be suspected, i.e., their counters are incremented (or inhibited from being incremented), a bounded number of times, and this number is eventually the same at each non-faulty processor (Lemma 3.7). Thus, all correct processors eventually elect the processor that was suspected the smallest number of times.
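The bounded-increment rule used in this proof, together with the arithmetic behind the 146-year figure quoted earlier, can be illustrated with a minimal Python sketch. The function names `suspect` and `years_to_count` and the value chosen for δ are hypothetical; the guard mirrors the condition count_i[x] < δ + min counts_i() stated above, and everything else is an assumption made for illustration rather than the code of Algorithm 1.

```python
def suspect(count, x, delta):
    """Try to increment count[x]; inhibit the increment if it would
    stretch the gap between the extreme counters beyond delta.

    Returns True when the counter was incremented and False when the
    increment was inhibited (hypothetical helper, not Algorithm 1 itself).
    """
    if count[x] < delta + min(count):
        count[x] += 1
        return True
    return False


def years_to_count(limit, seconds_per_round=1e-9):
    """Years needed to count from zero to `limit`, one step per round."""
    seconds_per_year = 365.25 * 24 * 3600
    return limit * seconds_per_round / seconds_per_year


if __name__ == "__main__":
    count = [0, 0, 0]
    outcomes = [suspect(count, 1, delta=2) for _ in range(4)]
    print(outcomes, count)              # later attempts are inhibited
    print(years_to_count(2 ** 62))      # a little over 146 years
```

With the guard in place, a counter can exceed the local minimum by at most δ, so a leader change never requires counting through an arbitrarily large gap left behind by a transient fault.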

Lemma 3.4 PL ≠ ∅
Proof. Since Assumption 3.1 holds, there must be a system state c_{τ0} ∈ R, a processor p_i, and a set Q of (t + 1) processors for which, at any state after c_{τ0}, any non-failing processor p_j ∈ Q receives winning responses from p_i for any of p_i's queries. Due to the assumptions that |Q| > t and that there are at most t faulty processors, Q includes at least one non-faulty processor. Let τ ≥ τ0 be a time after which no more processors fail.
Processor p_k ∈ P : k ∈ Correct(F) does not stop sending its query (line 11) until it receives RESPONSE() messages from (n − t) processors. Moreover, after c_τ, at least (t + 1) processors get winning responses from p_i. Therefore, the system eventually reaches a state c_{τk} ∈ R : τ ≤ τk after which i ∈ prevRecFrom_k holds (line 20). Thus, p_k stops incrementing (or inhibiting the increment of) count_k[i] at line 14.
Since p_k is an arbitrary correct processor, the system eventually reaches a state after which this holds for every correct processor. In other words, due to the repeated exchange of messages between any pair of non-faulty processors, these processors have a constant value for count[i].

Lemma 3.5 PL ⊆ Correct(F).
Proof. We show that for every x ∉ Correct(F), it holds that p_i : i ∈ Correct(F) increments (or inhibits the increment of) count_i[x] an unbounded number of times during R. The rest of the proof is implied by the fact that non-faulty processors never stop exchanging messages among themselves and merge the arriving information upon message arrival (lines 18 and 22).
Suppose that all the faulty processors have crashed (and their RESPONSE() messages have been received) before c_τ ∈ R. Let p_i and p_j be non-faulty processors, and p_x a faulty one. We observe invariants (i) to (iv), which imply the proof. (i) Since p_x cannot respond to any of p_j's queries, it holds that x ∉ rF_j, where rF_j is the value of recFrom_j (which is assigned in line 15) in any system state, c_{τ'}, that appears in R after c_τ. (ii) Due to invariant (i), it holds that x ∉ pRF_j, where pRF_j is the value of prevRecFrom_j (which is assigned in line 12) in any system state, c_{τ'}, that appears in R after c_τ. (iii) Due to invariant (ii), after c_τ, every execution of line ?? implies an increment of count_i[x] (or the inhibition of an increment). (iv) Since p_i sends an unbounded number of queries, invariant (iii) implies that count_i[x] is incremented (or inhibited from incrementing) an unbounded number of times during R.
Proof of Lemma 3.6. Recall that PL_i ⊆ PL (by the definitions of PL and PL_i). Therefore, showing PL ⊆ PL_i implies the proof, and PL_i ⊆ Correct(F) follows from Lemma 3.5. Let us assume that k ∈ PL and show that k ∈ PL_i. That is, we assume that there are k, j ∈ Correct(F) for which the constant M_k is the highest value stored in count_j[k] throughout R. In order to prove that k ∈ PL_i, we show that count_i[k] is also bounded. Since count_j[k] ≤ M_k throughout R, the repeated exchange of ALIVE() and RESPONSE() messages between the correct processors p_i and p_j (line 11, lines 17 to 18, and lines 21 to 22) implies that count_i[k] is bounded as well.

Proof of Lemma 3.7. This is due to the repeated exchange of ALIVE() and RESPONSE() messages between p_i and p_j (line 11, lines 17 to 18, and lines 21 to 22).
This ends the proof of Theorem 3.3.
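The election rule that Theorem 3.3's proof relies on (pick the processor that was suspected the smallest number of times) can be sketched as follows. The function name `elect_leader` is hypothetical, and breaking ties by the smaller processor identity is an assumption of this sketch; the text above only states that the least-suspected processor is elected.

```python
def elect_leader(count):
    """Return the index of the least-suspected processor.

    `count[x]` is the local suspicion counter of processor x. Ties are
    broken by the smaller index (an assumption of this sketch).
    """
    return min(range(len(count)), key=lambda x: (count[x], x))


if __name__ == "__main__":
    # Two correct processors whose counters for the potential leaders have
    # converged to the same values, while the counter of a crashed
    # processor (index 3) keeps growing at different speeds.
    view_a = [7, 3, 3, 10_000]
    view_b = [7, 3, 3, 25_000]
    print(elect_leader(view_a), elect_leader(view_b))
```

Once the bounded counters agree at all correct processors, the unbounded counters of crashed processors no longer influence the outcome, so every correct processor computes the same leader.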

Background: Non-self-stabilizing Zero-degrading Binary Consensus
Algorithm 2 is a non-self-stabilizing Ω-based binary consensus algorithm that is indulgent and zero-degrading. For the sake of a simpler presentation of the correctness proofs, Algorithm 2's line enumeration continues the one of Algorithm 1.

Algorithm structure
Algorithm 2 proceeds in asynchronous rounds, each of which combines two phases. The algorithm aims to have all nodes hold, by the end of phase zero, the same value, which is named the estimated value. This selection is done by a leader, whose election is facilitated by the Ω failure detector. Next, during phase one, the algorithm tests the success of phase zero. The challenge here is that, due to the asynchronous nature of the system, not all nodes run the same round simultaneously. Therefore, the test considers the agreement on the round number, the leader identity, and the proposed value. Moreover, just before deciding on any value, say v, the deciding node broadcasts a DECIDE(v) message. Upon DECIDE(v) arrival, the receiver repeats the broadcast of the arriving message before deciding. Algorithm 2 executes the "decide" action by returning with v from propose(v)'s invocation. This technique of "broadcast repetition" basically lets Algorithm 2 invoke a reliable broadcast of the decided value.
The system behavior during phase zero.
The objective of phase zero of round r is to let all nodes store the same value in est[1]. Once that happens, a decision can be taken during phase one of round r. As we are about to explain, that objective is guaranteed to be achieved once a single leader is elected.
The main challenge that phase zero addresses is the provision of the safety property, i.e., no two different decisions are made, during Ω's anarchy period in which there is no single non-faulty elected leader. To that end, phase zero makes sure that the quasi-agreement property always holds before anyone enters phase one of round r. The property requires that ∃v ∈ V : ∀p_j ∈ P : est_j[1] = ⊥ ∨ est_j[1] = v, where est_i[1] = ⊥ means that, from the perspective of p_i, it is not ready to decide any value. Therefore, a system state that satisfies the quasi-agreement property allows the individual nodes to decide during phase one on the same value (when est_i[1] = est_j[1] = v), or defer the decision to the next round (when est_i[1] = ⊥). In order to satisfy the quasi-agreement property, phase zero takes two actions.
Action (1): p_i broadcasts PHASE(0, r, •) messages until it hears from n − t nodes, since waiting for more than n − t nodes jeopardizes the system liveness. Moreover, t < n/2 and thus any set of n − t nodes is a majority set, which contains at least one correct node. Processor p_i may stop broadcasting also when it receives a PHASE(0, r, •) message from its leader, i.e., p_{myLeader_i}, or when a new leader is elected, i.e., myLeader_i ≠ leader_i.
Action (2): after the above broadcast, p_i's assignment to est_i[1] (line 35) satisfies the quasi-agreement property by making sure that (i) a majority of nodes consider p_ℓ as their leader when they broadcast the PHASE(0, r, •, ℓ) message, and (ii) p_i received PHASE(1, r, v, •) from p_ℓ. In other words, if (i) and (ii) hold, p_i can assign v to est_i[1], which is p_ℓ's value in est[0] at the start of round r. Otherwise, est_i[1] gets ⊥. Due to the majority intersection property, no two majority sets can have two different unique leaders. Therefore, it cannot be that two nodes assign different non-⊥ values to est[1] during the same round. Corollary 4.1 (the quasi-agreement property) is implied by the above.

The system behavior during phase one.
During this phase, p_i broadcasts PHASE(1, r, est_i[1]) until it hears from n − t nodes. By the quasi-agreement property, ∃v ∈ V : ∀p_j ∈ P : est_j[1] = ⊥ ∨ est_j[1] = v ≠ ⊥ holds during round r. Thus, for the set of all received estimated values, rec_i ∈ {{v}, {v, ⊥}, {⊥}} (line 38) holds. For the rec_i = {v} case, p_i can broadcast DECIDE(v) before deciding v (line 39). For the rec_i = {v, ⊥} case, p_i uses v during round r + 1 as the new estimated value est_i[0] since some other node might have decided v (line 40). For the rec_i = {⊥} case, p_i continues to round r + 1 without modifying est_i[0] (line 41). Note that, at any round r, it cannot be the case that both rec_i = {v} and rec_j = {⊥} hold, since p_i's broadcast of DECIDE(v) implies that it had received PHASE(1, r_i, v) from a majority of nodes. Due to the majority intersection property, there is at least one PHASE(1, r_i, v) arrival to any p_j ∈ P that executes line 38 since it also received PHASE(1, r_i, •) messages from a majority. Thus, rec_j = {⊥} cannot hold.

The necessity of broadcasting v before deciding on it.
Algorithm 2 has to take into consideration the case in which not all nodes decide during round r.E.g., a majority of nodes might decide on round r, while a minority of them continues to round r + 1 during which it must not wait in vain to hear from a majority.By broadcasting DECIDE(v) before deciding v, Algorithm 2 allows the system to avoid such bad situations since once p i decides, it is guaranteed that eventually, all correct nodes decide.
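The two tests that close phase zero and phase one can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the code of Algorithm 2: the names `phase_zero_estimate` and `phase_one_outcome`, the dictionary-based message bookkeeping, and the use of `None` for ⊥ are all hypothetical.

```python
from collections import Counter

BOTTOM = None  # stands for the default value ⊥ in this sketch


def phase_zero_estimate(reported_leader, reported_value, n):
    """Compute est[1] at the end of phase zero.

    reported_leader: sender id -> leader id carried by its PHASE(0) message
    reported_value:  sender id -> est[0] value carried by its PHASE(0) message
    Returns the leader's value if a majority reported the same leader and a
    message from that leader was received; otherwise BOTTOM.
    """
    for leader, votes in Counter(reported_leader.values()).items():
        if votes > n / 2 and leader in reported_value:
            return reported_value[leader]
    return BOTTOM


def phase_one_outcome(rec, est0):
    """Map the set `rec` of received est[1] values to the node's next step."""
    values = rec - {BOTTOM}
    if len(values) > 1:
        raise ValueError("quasi-agreement violated: two non-bottom values")
    if values and BOTTOM not in rec:       # rec = {v}
        return ("decide", next(iter(values)))
    if values:                             # rec = {v, ⊥}
        return ("adopt", next(iter(values)))
    return ("keep", est0)                  # rec = {⊥}


if __name__ == "__main__":
    leaders = {0: 2, 1: 2, 2: 2, 3: 4}
    values = {0: 1, 1: 1, 2: 0, 3: 1}
    print(phase_zero_estimate(leaders, values, n=5))
    print(phase_one_outcome({1}, est0=0))
    print(phase_one_outcome({1, BOTTOM}, est0=0))
    print(phase_one_outcome({BOTTOM}, est0=0))
```

The "adopt" branch is what protects agreement across rounds: a node that saw both v and ⊥ cannot rule out that another node saw only v and decided, so it carries v into round r + 1.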

Self-stabilizing Indulgent Zero-degrading Binary Consensus
Algorithm 3 is our self-stabilizing variation on Guerraoui and Raynal [25].The main difference between the proposed solution and Algorithm 2 occurs after a value was decided.Then, Algorithm 2 broadcasts before terminating (lines 39 and 42) whereas our self-stabilizing solution repeats the broadcast until the consensus object is deactivated by the invoking algorithm.This follows a well-known impossibility [13,Chapter 2.3] that self-stabilizing systems cannot terminate.Specifically, in the context of self-stabilization, Algorithm 2 can be started in a system state in which exactly half of the nodes are at the (normal) initial state of binary objects.Moreover, due to the presence of transient faults, the program counters of the other half of the nodes can point to the return command which in turn extends the stabilization time.

Variables
As explained in Section 2.1, the proposed binary consensus objects are used by multivalued consensus objects, i.e., BC[] is an array of n binary consensus objects and CS[] is an array of M multivalued consensus objects. The binary consensus objects of Algorithm 3 have private variables that store the sequence number of the multivalued consensus object, seq, a node index, k : p_k ∈ P, and the current round number, r. Also, the result of phase x ∈ {0, 1} is stored in est[x], and est[2] stores the decided value. Algorithm 3 also stores the current identity of the leader, myLeader, the round number aggregated from all received values, newR, and the transmission descriptor of the reliable broadcast of the decided value, txDes. We say that the binary object x has an active reliable broadcast when (x.txDes ≠ ⊥ ∧ ¬hasTerminated(x.txDes)), i.e., x.txDes stores a descriptor of a transmission that has not terminated.
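As a reading aid, the per-object state listed above can be written down as a small Python record. This is a sketch under assumptions: the class name `BinaryObject`, the snake_case field names, the default values, and the `has_terminated` callback (standing in for the reliable-broadcast interface's hasTerminated()) are hypothetical and are not taken from Algorithm 3's code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

BOTTOM = None  # stands for ⊥ in this sketch


@dataclass
class BinaryObject:
    seq: int                         # sequence number of the multivalued object
    k: int                           # node index, k : p_k in P
    r: int = 0                       # current round number
    # est[0], est[1]: results of phases 0 and 1; est[2]: the decided value
    est: List[Optional[int]] = field(
        default_factory=lambda: [BOTTOM, BOTTOM, BOTTOM])
    my_leader: Optional[int] = BOTTOM    # current leader identity
    new_r: int = 0                       # highest round number seen so far
    tx_des: Optional[object] = BOTTOM    # reliable-broadcast descriptor


def has_active_broadcast(x: BinaryObject,
                         has_terminated: Callable[[object], bool]) -> bool:
    """True when x.tx_des describes a transmission that has not terminated."""
    return x.tx_des is not BOTTOM and not has_terminated(x.tx_des)


if __name__ == "__main__":
    obj = BinaryObject(seq=0, k=1)
    print(has_active_broadcast(obj, lambda d: False))   # no descriptor yet
    obj.tx_des = "descriptor-1"
    print(has_active_broadcast(obj, lambda d: False))   # still in flight
    print(has_active_broadcast(obj, lambda d: True))    # already terminated
```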

Message structure
Algorithm 3 uses the DECIDE(seq, k, v) and PHASE(phase, ackNeed, seq, k, r, v, leader, newR) messages, where the field phase refers to the phase number, ackNeed indicates whether a reply is needed, seq is the sequence number, k is the node index, r is the round number, v is the estimated value, leader is the round leader, and newR is the sender's round number.

Interface operations
The operation propose(s, k, v) (Section 2.1) allows the invoking node to propose value v with sequence number s and node index k (line 44).The operation result(s, k) returns the decided value, if such decision occurred (line 45).Otherwise, ⊥ is returned.The operation deactivate(s, k) assigns ⊥ to CS[s].BC[k] (line 46).

The do forever loop (lines 47 to 60)
The nodes iterate over all active binary objects, x, that do not have an active reliable broadcast. In case x has a decided value and it had an active transmission that has terminated (line 48), p_i initializes x's transmission descriptor. Also, in case x has a decided value but it has no active transmission (line 49), p_i broadcasts the decided value. In line 50, p_i increments the round number and samples the Ω failure detector. Algorithm 3 considers situations in which, due to a transient fault, the round numbers go out of sync. It does this by letting newR aggregate the highest round number that is disseminated in each message exchange (lines 66, 67, 69, and 73). Then, at the start of a new round, the highest known round number is used (line 50).
Although the example above (in which half of the nodes start in the normal initial state) considers a case that can only happen before the start of the system execution, cf. Section 2.2, the system cannot know whether its current state is the starting one. Therefore, the system has to always be ready to recover from arbitrary transient faults. We also clarify that our model does not limit the number of nodes that can be affected by an arbitrary transient fault. It is only that example that makes this assumption.

Phase 0 (lines 52 to 54).
In this phase, p_i broadcasts PHASE(0, True, seq, k, r, est[0], myLeader, r), such that the phase field is 0, acknowledgment is needed, the sequence number is seq, the node index is k, the round number is r, the estimated result is est[0], the message leader is myLeader, and the message aggregated round number is r. This broadcasting repeats as long as the binary object neither has an active broadcast nor stores a decided value. Moreover, the broadcasting continues until PHASE(0, •, seq, k, r, •) is received from n − t nodes (which means that phase 0 messages were received from a majority of nodes during round r_i), or PHASE(0, •, seq, k, r, •) is received from p_myLeader, or myLeader ≠ leader holds (which means that some nodes follow a leader different than p_{myLeader_i} during r_i). Phase 0 ends by testing in line 54 whether a phase 0 message was received from a majority of nodes that have reported on the same leader, p_ℓ, from which a message was received. If this is the case, p_i uses the value, v, received from p_ℓ as the estimated result for phase 1 by assigning v to est_i[1]. Otherwise, ⊥ is assigned.

Phase 1 (lines 56 to 60).
In this phase, p_i broadcasts PHASE(1, True, seq, k, r, est[1], r), such that the phase field is 1, acknowledgment is needed, the sequence number is seq, the node index is k, the round number is r, the estimated result is est[1], and the message aggregated round number is r. As in phase 0, this broadcasting repeats as long as the binary object neither has an active broadcast nor stores a decided value. Moreover, the broadcasting continues until PHASE(1, •, seq, k, r, •) was received from n − t nodes (which means that phase 1 messages were received from a majority of nodes during round r_i). Phase 1 ends by testing the set, rec_i, of received estimated results during this phase (line 58). By the quasi-agreement property (Section 4), rec_i ∈ {{v}, {v, ⊥}, {⊥}} holds. When rec_i = {v} holds, p_i can reliably broadcast DECIDE(seq, k, v) (line 60). When rec_i = {v, ⊥} holds, p_i uses v as the new estimated value est_i[0] for round r + 1 since some other node might have decided v (line 61). When rec_i = {⊥} holds, est_i[0] is unchanged before round r + 1 (line 62).

The arrival of PHASE() messages
This arrival updates (and even initializes) the local state of the binary consensus object, O_i, that has the sequence number sJ and node index kJ, where nJ, aJ, sJ, kJ, rJ, vJ, myLeaderJ, newRJ are the message fields. Before this, there is a need to test sJ and validate that CS[sJ mod M] is an active object. If this is not the case, a reply is sent to the sender (if aJ indicates that this is needed) and the procedure returns (line 64).
Line 65 prepares the binary consensus object O i and line 66 tests whether O i needs to be initialized.Otherwise, Algorithm 3 updates the aggregated round number (line 67).Line 68 is applicable only for phase 1 messages.It tests whether O i has ⊥ as its estimated result.When this is the case, vJ is used as O i 's estimated value.The procedure ends by acknowledging the sender, if needed (line 69).
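The way the aggregated round number lets a node rejoin the others after a transient fault can be sketched in a few lines of Python. The exact update rules of lines 50 and 67 are not reproduced here; taking the maximum on message arrival and starting the next round one above the highest known round number are assumptions of this sketch, and the function names are hypothetical.

```python
def aggregate_round(new_r, msg_r, msg_new_r):
    """On PHASE() arrival: remember the highest round number seen so far."""
    return max(new_r, msg_r, msg_new_r)


def next_round(r, new_r):
    """At the start of a new round: jump past the highest known round."""
    return max(r, new_r) + 1


if __name__ == "__main__":
    r, new_r = 3, 3
    # A message arrives from a node whose round number is far ahead,
    # e.g., because a transient fault corrupted it.
    new_r = aggregate_round(new_r, msg_r=1000, msg_new_r=1000)
    r = next_round(r, new_r)
    print(r)
```

Under these assumptions, a node that lags behind catches up in a single step instead of iterating through every intermediate round.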
Generality is not lost due to the proof of Claim 6.2 since the case in which CS_i[s mod M].BC[k mod n].est[2] = ⊥ holds, implies that eventually ∀p_x ∈ P : x ∈ Correct : CS_x[s mod M].BC[k mod n] = ⊥ holds, i.e., termination. Towards a contradiction, suppose that r is the smallest round in which a correct processor p_i executes indefinitely. The only two loops in which p_i can continue to execute forever in round r are the repeat-until loops in lines 52 to 53 and 56 to 57.
By the choice of r as well as lines 50, 65, and 69, no correct processor can continue to execute forever in round r' < r. Therefore, p_i receives PHASE(0, •, s, k, r, •) at least (n − t) times. Moreover, if its current leader, p_{myLeader_i}, is correct, p_i receives at least one PHASE(0, •, s, k, r, •) message from p_{myLeader_i}. Furthermore, if p_{myLeader_i} is faulty, eventually it holds that myLeader_i ≠ leader_i (by Ω's eventual leadership). Thus, no correct processor p_i can execute forever the repeat-until loop in lines 52 to 53 during round r. By similar arguments, during phase one of round r, processor p_i receives PHASE(1, •, s, k, r, •) messages at least (n − t) times from the correct processors. Thus, during round r, processor p_i does not execute forever the repeat-until loop in lines 56 to 57. Note that we have reached a contradiction with the assumption that r is the smallest round in which a correct processor executes forever, and therefore the claim is true.

Claim 6.4 Eventually only the correct nodes are alive and connected and ∀x ∈ Correct :

Proof of Claim 6.4. Assume, toward a contradiction, that no node ever decides with respect to sequence number s, p_k ∈ P, and proposal v ∈ V. Recall Ω's eventual leadership and the fact that faulty nodes eventually crash (by definition). Thus, Claim 6.3 implies the existence of a finite round number r from which (a) only the correct nodes are alive and connected, as well as (b) all correct p_i ∈ P share the same correct leader, e.g., p_d, in myLeader_i. The end condition of the repeat-until loop in line 53 holds for p_i eventually. This is because there are more than n/2 correct nodes. Each such node, including p_x, broadcasts PHASE(0, •, s, k, r, v, •) and receives at least n − t times the messages PHASE(0, •, s, k, r, •) (cf. Claim 6.3's proof). Once line 53's condition holds, by the same reasons, the if-statement condition in line 54 holds as well. Thus, p_i assigns v to CS_i[s].BC[k].est[1], and during phase 1 of round r, p_i only sends PHASE(1, •, s, k, r, v, •). Since this is true for any correct p_i, it must be that rec_i = {v} (line 58). Therefore, every correct p_i ∈ P executes line 60, in which p_i URB-broadcasts DECIDE(s, k, v). Moreover, upon the URB delivery of DECIDE(s, k, v), every correct node decides in line 74 (Claim 6.2).
This completes Theorem 6.1's proof.

Theorem 6.5 uses the definition of consistent executions. Let p_i, p_k ∈ P be two nodes in the system and s be a sequence number. Let c be a system state in which the if-statement condition in line 44 holds with respect to p_i. Moreover, no communication channel includes the messages DECIDE(seq = s, k = x, •) and PHASE(•, seq = s, k = x, •). In this case, we say that p_i can have a consistent invocation of Algorithm 3's propose_i(s, k) in c. Let R be an execution of Algorithm 3 in which for any c ∈ R and any p_i ∈ P, we can either (i) say that p_i can have consistent invocations of binPropose_i() in c, or (ii) c is the result of only consistent invocations of propose(). In this case, we say that R is a consistent execution of Algorithm 3.

Theorem 6.5 Let R be a consistent execution of Algorithm 3. The system demonstrates in R a construction of a bounded-size array of binary consensus objects.
Proof. Termination, validity, and integrity. Termination holds due to Theorem 6.1. Integrity holds since p_i ∈ P decides by assigning a non-⊥ value to O_i.est[2]. This happens only in line 74 and when O_i.est[2] = ⊥. Thus, it can happen at most once per unique pair of sequence number, sJ, and processor identifier, kJ, cf. line 72 for the assignment of O_i's value.
With respect to validity, by line 60 we can see that DECIDE(s, k, v) messages can only be sent with a non-⊥ value in the field v since ⊥ is not in the domain, V, of values that one can propose. Thus, when p_i receives a DECIDE() message, line 74 never assigns a ⊥-value to O_i.est[2]. That is, p_i decides on a non-⊥ value that comes from est[1] of some entry CS_j[s].BC[k], which in turn comes from est[0] of some entry CS_x[s].BC[k], where p_j, p_x ∈ P. Since R is a consistent execution, est[0] can contain only proposed values that Algorithm 3 assigns in line 44. Moreover, est[1] can contain only values that Algorithm 3 copied from est[0] in lines 54 and 68. Thus, the validity property holds.

Agreement. Claim 6.6 implies agreement since it shows that only a single value can be decided in a consistent execution.

Proof of Claim 6.6. Invariant (i). By the code of Algorithm 3, p_i receives during r at least n − t times the message PHASE(0, •, s, k, r, v, •), see the proof of Claim 6.3. Moreover, p_j has received during round r at least n − t times the message PHASE(1, •, s, k, r, v', •). During consistent executions, p_x ∈ P can only transmit (and perhaps retransmit) one PHASE(0, •, s, k, r, v, •) message. Due to the property of majority intersection, p_i and p_j receive during round r the same message PHASE(1, •, s, k, r, w, •) from some processor p_x ∈ P.
Since both p_i and p_j execute line 60 during round r, it must be the case that w = v = v'.

Invariant (ii). Suppose that some correct p_i ∈ P URB-broadcasts DECIDE(s, k, v) during a round r. Also, p_j ∈ P continues to round r + 1. We have to prove that CS_j[s mod M].BC[k mod n].est[0] = v when p_j starts round r + 1. Since p_i URB-broadcasts DECIDE(s, k, v) during round r, lines 57 and 60 imply that there were at least (n − t) nodes that have sent PHASE(1, •, s, k, r, v, •) to p_i during round r. By the fact that n − t > n/2 and the majority intersection property, we know that p_j also had to receive during round r at least one of these PHASE(1, •, s, k, r, v, •) messages. Also, it follows from the quasi-agreement property (Corollary 4.1) that p_j receives both v and ⊥ (and no other value) in phase 1 of round r, i.e., rec_j = {v, ⊥}. This is because rec_j ≠ {v}, since rec_j = {v} implies that p_j URB-broadcasts DECIDE(s, k, v) during r. Thus, p_j assigns v to CS_j[s mod M].BC[k mod n].est[0] before continuing to round r + 1.
This completes Theorem 6.5's proof.
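The majority intersection property that the proofs above invoke follows from a counting argument: two sets of n − t nodes drawn from n nodes share at least 2(n − t) − n = n − 2t members, which is positive exactly when t < n/2. The following Python sketch checks the property exhaustively for small systems; the function name `quorums_intersect` is hypothetical and the brute-force search is only meant as an illustration.

```python
from itertools import combinations


def quorums_intersect(n, t):
    """Check that every two sets of n - t nodes (out of n) share a node."""
    quorums = [set(q) for q in combinations(range(n), n - t)]
    return all(a & b for a in quorums for b in quorums)


if __name__ == "__main__":
    print(quorums_intersect(5, 2))   # t < n/2
    print(quorums_intersect(4, 2))   # t = n/2: {0, 1} and {2, 3} are disjoint
```

This is why the test on n − t received messages suffices for agreement: any node that completes phase one in a round must have heard from at least one node that is also in the quorum of a deciding node.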

Conclusions
We showed how a non-self-stabilizing algorithm for indulgent zero-degrading binary consensus by Guerraoui and Raynal [25] can be transformed into one that can recover after the occurrence of transient faults. We also obtained a self-stabilizing asynchronous Ω failure detector from the non-self-stabilizing construction by Mostéfaoui, Mourgaya, and Raynal [36]. As an extension, we note that Ben-Or [5] presented a randomized binary consensus algorithm (using local coins). It differs from Algorithm 2 only in line 41, where it assigns a random binary value to est[0]. This is orthogonal to the algorithm's ability to recover from transient faults. As future work, we encourage the reader to take these building blocks, as well as the techniques used to make them self-stabilizing, into account when designing distributed systems that can recover from transient faults.

Figure 1: The studied problem of binary consensus (in bold) in the context of a relevant protocol suite.
as an alternating sequence of system states c[x] and steps a[x], such that each c[x + 1], except for the starting one, c[0], is obtained from c[x] by a[x]'s execution.

Claim 6.6 Let r be the smallest round during which any p_i ∈ P URB-broadcasts DECIDE(s, k, v). Suppose that p_j ∈ P also URB-broadcasts DECIDE(s, k, v') during round r. (i) It holds that v' = v. Let v'' be the local estimate CS_x[s mod M].BC[k mod n].est[0] of any p_x ∈ P that proceeds to round r + 1. (ii) It holds that v'' = v.