Synchronization and Communication Results in Safety-Critical Real-Time Systems
Doctoral thesis, 1999

A growing number of industrial control applications use computer control to provide enhanced functionality at reduced cost. Many of these applications are safety-critical and require fault-tolerant techniques to reach an adequate level of safety. Increasing system complexity and manufacturing requirements create control needs that call for distributed, real-time solutions. A cost-effective way to build such systems is to use a time-slotted broadcast bus (Time Division Multiple Access, TDMA). This thesis describes the communication functionality of this type of architecture and proposes efficient, low-cost algorithms for membership agreement and atomic multicast.

TDMA requires system synchronization. To avoid separate control channels, data and synchronization can share the same communication channel, and it is also desirable to avoid dedicated synchronization signals or messages. For TDMA communication, a new initialization algorithm is proposed that establishes initial synchronization using ordinary data messages. Its correctness is established by both theoretical analysis and symbolic model checking. A new fault-tolerant clock synchronization algorithm is also presented, which exploits the special characteristics of broadcast channels to improve synchronization precision when low-accuracy oscillators are used. The proposed algorithms were compared with existing methods using simulated fault-injection experiments in which random transient faults were injected onto the bus. Both performed better than the existing methods in terms of system availability and synchronization precision.

The thesis also investigates design issues in distributed systems. The beneficial impact of clock synchronization on control performance and on the implementation of scheduling policies was studied for a distributed control loop executed by a simple two-node system on a broadcast bus. Three parameters relevant to control performance were examined: input jitter, output jitter, and control delay variation. Finally, the communication rate expected for a typical automotive application was compared for three pairs of design alternatives: logically distributed vs. logically central systems; duplex fail-silent computers vs. single fault-tolerant nodes; and variable message lengths vs. identical messages for all nodes.
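To illustrate why a TDMA bus depends on clock synchronization, the following Python sketch shows two generic building blocks. `slot_owner` computes which node may transmit according to a node's local clock, assuming a static round in which node i owns slot i and all slots have equal length. `fault_tolerant_average` is a conventional fault-tolerant averaging convergence function, a well-known existing technique; it is not the new algorithm proposed in the thesis, whose details are not reproduced here. Its input is assumed to be the deviations a node measures between the expected and observed arrival times of frames from other nodes. All function names and parameters are hypothetical, chosen for illustration.

```python
from typing import List


def slot_owner(local_time_us: int, slot_len_us: int, num_nodes: int) -> int:
    """Index of the node allowed to transmit at local_time_us.

    Hypothetical static TDMA round: node i owns slot i, all slots have
    the same length, and the round repeats every num_nodes slots.
    """
    return (local_time_us // slot_len_us) % num_nodes


def fault_tolerant_average(deviations: List[float], k: int) -> float:
    """Conventional fault-tolerant average (FTA) convergence function.

    Sorts the measured clock deviations, discards the k smallest and the
    k largest, and averages the rest. If at most k readings are faulty,
    every kept value lies between the smallest and largest correct reading.
    """
    if len(deviations) <= 2 * k:
        raise ValueError("need more than 2k readings")
    kept = sorted(deviations)[k:len(deviations) - k]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Four nodes, 250 us slots: local time 1100 us is slot 4, i.e. node 0 again.
    print(slot_owner(1100, 250, 4))
    # One faulty reading (500.0) is discarded together with the smallest one.
    print(fault_tolerant_average([-3.0, 1.0, 2.0, 500.0], 1))
```

Running the script prints `0` and then `1.5`: with k = 1, the readings -3.0 and 500.0 are discarded and the remaining 1.0 and 2.0 are averaged. If two nodes' local clocks drift apart near a slot boundary, they can compute different slot owners for the same instant and transmit simultaneously, which is why synchronization precision matters on a shared TDMA bus.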



Keywords: fault tolerance, membership agreement, atomic broadcast, embedded systems, clock synchronization, distributed real-time systems


Henrik Lönn

Department of Computer Engineering

Computer and Information Science



Doctoral theses at Chalmers University of Technology. New series: 1535

Technical report - School of Electrical and Computer Engineering, Chalmers University of Technology, Göteborg, Sweden