Issues in the Design and Analysis of Dependable Distributed Real-Time Systems
The importance of computer system dependability is increasing as safety-critical control systems are becoming more prevalent. As an example, the techniques and concepts used in present-day fly-by-wire aircraft can be expected to be introduced in tomorrow's automobiles. These types of applications have some characteristics in common: they have very strong reliability requirements, they handle hard real-time tasks and they are suited for distributed architectures. DACAPO is a system architecture that has been tailored to these characteristics and which addresses the important topics of time-determinism and fault tolerance in distributed systems. A DACAPO system consists of a number of fault-tolerant network nodes interconnected via a double-redundant bus. Each node interfaces directly to a set of sensors and actuators and performs local closed-loop control algorithms. Furthermore, it participates in system-wide control algorithms. The entire DACAPO system employs static scheduling of both task execution and inter-node communication. There is, however, some support for event-triggered task activation and the associated communication.
In the system design phase, reliability modeling and analysis can provide information on how to design a system by facilitating comparisons of different design alternatives. During this phase, little is known about implementation details, so the reliability modeling has to be made at a rather high level. Thus, reliability analysis tools which require numerical values for all relevant parameters as input have limited use in the design phase. Although sensitivity analysis based on numerical techniques can be carried out, it becomes a formidably complex task when the number of parameters is large and the range of each parameter value is vast. What is needed is an analysis technique that gives closed-form expressions for the failure probability in order to facilitate an appreciation of the importance of each relevant parameter. A closed-form approximation approach has been developed for this purpose. In this approach, approximate solutions are established for the Markov processes which result when certain repairable fault-tolerant systems with exponentially distributed inter-event times are modeled. The approximation technique is based on the establishment of approximate expressions for the probability that the system reaches a failure state via any given path of system states. The failure probability may then be expressed as the sum of all these probabilities. An upper bound on the approximation error is established for each such state path considered in isolation, and a method for combining these errors into an error of the final approximation is given. Thus, the closed-form approximation approach consists of a technique for the establishment of a reliability expression with known upper bounds on its approximation error. The resulting failure probability approximation is provably conservative in the sense that it is always higher than the theoretically correct failure probability associated with the Markov model in question.
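To make the idea of a conservative closed-form expression concrete, the sketch below compares an exact Markov solution with a simple path-based upper bound for one small model. Everything in it is an illustrative assumption rather than the thesis's own derivation: the model is a hypothetical repairable duplex system (both units working, one unit failed, system failed), `lam` is a per-unit failure rate, `mu` is a repair rate, and the bound is the elementary expected-count bound 2·lam·t · lam/(lam + mu) for the single failure path through the "one unit failed" state.

```python
import math


def exact_failure_probability(lam: float, mu: float, t: float) -> float:
    """Exact failure probability at time t for an assumed duplex model.

    States: 0 = both units up, 1 = one unit down, F = system failed.
    Transitions: 0 -> 1 at rate 2*lam, 1 -> 0 at rate mu, 1 -> F at rate lam.
    The two transient states give a 2x2 generator whose eigenvalues
    s1, s2 are the roots of s^2 + (3*lam + mu)*s + 2*lam^2 = 0.
    """
    b = 3.0 * lam + mu
    disc = lam * lam + 6.0 * lam * mu + mu * mu  # always positive
    s2 = -(b + math.sqrt(disc)) / 2.0            # large-magnitude root
    s1 = 2.0 * lam * lam / s2                    # small root via product of roots
    # Reliability R(t) = (s1*exp(s2*t) - s2*exp(s1*t)) / (s1 - s2).
    # Written with expm1 so that 1 - R(t) keeps precision when it is tiny.
    r_minus_one = (s1 * math.expm1(s2 * t) - s2 * math.expm1(s1 * t)) / (s1 - s2)
    return -r_minus_one


def path_upper_bound(lam: float, mu: float, t: float) -> float:
    """Closed-form conservative bound for the path 0 -> 1 -> F.

    The expected number of entries into state 1 by time t is at most
    2*lam*t, and each visit ends in failure with probability
    lam / (lam + mu), so the product bounds the failure probability.
    """
    return 2.0 * lam * t * lam / (lam + mu)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    lam, mu, t = 1e-4, 1.0, 10.0
    print("exact :", exact_failure_probability(lam, mu, t))
    print("bound :", path_upper_bound(lam, mu, t))
```

The closed-form bound shows directly how the result scales with each parameter (quadratically in `lam`, inversely in `lam + mu`), which is the kind of insight a purely numerical solution does not expose. A model with several failure paths would, under the same assumptions, sum one such term per path.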
This approximation technique has been applied to some basic fault-tolerant configurations with particular consideration of the possible scenarios unfolding when a fault occurs. The results indicate that the technique is indeed useful for preliminary reliability analysis. A similar technique, adapted to somewhat different Markov model characteristics, has been used to investigate the failure probability of bus and ring network topologies.
Electromagnetic interference constitutes one of the most serious threats to the successful operation of computer systems, particularly in embedded applications. Electromagnetic compatibility (EMC) is the condition that permits the coexistence of several electrical and electronic systems without any of them being disturbed by the others. This thesis includes an overview of EMC problems and solutions with particular attention to automotive electronics.