On Fault Injection-Based Assessment of Safety-Critical Systems
Doctoral thesis, 2010
This thesis deals with techniques for designing and evaluating error detection and recovery mechanisms for computer systems. For the assessment of such systems, we describe a comprehensive fault injection tool that is capable of emulating the effects of hardware errors in microprocessors. The tool integrates three known fault injection techniques into a unified framework, and is easy to extend with support for new techniques and target systems.
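As a rough illustration of what emulating a hardware error can mean, the sketch below inverts a single bit in a simulated register. It is not the tool described in the thesis: the `flip_bit` helper, the 32-bit width, and the `registers` dictionary are assumptions made purely for illustration.

```python
def flip_bit(value: int, bit: int, width: int = 32) -> int:
    """Return `value` with one bit inverted, emulating a transient bit-flip.

    Hypothetical helper: the register width and the single-bit fault model
    are assumptions, not details taken from the thesis.
    """
    if not 0 <= bit < width:
        raise ValueError("bit index outside register width")
    mask = (1 << width) - 1          # keep the result within the register width
    return (value ^ (1 << bit)) & mask


# Simulated register file (illustrative values only).
registers = {"r0": 0x000000FF, "r1": 0x12345678}

# Inject a fault into bit 8 of r0; injecting it again would restore the value.
registers["r0"] = flip_bit(registers["r0"], bit=8)
```

Because XOR is its own inverse, applying the same flip twice restores the original value, which makes a sketch like this easy to check.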
The techniques included in the tool have different characteristics with respect to observability and temporal intrusiveness, but can nevertheless be used to inject exactly the same faults. However, due to uncertainties associated with each technique, which we identify and discuss, the results of injecting a given fault may differ to some extent. We therefore perform an analysis to determine whether results obtained with the three techniques are metrologically compatible, and thereby meaningful for dissemination and comparison.
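One common way to state metrological compatibility is that two results are compatible when their difference is no larger than a chosen multiple of the standard uncertainty of that difference. The sketch below applies that criterion under the assumption that the two results are uncorrelated. The function name, the coverage factor `k = 2`, and the numeric values are illustrative assumptions, not figures or methods from the thesis.

```python
import math


def metrologically_compatible(x1: float, u1: float,
                              x2: float, u2: float,
                              k: float = 2.0) -> bool:
    """Check |x1 - x2| <= k * sqrt(u1^2 + u2^2).

    Assumes the two results are uncorrelated, so the standard uncertainty
    of their difference is the root sum of squares of u1 and u2.
    """
    return abs(x1 - x2) <= k * math.sqrt(u1 ** 2 + u2 ** 2)


# Made-up error-detection coverage estimates from two injection techniques.
close_pair = metrologically_compatible(0.92, 0.01, 0.93, 0.01)
far_pair = metrologically_compatible(0.92, 0.01, 0.99, 0.01)
```

With these invented numbers the first pair differs by 0.01, within the threshold of about 0.028, while the second pair differs by 0.07 and falls outside it.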
To illustrate the practical usage of the tool, we describe an evaluation-driven design process for developing software-implemented fault tolerance. We used this process to develop two software mechanisms for a prototype brake controller. The mechanisms were specifically designed to prevent transient hardware errors from causing critical system failures. The effectiveness of the software mechanisms was evaluated with extensive fault injection experiments. The results show that these mechanisms can effectively reduce the probability of critical brake controller failures.
Transient Faults
Fault Injection
Error Detection and Recovery
Fault Tolerance
Embedded Control Systems