Fault Injection for Studying Error Behavior and Validation of Error Detecting Mechanisms
Doctoral thesis, 1995
This thesis deals with the design and validation of low-cost error detecting mechanisms that can be used to implement self-checking computers. The research objectives of the thesis are three-fold: (i) to investigate and develop a simulation-based fault injection technique that can be used on a wide range of VHDL simulation models, (ii) to investigate error propagation mechanisms in microprocessors in order to understand how to design low-cost error detecting mechanisms and (iii) to design and validate error detecting mechanisms. Simulation-based fault injection is used throughout the thesis to investigate error propagation in microprocessors and validate proposed error detecting mechanisms.
The new techniques used in MEFISTO (Multi-level Error and Fault Injection Simulation TOol) for injecting simulated faults into VHDL simulation models at the gate, component and functional levels are presented. With MEFISTO, it is possible to inject temporary and permanent stuck-at faults, bit-flips and, most importantly, user-defined functional faults. Functional-level simulations are an attractive alternative to detailed gate-level simulations because of the gain in simulation speed. Furthermore, the VHDL language encourages the writing of functional-level models. The thesis addresses and partly solves the problem of determining which errors to inject at the functional level, and with which distribution, in order to accurately emulate the effects of an underlying gate- or component-level fault distribution.
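The fault models named above can be illustrated with a minimal Python sketch. This is not MEFISTO's implementation, which operates on VHDL simulation models; the function names, the 32-bit word width and the representation of a signal as a list of per-cycle values are all assumptions made for illustration.

```python
# Illustrative fault models on a 32-bit signal value (assumed width).
WIDTH = 32
MASK = (1 << WIDTH) - 1


def stuck_at(value, bit, level):
    """Return value with one bit forced to a fixed level (0 or 1)."""
    if level:
        return (value | (1 << bit)) & MASK
    return value & ~(1 << bit) & MASK


def bit_flip(value, bit):
    """Return value with one bit inverted."""
    return (value ^ (1 << bit)) & MASK


def inject(trace, fault, start, end=None):
    """Apply fault to the per-cycle values trace[start:end].

    A bounded window models a temporary fault; end=None keeps the
    fault active until the end of the trace, modelling a permanent one.
    """
    stop = len(trace) if end is None else end
    return [fault(v) if start <= i < stop else v for i, v in enumerate(trace)]


# Hypothetical signal trace: four cycles, all zero.
trace = [0, 0, 0, 0]
temporary = inject(trace, lambda v: stuck_at(v, 0, 1), 1, 3)
permanent = inject(trace, lambda v: stuck_at(v, 0, 1), 1)
```

In this sketch a stuck-at fault is held for every cycle in its window, whereas a bit-flip would normally be applied to a stored value at a single point in time.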
The results of three diverse simulation-based fault injection studies are also presented. Three different microprocessors, TRIP, TRIP II and DP32, are used in the studies. TRIP and TRIP II are custom-designed 32-bit pipelined RISC processors, and DP32 is a very simple 32-bit processor. Various aspects of the error behavior and error propagation mechanisms of the three microprocessors are investigated to gain an understanding of how to design low-cost error detecting mechanisms. Faults that propagate and lead to control flow errors are studied in particular.
Finally, three novel error detecting mechanisms are presented and validated. Theoretical analysis methods and simulation-based and physical fault injection techniques are used in the validation. In particular, an effort is made to support experimental validation results using theoretical analysis methods, and vice versa. Of the presented error detecting mechanisms, two detect control flow errors and one detects data errors. The two control flow checking mechanisms supplement each other. One mechanism, called ISC (Implicit Signature Checking), detects in excess of 99.99% of all control flow errors but cannot be used to externally monitor processors with built-in cache memory. The other mechanism, called TTA (Time-Time-Address checking), can be used to externally monitor processors with on-chip caches, but its coverage is only 98%. The data error detecting mechanism, in combination with a watchdog timer and the ISC control flow checking mechanism, detects 99.99% of all injected transient faults and 96% of all injected permanent faults.
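Coverage figures such as those above are ratios of detected to injected faults, so their precision depends on the number of injections. The following Python sketch shows one common way to attach a confidence interval to such an estimate, using the Wilson score interval. It is an illustration only: the thesis's own statistical treatment is not described here, the function name is hypothetical, and the counts in the usage line are made-up numbers, not results from the thesis.

```python
import math


def coverage_estimate(detected, injected, z=1.96):
    """Return (point estimate, lower bound, upper bound) for coverage.

    Uses the Wilson score interval for a binomial proportion;
    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if injected <= 0 or not 0 <= detected <= injected:
        raise ValueError("need 0 <= detected <= injected and injected > 0")
    p = detected / injected
    denom = 1 + z * z / injected
    centre = (p + z * z / (2 * injected)) / denom
    half = z * math.sqrt(p * (1 - p) / injected
                         + z * z / (4 * injected * injected)) / denom
    return p, centre - half, centre + half


# Hypothetical campaign: 980 of 1000 injected faults detected.
p, low, high = coverage_estimate(980, 1000)
```

The Wilson interval is chosen for the sketch because it behaves reasonably when the estimated coverage is close to 1, where the simple normal approximation breaks down.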
concurrent error detection
functional error model
control flow error
error detection coverage
error detection latency
simulation-based fault injection
physical fault injection