On the Effects of Soft Errors in Embedded Control Systems
Doctoral thesis, 2005
This thesis investigates techniques for making closed-loop control systems fault-tolerant and robust with respect to soft errors occurring in the computer hardware. Soft errors are caused by transient faults that alter the binary values stored in latches, flip-flops and other state elements without causing any permanent damage to the hardware. Soft errors caused by ionizing particles, such as high-energy neutrons, are expected to become a dominant source of hardware failures in future digital circuits.
Software-implemented techniques for detecting and tolerating soft errors in closed-loop control systems are proposed and evaluated. These software techniques are designed to complement the hardware-implemented error detection and correction mechanisms present in most computer systems. The objective is to provide a software layer of fault-tolerance mechanisms that can detect, mask or recover from soft errors that escape the hardware mechanisms.
Fault injection experiments with control systems for both a four-stroke combustion engine and a jet engine show that a majority of the soft errors (single bit-flips) in CPU registers and memory have little or no impact on the behavior of the engines. However, the experiments also show that a small but significant number of the errors result in critical engine failures. These critical failures are predominantly caused by soft errors affecting the state variables of the control algorithm.
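The single bit-flip fault model mentioned above can be illustrated with a minimal sketch. It is not taken from the thesis or from its tooling: the function name flip_bit and the choice of a 64-bit IEEE 754 floating-point variable as the injection target are assumptions made here for illustration only.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its 64-bit IEEE 754 encoding inverted.

    Hypothetical helper: bit 0 is the least significant mantissa bit,
    bits 52..62 form the exponent, and bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the double as an unsigned 64-bit integer.
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    # Invert the selected bit and reinterpret the result as a double.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped
```

With this helper, flipping bit 0 of the value 1.0 changes it by about 2.2e-16, whereas flipping bit 62 turns it into infinity. This gives one plausible intuition, under the assumptions above, for why some bit-flips are harmless while a flip in a stored state variable can be severe.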
We present the design and validation of two error detection and recovery techniques called Best Effort Recovery and the Robust Integrator. These techniques are designed to protect the controller state and are validated by fault injection experiments. The Best Effort Recovery technique performs a rollback recovery if the state variables or the control output fall outside predefined bounds. The Robust Integrator is constructed as a generic component in a tool for model-based design and can thus be used for robust implementation of a wide range of control algorithms.
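The idea of rolling back when the state or the output leaves its bounds can be sketched as follows. This is only an illustration under stated assumptions, not the implementation evaluated in the thesis: the PI control law, the class name PIController, the bound values and the choice to restore the most recent backed-up values are all hypothetical.

```python
from dataclasses import dataclass


def in_bounds(value: float, low: float, high: float) -> bool:
    # A chained comparison is False for NaN, so NaN is treated as an error.
    return low <= value <= high


@dataclass
class PIController:
    """Hypothetical PI controller with range checks and rollback."""
    kp: float
    ki: float
    dt: float
    x_min: float  # assumed bounds on the integrator state
    x_max: float
    u_min: float  # assumed bounds on the control output
    u_max: float
    x: float = 0.0         # integrator state (the value to protect)
    x_backup: float = 0.0  # last state that passed the check
    u_backup: float = 0.0  # last output that passed the check
    recoveries: int = 0

    def step(self, error: float) -> float:
        # Check the state before use; it may have been corrupted
        # since the previous iteration.
        if not in_bounds(self.x, self.x_min, self.x_max):
            self.x = self.x_backup  # roll back to the backed-up state
            self.recoveries += 1
        u = self.kp * error + self.ki * self.x
        # Check the output; on failure, reuse the previous good output.
        if not in_bounds(u, self.u_min, self.u_max):
            u = self.u_backup
            self.recoveries += 1
        else:
            self.u_backup = u
        # Commit the new state only if it is within bounds, then back it up.
        x_next = self.x + error * self.dt
        if in_bounds(x_next, self.x_min, self.x_max):
            self.x = x_next
        self.x_backup = self.x
        return u
```

In this sketch, a corrupted state is detected only if it leaves the assumed range, so errors that keep the value inside the bounds pass unnoticed. The sketch also assumes that the backup copies are not hit by the same fault.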
To validate these techniques, we have developed a new fault injection tool called GOOFI (Generic Object-Oriented Fault Injection). The tool has been designed to be easily adaptable to different target systems and simple to extend with new fault injection techniques and fault models.
fault tolerance
error recovery techniques
concurrent error detection
pre-injection analysis
embedded control systems
fault injection
soft errors
dependable software