On the Effects of Soft Errors in Embedded Control Systems
Doktorsavhandling, 2005

This thesis investigates techniques for making closed loop control systems fault-tolerant and robust with respect to soft errors occurring in the computer hardware. Soft errors are caused by transient faults that alter the binary values stored in latches, flip-flops and other state elements without causing any permanent damage to the hardware. Soft errors caused by ionizing particles such as high energy neutrons are expected to become a dominating source of hardware failures in future digital circuits. Software implemented techniques for detecting and tolerating soft errors for closed loop control systems are proposed and evaluated. These software techniques are designed to serve as a complement to hardware implemented error detection and correction mechanisms that are present in most computer systems. The objective is to provide a software layer of fault-tolerance mechanisms that can detect, mask or recover from soft errors that escape the hardware mechanisms. Fault injection experiments with control systems for both a four-stroke combustion engine and a jet engine show that a majority of the soft errors (single bit-flips) in CPU-registers and memory have no or minor impact on the behavior of the engines. However, the experiments also show that a small but significant number of the errors result in critical engine failures. These critical failures are predominantly caused by soft errors affecting the state variables of the control algorithm. We present the design and validation of two error detection and recovery techniques called Best Effort Recovery and the Robust Integrator. These techniques are designed to protect the controller state and are experimentally validated by fault injection experiments. The Best Effort Recovery technique performs a rollback recovery if the state variables or the control output are outside defined value bounds. The Robust Integrator is constructed as a generic component in a tool for model-based design and can thus be used for robust implementation of a wide range of control algorithms. To validate these techniques, we have developed a new fault injection tool called GOOFI (Generic Object-Oriented Fault Injection). The tool has been designed to be easily adaptable to different target systems and simple to extend with new fault injection techniques and fault models.

fault tolerance

error recovery techniques

concurrent error detection

pre-injection analysis

embedded control systems

fault injection

soft errors

dependable software

Författare

Jonny Vinter

Chalmers, Data- och informationsteknik

Ämneskategorier

Data- och informationsvetenskap

ISBN

91-7291-630-3

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 2312

Technical report D - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University: 7