Measuring the Impact of Hardware Errors in Computer Systems
Licentiate thesis, 2014

This thesis addresses the problem of measuring the hardware error sensitivity of computer systems. Hardware error sensitivity is the probability that a hardware error will result in an erroneous output. Measuring hardware error sensitivity is important because the rate of transient, intermittent and permanent transistor faults increases as a result of integrated circuit technology scaling. Error sensitivity is influenced by several parameters. This thesis investigates six such parameters, or sources of variation in error sensitivity, in a series of fault injection experiments. In these experiments, bit-flip errors were injected into a microprocessor's instruction set architecture (ISA) registers and main memory words in order to emulate the errors caused by transient hardware faults. The sources of variation addressed fall into two groups. Those related to system characteristics are (i) the input processed by a program, (ii) the program's source code implementation, (iii) the distribution of machine instructions, and (iv) the level of compiler optimization. Those related to the measurement setup are (v) the number of bits targeted in each fault injection experiment and (vi) the significance of the bit, or bits, targeted for fault injection. The experiments identified four factors that had a strong impact on error sensitivity: (1) the location of the erroneous bit, or bits, within a register or memory word, (2) the type of machine instruction targeted for fault injection, (3) the input to the program, and (4) the program's source code implementation. In contrast, variations in compiler optimization were shown to have only a minor impact on error sensitivity. The experiments also showed no significant difference in error sensitivity between single and double bit flips when these occurred within the same register or memory word.
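As an illustration of the definition above, the following is a minimal software-only sketch, not the experimental setup used in the thesis (which injected faults into a real microprocessor's ISA registers and memory). It assumes a 32-bit word and a hypothetical workload `toy_program` that only reads bits 8-15 of its input register; `flip_bits`, `error_sensitivity` and `sampled_sensitivity` are likewise hypothetical names. Error sensitivity is estimated as the fraction of injection experiments whose output differs from the fault-free (golden) run. The 95% confidence interval uses a normal approximation to the binomial, which is an added assumption rather than something stated in the abstract.

```python
import math
import random

WORD_BITS = 32  # assumption: 32-bit ISA registers and memory words
WORD_MASK = (1 << WORD_BITS) - 1


def flip_bits(word: int, positions) -> int:
    """Invert each bit in `positions` (assumed distinct): one position is a
    single bit flip, two positions a double bit flip within the same word."""
    for p in positions:
        word ^= 1 << p
    return word & WORD_MASK


def toy_program(reg: int) -> int:
    """Hypothetical workload: outputs byte 1 of a register, so flips in
    bits 8-15 change the output and all other flips are masked."""
    return (reg >> 8) & 0xFF


def error_sensitivity(program, reg_value: int, experiments) -> float:
    """Fraction of experiments (each a list of bit positions to flip)
    whose output differs from the golden, fault-free run."""
    golden = program(reg_value)
    erroneous = sum(
        program(flip_bits(reg_value, bits)) != golden for bits in experiments
    )
    return erroneous / len(experiments)


def sampled_sensitivity(program, reg_value: int, n: int,
                        flips_per_word: int = 1, seed: int = 0):
    """Estimate sensitivity from n randomly targeted experiments.
    Returns (estimate, half-width of an approximate 95% confidence interval)."""
    rng = random.Random(seed)
    experiments = [rng.sample(range(WORD_BITS), flips_per_word) for _ in range(n)]
    p = error_sensitivity(program, reg_value, experiments)
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)  # normal approximation
    return p, half_width


if __name__ == "__main__":
    reg = 0x12345678
    golden = toy_program(reg)
    single = [[b] for b in range(WORD_BITS)]
    # Exhaustive single bit flips: 8 of 32 positions are sensitive -> 0.25
    print("exhaustive single-bit sensitivity:",
          error_sensitivity(toy_program, reg, single))
    # Bit significance: only positions 8..15 produce an erroneous output
    print("sensitive bit positions:",
          [b for b in range(WORD_BITS)
           if toy_program(flip_bits(reg, [b])) != golden])
    print("sampled single-bit estimate (p, 95% half-width):",
          sampled_sensitivity(toy_program, reg, n=1000))
```

The per-position listing mirrors source of variation (vi): in this toy model the outcome depends entirely on which bit is hit, because the program masks every bit outside one byte. Real workloads mask errors in far less regular ways, which is why the thesis measures sensitivity experimentally rather than deriving it analytically.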

Sources of Variation

Fault Injection

Dependability Assessment

Microprocessor Faults

Hardware Error Sensitivity

EA, EDIT-building, Rännvägen 6, Chalmers University of Technology, Göteborg, Sweden
Opponent: Prof. Olaf Spinczyk, Department of Computer Science, TU Dortmund, Germany

Author

Behrooz Sangchoolie

Computer Engineering, Department of Computer Science and Engineering, Chalmers University of Technology

A Study of the Impact of Single Bit-Flip and Double Bit-Flip Errors on Program Execution

Computer Safety, Reliability, and Security (SAFECOMP 2013), September 24-27, 2013

Paper in proceedings

Benchmarking the Hardware Error Sensitivity of Machine Instructions

Proceedings of the 2013 IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE 9), 2013

Paper in proceedings

A Study of the Impact of Bit-flip Errors on Programs Compiled with Different Optimization Levels

10th European Dependable Computing Conference (EDCC 2014), Newcastle upon Tyne, United Kingdom, 13-16 May 2014, pp. 146-157

Paper in proceedings

On the Impact of Hardware Faults – An Investigation of the Relationship between Workload Inputs and Failure Mode Distributions

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 7612 (2012), pp. 198-209

Paper in proceedings

Subject Categories

Computer Engineering

Technical report L - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University: 119L

Created

10/7/2017