Sources of Variations in Error Sensitivity of Computer Systems
Licentiate thesis, 2014

Technology scaling is reducing the reliability of integrated circuits. This makes it important to provide computers with mechanisms that can detect and correct hardware errors. This thesis deals with the problem of assessing the hardware error sensitivity of computer systems. Error sensitivity, defined as the likelihood that a hardware error will escape detection and produce an erroneous output, measures a system's inability to detect hardware errors. This thesis presents the results of a series of fault injection experiments that investigated how error sensitivity varies with different system characteristics, including (i) the inputs processed by a program, (ii) a program's source code implementation, and (iii) the use of compiler optimizations. The study focused on the impact of transient hardware faults that result in bit errors in CPU registers and main memory locations. We investigated how error sensitivity differs between single-bit and double-bit errors, and how it varies across the machine instructions targeted for fault injection. The results show that the input profile and the source code implementation of the investigated programs had a major impact on error sensitivity, while different compiler optimizations caused only minor variations. There was no significant difference in error sensitivity between single-bit and double-bit errors. Finally, error sensitivity appears to depend more on the type of data processed by an instruction than on the instruction type.
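The two fault models and the error sensitivity measure described above can be illustrated with a short sketch. This is not the tooling used in the thesis; it assumes a hypothetical 32-bit register or memory word, models a double-bit error as two distinct flipped bits in the same word (one possible model among several), and uses a made-up outcome label for an error that escaped detection and produced erroneous output.

```python
import random

WORD_BITS = 32  # assumed width of the targeted register/memory word (illustrative)
WORD_MASK = (1 << WORD_BITS) - 1

# Hypothetical label for an injection whose error was not detected
# and produced an erroneous program output.
ESCAPED = "undetected_erroneous_output"


def flip_bits(value: int, positions: list[int]) -> int:
    """Invert the given bit positions of a WORD_BITS-wide word."""
    for pos in positions:
        value ^= 1 << pos
    return value & WORD_MASK


def single_bit_error(value: int, rng: random.Random) -> int:
    """Flip one randomly chosen bit (single-bit error model)."""
    return flip_bits(value, [rng.randrange(WORD_BITS)])


def double_bit_error(value: int, rng: random.Random) -> int:
    """Flip two distinct randomly chosen bits in the same word
    (an assumed double-bit error model)."""
    return flip_bits(value, rng.sample(range(WORD_BITS), 2))


def error_sensitivity(outcomes: list[str]) -> float:
    """Fraction of injected errors that escaped detection and
    produced an erroneous output."""
    if not outcomes:
        raise ValueError("no injection outcomes given")
    escaped = sum(1 for outcome in outcomes if outcome == ESCAPED)
    return escaped / len(outcomes)


if __name__ == "__main__":
    rng = random.Random(42)
    word = 0x0000_00FF
    print(hex(single_bit_error(word, rng)))
    print(hex(double_bit_error(word, rng)))
    # Made-up outcomes, for illustration only.
    print(error_sensitivity(["detected", ESCAPED, "no_effect", ESCAPED]))
```

In a real campaign each outcome would come from running the target program after one injection and classifying its behaviour; the sketch only shows how the ratio is formed once those classifications exist.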

error sensitivity

compiler optimization

transient fault

bit flipping

fault tolerance

fault injection

Room EA, EDIT-Building, Rännvägen 6, Chalmers University of Technology
Opponent: Dr. Domenico Cotroneo, Associate Professor, Department of Computer Science and System Engineering, University of Naples, Italy.

Author

Fatemeh Ayatolahi

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

On the Impact of Hardware Faults – An Investigation of the Relationship between Workload Inputs and Failure Mode Distributions

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 7612 (2012), p. 198-209

Paper in proceeding

A Study of the Impact of Single Bit-Flip and Double Bit-Flip Errors on Program Execution

Computer Safety, Reliability, and Security. SAFECOMP, September 24-27 (2013)

Paper in proceeding

A Study of the Impact of Bit-flip Errors on Programs Compiled with Different Optimization Levels

10th European Dependable Computing Conference, EDCC 2014, Newcastle upon Tyne, United Kingdom, 13-16 May 2014 (2014), p. 146-157

Paper in proceeding

Benchmarking the Hardware Error Sensitivity of Machine Instructions

Proceedings of the 2013 IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE 9) (2013)

Paper in proceeding

Subject Categories (SSIF 2011)

Computer Engineering

Embedded Systems

Computer Systems

Technical report L - Department of Radio and Space Science, Chalmers University of Technology, Göteborg, Sweden: 116L

Publisher

Chalmers

Latest update

6/7/2021