Towards Accurate Estimation of Error Sensitivity in Computer Systems
Doctoral thesis, 2021

Fault injection is an increasingly important method for assessing, measuring and observing the system-level impact of hardware and software faults in computer systems. This thesis presents the results of a series of experimental studies in which fault injection was used to investigate the impact of bit-flip errors on program execution. The studies were motivated by the fact that transient hardware faults in microprocessors can cause bit-flip errors that can propagate to the microprocessor's instruction set architecture (ISA) registers and main memory. As the rate of such hardware faults is expected to increase with technology scaling, there is a need to better understand how these errors (known as ‘soft errors’) influence program execution, especially in safety-critical systems.
Using ISA-level fault injection, we investigate how five aspects, or factors, influence the error sensitivity of a program. We define error sensitivity as the conditional probability that a bit-flip error in live data in an ISA-register or main-memory word will cause a program to produce silent data corruption (SDC; i.e., an erroneous result). We also consider the estimation of a measure called SDC count, which represents the number of ISA-level bit flips that cause an SDC.
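
As an illustration of the definition above, the following minimal sketch estimates error sensitivity as the fraction of fault-injection experiments that end in an SDC, together with a normal-approximation confidence interval. The outcome labels ("sdc", "benign", "detected"), the function name and the choice of interval are assumptions made for this example; they are not taken from the thesis.

```python
import math
from collections import Counter


def estimate_error_sensitivity(outcomes, z=1.96):
    """Estimate P(SDC | bit flip in live data) from experiment outcomes.

    `outcomes` holds one label per fault-injection experiment; the label
    "sdc" (a hypothetical name) marks runs that produced an erroneous result.
    Returns (estimate, lower bound, upper bound) using a normal-approximation
    interval, which is only reasonable for a fairly large number of runs.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("at least one experiment outcome is required")
    sdc_runs = Counter(outcomes)["sdc"]
    p = sdc_runs / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Made-up outcomes, for illustration only.
    runs = ["sdc"] * 30 + ["benign"] * 50 + ["detected"] * 20
    estimate, low, high = estimate_error_sensitivity(runs)
    print(f"error sensitivity ~ {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The estimate is only meaningful if every experiment injects into live data, as the definition requires; injections into dead registers or unused memory words would dilute the proportion.
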
The five factors addressed are (a) the inputs processed by a program, (b) the level of compiler optimization, (c) the implementation of the program in the source code, (d) the fault model (single bit flips vs double bit flips) and (e) the fault-injection technique (inject-on-write vs inject-on-read). Our results show that these factors affect the error sensitivity in many ways; some factors strongly impact the error sensitivity or SDC count whereas others show a weaker impact. For example, our experiments show that single bit flips tend to cause SDCs more often than double bit flips; compiler optimization positively impacts the SDC count but not necessarily the error sensitivity; the error sensitivity varies between 20% and 50% among the programs we tested; and variations in input affect the error sensitivity significantly for most of the tested programs.
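
The two fault models in factor (d) can be sketched as an XOR of a data word with a mask that has one or two bits set. This is a hedged illustration only: the 32-bit word width, the function name and the assumption that both flips of a double bit flip land in the same word are choices made for the example, not details confirmed by the abstract.

```python
def flip_bits(word, positions, width=32):
    """Return `word` with the bits at `positions` inverted.

    One position models a single bit flip and two distinct positions model
    a double bit flip within the same word (an assumption of this sketch).
    Repeated positions are counted once.
    """
    mask = 0
    for pos in set(positions):
        if not 0 <= pos < width:
            raise ValueError(f"bit position {pos} is outside a {width}-bit word")
        mask |= 1 << pos
    # XOR inverts exactly the masked bits; the final AND keeps the word width.
    return (word ^ mask) & ((1 << width) - 1)


if __name__ == "__main__":
    value = 0b1010
    print(bin(flip_bits(value, [0])))     # single bit flip
    print(bin(flip_bits(value, [1, 3])))  # double bit flip
```

Because XOR is its own inverse, applying the same flip twice restores the original word, which is convenient when checking a fault injector against a known-good value.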

silent data corruption

error sensitivity

fault injection

soft errors

EDIT 8103
Opponent: Associate professor Juan Carlos Ruiz, Technical University of Valencia, Spain

Author

Fatemeh Ayatolahi

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

A Study of the Impact of Single Bit-Flip and Double Bit-Flip Errors on Program Execution

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013)

Paper in proceeding

A Comparison of Inject-on-Read and Inject-on-Write in ISA-Level Fault Injection

11th European Dependable Computing Conference (2016), p. 178-189

Paper in proceeding

On the Impact of Hardware Faults – An Investigation of the Relationship between Workload Inputs and Failure Mode Distributions

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 7612 (2012), p. 198-209

Paper in proceeding

A Study of the Impact of Bit-flip Errors on Programs Compiled with Different Optimization Levels

10th European Dependable Computing Conference, EDCC 2014; Newcastle upon Tyne, United Kingdom; 13 May 2014 through 16 May 2014 (2014), p. 146-157

Paper in proceeding

Back-to-Back Fault Injection Testing in Model-Based Development

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 9337 (2015), p. 135-148

Paper in proceeding

Subject Categories

Computer and Information Science

Computational Mathematics

Electrical Engineering, Electronic Engineering, Information Engineering

Computer Science

ISBN

978-91-7905-493-9

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 4960

Publisher

Chalmers

Online

More information

Latest update

11/13/2023