Towards Accurate Estimation of Error Sensitivity in Computer Systems
Doctoral thesis, 2021

Fault injection is an increasingly important method for assessing, measuring, and observing the system-level impact of hardware and software faults in computer systems. This thesis presents the results of a series of experimental studies in which fault injection was used to investigate the impact of bit-flip errors on program execution. The studies were motivated by the fact that transient hardware faults in microprocessors can cause bit-flip errors that propagate to the microprocessor's instruction set architecture (ISA) registers and main memory. As the rate of such hardware faults is expected to increase with technology scaling, there is a need to better understand how these errors (known as ‘soft errors’) influence program execution, especially in safety-critical systems.
Using ISA-level fault injection, we investigate how five aspects, or factors, influence the error sensitivity of a program. We define error sensitivity as the conditional probability that a bit-flip error in live data in an ISA-register or main-memory word will cause a program to produce silent data corruption (SDC; i.e., an erroneous result). We also consider the estimation of a measure called SDC count, which represents the number of ISA-level bit flips that cause an SDC.
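As an illustration of the definition above, the following minimal sketch estimates error sensitivity as the fraction of fault-injection experiments that end in an SDC, with a normal-approximation confidence interval. It assumes every experiment injects into live data, so the fraction approximates the conditional probability. The function name, the outcome labels `"benign"` and `"detected"`, and the sample counts are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def estimate_error_sensitivity(outcomes, z=1.96):
    """Estimate P(SDC | bit flip in live data) from experiment outcomes.

    `outcomes` is a list of labels, one per fault-injection experiment.
    Only the label "sdc" matters here; all other labels are hypothetical
    placeholders for non-SDC outcomes.
    Returns (estimate, lower_bound, upper_bound) using a normal
    approximation with critical value `z` (1.96 for roughly 95%).
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("at least one experiment is required")
    sdc_count = Counter(outcomes)["sdc"]
    p = sdc_count / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Made-up campaign: 100 experiments, 30 of which produced an SDC.
outcomes = ["sdc"] * 30 + ["benign"] * 50 + ["detected"] * 20
estimate, lower, upper = estimate_error_sensitivity(outcomes)
```

The normal approximation is only one possible choice of interval; for small campaigns or proportions near 0 or 1, a different interval may be more appropriate.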
The five factors addressed are (a) the inputs processed by a program, (b) the level of compiler optimization, (c) the implementation of the program in the source code, (d) the fault model (single bit flips vs. double bit flips), and (e) the fault-injection technique (inject-on-write vs. inject-on-read). Our results show that these factors affect the error sensitivity in many ways; some factors strongly impact the error sensitivity or SDC count, whereas others show a weaker impact. For example, our experiments show that single bit flips tend to cause SDCs more often than double bit flips; compiler optimization positively impacts the SDC count but not necessarily the error sensitivity; the error sensitivity varies between 20% and 50% among the programs we tested; and variations in input affect the error sensitivity significantly for most of the tested programs.
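The two fault models in factor (d) can be sketched as XOR masks applied to a data word. This is a minimal illustration, not the tooling used in the thesis: the 32-bit word width, the function names, and the choice to place both flips of a double bit flip in the same word are assumptions made for the example.

```python
import random


def flip_bits(word, positions, width=32):
    """Return `word` with the bits at the given positions inverted."""
    mask = 0
    for pos in positions:
        if not 0 <= pos < width:
            raise ValueError(f"bit position {pos} outside a {width}-bit word")
        mask |= 1 << pos
    # XOR inverts exactly the masked bits; the final AND keeps the width.
    return (word ^ mask) & ((1 << width) - 1)


def inject(word, n_flips, rng, width=32):
    """Flip `n_flips` distinct, randomly chosen bits of `word`."""
    positions = rng.sample(range(width), n_flips)
    return flip_bits(word, positions, width)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
original = 0x000000FF
single = inject(original, 1, rng)  # single bit-flip model
double = inject(original, 2, rng)  # double bit-flip model (same word)
```

In an actual campaign the corrupted word would be written back to the targeted register or memory location before execution resumes; that step depends on the fault-injection tool and is omitted here.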

soft errors

silent data corruption

error sensitivity

fault injection

EDIT 8103
Opponent: Associate professor Juan Carlos Ruiz, Technical University of Valencia, Spain

Author

Fatemeh Ayatolahi

Chalmers, Computer Science and Engineering, Computer Engineering

A Study of the Impact of Single Bit-Flip and Double Bit-Flip Errors on Program Execution

Computer Safety, Reliability, and Security. SAFECOMP, September 24-27 (2013)

Paper in proceedings

A Comparison of Inject-on-Read and Inject-on-Write in ISA-Level Fault Injection

11th European Dependable Computing Conference (2016), p. 178-189

Paper in proceedings

On the Impact of Hardware Faults – An Investigation of the Relationship between Workload Inputs and Failure Mode Distributions

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 7612 (2012), p. 198-209

Paper in proceedings

A Study of the Impact of Bit-flip Errors on Programs Compiled with Different Optimization Levels

10th European Dependable Computing Conference, EDCC 2014, Newcastle upon Tyne, United Kingdom, 13-16 May 2014, p. 146-157

Paper in proceedings

Fatemeh Ayatolahi, Johan Karlsson. “Statistical Analysis of Fault-Injection Data — A Case Study using Hypothesis Testing”

Subject categories

Computer and Information Science

Electrical Engineering and Electronics

ISBN

978-91-7905-493-9

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 4960

Publisher

Chalmers University of Technology

Online

More information

Last updated

2021-06-09