Towards Accurate Estimation of Error Sensitivity in Computer Systems
Doctoral thesis, 2021
Using ISA-level fault injection, we investigate how five aspects, or factors, influence the error sensitivity of a program. We define error sensitivity as the conditional probability that a bit-flip error in live data in an ISA-register or main-memory word will cause a program to produce silent data corruption (SDC; i.e., an erroneous result). We also consider the estimation of a measure called SDC count, which represents the number of ISA-level bit flips that cause an SDC.
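The two measures defined above are related. The sketch below assumes that injected bit flips are sampled uniformly from the fault space, meaning every candidate bit flip in live data is equally likely. Under that assumption, error sensitivity can be estimated as the fraction of injections that end in an SDC, and an SDC count estimate follows by scaling that fraction by the fault-space size. The class and function names, and the numbers in the usage lines, are hypothetical illustrations and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class CampaignResult:
    """Outcome tallies of a hypothetical fault-injection campaign."""
    n_injected: int        # bit flips injected, assumed sampled uniformly from the fault space
    n_sdc: int             # injections whose run ended in silent data corruption
    fault_space_size: int  # assumed total number of candidate bit flips in live data


def error_sensitivity(r: CampaignResult) -> float:
    """Estimate P(SDC | bit flip in live data) as the observed SDC fraction."""
    if r.n_injected <= 0:
        raise ValueError("campaign must inject at least one fault")
    return r.n_sdc / r.n_injected


def estimated_sdc_count(r: CampaignResult) -> float:
    """Scale the estimated sensitivity by the fault-space size (uniform-sampling assumption)."""
    return error_sensitivity(r) * r.fault_space_size


# Hypothetical example: an optimized build with a smaller fault space can show a
# higher error sensitivity and still a lower SDC count than an unoptimized build.
unoptimized = CampaignResult(n_injected=1000, n_sdc=300, fault_space_size=1_000_000)
optimized = CampaignResult(n_injected=1000, n_sdc=350, fault_space_size=400_000)
```

This separation helps explain why a factor such as compiler optimization can change the SDC count without necessarily changing the error sensitivity: the two measures respond differently to changes in the size of the fault space.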
The five factors addressed are (a) the inputs processed by a program, (b) the level of compiler optimization, (c) the implementation of the program in the source code, (d) the fault model (single bit flips vs double bit flips) and (e) the fault-injection technique (inject-on-write vs inject-on-read). Our results show that these factors affect the error sensitivity in many ways; some factors strongly impact the error sensitivity or SDC count whereas others show a weaker impact. For example, our experiments show that single bit flips tend to cause SDCs more often than double bit flips; compiler optimization positively impacts the SDC count but not necessarily the error sensitivity; the error sensitivity varies between 20% and 50% among the programs we tested; and variations in input affect the error sensitivity significantly for most of the tested programs.
silent data corruption
error sensitivity
fault injection
soft errors
Author
Fatemeh Ayatolahi
Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)
A Study of the Impact of Single Bit-Flip and Double Bit-Flip Errors on Program Execution
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 8153 LNCS (2013)
Paper in proceeding
A Comparison of Inject-on-Read and Inject-on-Write in ISA-Level Fault Injection
11th European Dependable Computing Conference (2016), p. 178-189
Paper in proceeding
On the Impact of Hardware Faults – An Investigation of the Relationship between Workload Inputs and Failure Mode Distributions
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 7612 (2012), p. 198-209
Paper in proceeding
A Study of the Impact of Bit-flip Errors on Programs Compiled with Different Optimization Levels
10th European Dependable Computing Conference (EDCC 2014), Newcastle upon Tyne, United Kingdom, 13-16 May 2014, p. 146-157
Paper in proceeding
Back-to-Back Fault Injection Testing in Model-Based Development
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 9337 (2015), p. 135-148
Paper in proceeding
Subject Categories
Computer and Information Science
Computational Mathematics
Electrical Engineering, Electronic Engineering, Information Engineering
Computer Science
ISBN
978-91-7905-493-9
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 4960 (Doctoral theses at Chalmers University of Technology, new series no. 4960)
Publisher
Chalmers
EDIT 8103
Opponent: Associate professor Juan Carlos Ruiz, Technical University of Valencia, Spain