Source-Code Analysis for Fault Tolerance
Licentiate thesis, 2009
Constructing a dependable and fault-tolerant system is inherently difficult. Not only should the system work under normal circumstances, it must also continue operating and provide service, potentially degraded, under hostile circumstances. Furthermore, since error handling and other fault-tolerance mechanisms are only executed in the presence of a fault, their quality is hard to assess and often poor.
In this thesis, we turn to source-code analysis as a means of automatically assessing the quality of certain fault-tolerance mechanisms. In particular, we employ data-flow analysis to determine the behavior of applications in the presence of two widely different classes of errors.
At one end of the spectrum are software exceptions, a software construct that allows programmers to annotate code with error conditions in a high-level way. We implement a prototype analysis to detect C++ procedures that might cause undesired effects during error handling.
At the other end of the spectrum, in stark contrast to the controlled and structured behavior of software exceptions, are temporary hardware faults, or soft errors, caused by high-energy neutron radiation. As hardware components become smaller and smaller, they also become more and more vulnerable to this kind of temporary failure. Exactly how these low-level errors impact high-level applications is unclear, and we use source-code analysis to visualize their impact.