Compiler optimizations in the presence of uncertain semantics
As transistor sizes shrink and architects put more and more cores on a chip, computer systems become more susceptible to outside interference. This interference can cause faults that manifest as unexpected and uncontrolled state transitions, possibly with costly or harmful consequences. Coping with faults requires some form of redundancy. Improving an application's performance, on the other hand, often involves removing redundancy. Compiler optimizations are used to exploit computational capabilities to their full extent. Traditionally, compiler optimizations have targeted performance, but recently optimizations have been introduced that trade performance for reliability. This thesis presents optimizations that aim to improve both performance and reliability, prioritizing one or the other as needed.
First, I introduce a framework for semi-automatically adding software fault tolerance to an application. Using this framework, I show that redundant executions combined with an appropriate voting mechanism can increase reliability with a performance overhead as low as 18%.
Second, I present an algorithm based on geometric programming that minimizes the number of redundant executions in an application while maintaining a reliability threshold. A common approach is to use a static number of redundant executions per statement throughout the whole application. To reduce performance overhead, I instead exploit the fact that some operations are naturally more reliable, and more costly, than others.
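The core trade-off can be sketched with a small allocation routine. This is not the geometric-programming formulation described above. It is a simpler greedy heuristic, swapped in for illustration, under two assumptions of our own: statements fail independently, and a statement with failure probability p executed n times fails only if all n copies fail (probability p**n). The function names `reliability` and `allocate` and the `max_n` bound are hypothetical. Each step adds one copy to the statement that buys the most log-reliability per unit of cost, so statements that are already reliable, or expensive to repeat, receive fewer copies.

```python
import math


def reliability(ps, ns):
    # Assumption: statement i fails only if all ns[i] independent copies fail,
    # so its success probability is 1 - p**n; statements fail independently.
    return math.prod(1.0 - p ** n for p, n in zip(ps, ns))


def allocate(ps, costs, threshold, max_n=10):
    """Greedily choose copy counts so that reliability(ps, ns) >= threshold.

    ps: per-statement failure probabilities, each in [0, 1).
    costs: per-statement cost of one execution (positive).
    """
    ns = [1] * len(ps)
    while reliability(ps, ns) < threshold:
        best, best_gain = None, 0.0
        for i, (p, c) in enumerate(zip(ps, costs)):
            if ns[i] >= max_n:
                continue
            # Log-reliability gained by one more copy of statement i, per unit cost.
            before = math.log(1.0 - p ** ns[i])
            after = math.log(1.0 - p ** (ns[i] + 1))
            gain = (after - before) / c
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            raise ValueError("threshold unreachable within max_n copies per statement")
        ns[best] += 1
    return ns


if __name__ == "__main__":
    ps = [1e-3, 1e-1]  # statement 0 is much more reliable than statement 1
    ns = allocate(ps, costs=[1.0, 1.0], threshold=0.999)
    print(ns, reliability(ps, ns))
```

With equal costs, the less reliable statement ends up with more copies than the reliable one, instead of both receiving the same static count.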
Third, I introduce a voting scheme that adds redundant executions only as needed, up to a maximum determined by our optimization method. Using this scheme, I show improvements in both performance and reliability over a scheme that uses a fixed number of redundant executions.
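One plausible reading of "adding redundant executions as needed" is a lazy majority vote. The following is a minimal sketch under our own assumptions, not the thesis's voting system: `adaptive_vote` and `max_runs` are hypothetical names, `max_runs` stands in for the bound chosen by the optimization, and results must be hashable. Executions stop as soon as one result reaches a majority of `max_runs` votes. At that point no other result could win, even if all remaining executions were performed, so the extra runs are skipped.

```python
from collections import Counter
from typing import Callable, Hashable, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


def adaptive_vote(run: Callable[[], T], max_runs: int) -> Tuple[T, int]:
    """Re-execute `run` until one result has a majority of `max_runs` votes.

    Returns (winning result, number of executions actually performed).
    """
    if max_runs < 1:
        raise ValueError("max_runs must be positive")
    quorum = max_runs // 2 + 1  # a majority of the maximum possible votes
    votes: Counter = Counter()
    for used in range(1, max_runs + 1):
        result = run()
        votes[result] += 1
        if votes[result] >= quorum:
            return result, used
    raise RuntimeError("no majority after max_runs executions")


if __name__ == "__main__":
    print(adaptive_vote(lambda: 42, max_runs=3))  # fault-free: two runs suffice
    outcomes = iter([42, 7, 42])  # simulated fault in the second execution
    print(adaptive_vote(lambda: next(outcomes), max_runs=3))
```

In the fault-free case with `max_runs=3`, only two executions are needed instead of the three that a fixed triple-modular scheme always performs. A disagreement triggers the third execution.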
Finally, I present an analysis based on abstract interpretation that determines the impact of a finite number of faults. Because it is built on abstract interpretation, the analysis is sound by construction, and we evaluate its applicability on kernels and corner cases. This analysis could be used to decide where redundant executions are not needed.