Techniques to Reduce Inefficiencies in Hardware Transactional Memory Systems
The recent trend toward multicore CPUs calls for major changes in software development. Traditional single-threaded applications can no longer obtain a sustained performance boost from this new generation of CPUs, which consist of multiple processors (cores). Applications must be written in a parallel fashion to exploit their performance potential. Traditional lock-based parallel programming models are considered too difficult and error-prone for average programmers: a program can become trapped in deadlock or livelock through careless locking of shared resources. Furthermore, this style of coordination uses blocking synchronization, in which the execution of, e.g., a critical section is exclusive. This can cause serialization that would not be needed when there are no data races. Transactional memory has been proposed to simplify parallel programming and to increase concurrency by using non-blocking synchronization. In transactional memory systems, multiple transactions (e.g., critical-section invocations) from different threads can be executed speculatively in parallel. Data integrity, and hence program correctness, is maintained by isolating the speculative execution and committing the end result atomically. A data-sharing conflict between two transactions allows only one of them to commit successfully. This thesis deals with transactional memory implemented in hardware (HTM).
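The commit-time conflict rule described above can be illustrated with a minimal sketch. This is a simplified model for illustration only, not any specific HTM design: each transaction tracks a speculative read set and write set, and a committing transaction squashes any concurrent transaction whose sets overlap its write set.

```python
class Txn:
    """Toy transaction with speculative read/write sets (illustrative model)."""
    def __init__(self, name):
        self.name = name
        self.read_set = set()   # addresses read speculatively
        self.write_set = set()  # addresses written speculatively
        self.squashed = False

    def read(self, addr):
        self.read_set.add(addr)

    def write(self, addr):
        self.write_set.add(addr)

def commit(winner, others):
    """Commit `winner`; squash every transaction whose read or write set
    overlaps the winner's write set (a data-sharing conflict)."""
    for t in others:
        if winner.write_set & (t.read_set | t.write_set):
            t.squashed = True   # must re-execute from the beginning
    return winner.write_set     # made globally visible atomically

a, b = Txn("A"), Txn("B")
a.write(0x10)
b.read(0x10)        # B speculatively read an address A wrote
commit(a, [b])
print(b.squashed)   # → True: of the two conflicting transactions, only A commits
```

The point of the sketch is the asymmetry of the rule: the conflict itself does not abort both parties; it only restricts which one may commit.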
In this thesis, several inefficiencies of HTM systems that hurt performance are identified, and novel solutions are proposed. In an HTM system that detects conflicts lazily, transactions from one thread can repeatedly squash a transaction from another thread, which can starve the latter. A novel solution that maintains squash counts for individual transactions is proposed to avoid starvation. On a data conflict, HTM systems squash the speculative execution and re-execute transactions from the beginning, without considering that not all of the execution is unsafe. A scheme is proposed that intelligently takes intermediate checkpoints so that the safe part of the execution is not squashed. To isolate speculative execution, a private buffer is used to store the speculative data. The drastic performance impact of speculative buffer overflow is uncovered, and a scheme is proposed that decouples the read set from the speculative buffer to reduce overflows. To adapt conflict resolution to application behavior, a flexible HTM infrastructure is proposed. Finally, to better understand the root causes of HTM inefficiencies, conflicts are classified and quantified, and techniques are introduced to reduce each conflict class.
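The squash-count idea for starvation avoidance can be sketched as a conflict-resolution policy. The threshold value and the stall-instead-of-squash action are assumptions made for illustration; the thesis mechanism may differ in its details:

```python
THRESHOLD = 3      # assumed tuning parameter, not a value from the thesis

squash_count = {}  # per-transaction squash counters

def resolve_conflict(committer, victim):
    """Decide what happens when `committer`'s commit conflicts with `victim`.
    Once `victim` has been squashed THRESHOLD times, it is considered
    starving and the committer stalls rather than squashing it again."""
    if squash_count.get(victim, 0) >= THRESHOLD:
        return "stall"   # protect the repeatedly squashed transaction
    squash_count[victim] = squash_count.get(victim, 0) + 1
    return "squash"      # victim re-executes from the beginning

# Transaction "T" keeps losing conflicts against fresh committers:
actions = [resolve_conflict("C", "T") for _ in range(5)]
print(actions)  # → ['squash', 'squash', 'squash', 'stall', 'stall']
```

The counter bounds how often any single transaction can be sacrificed, turning an unbounded lazy-conflict livelock into a bounded wait.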
Keywords: conflicting address prediction, speculative buffer overflow, 5C model for cache misses