Adaptable Hardware Transactional Memory Protocols
Licentiate thesis, 2011
Transactional Memory (TM) is an important programming paradigm that can help alleviate difficulties associated with concurrent programming. Single-threaded performance can no longer be expected to scale as it did in the past. Programmers, therefore, must seriously consider concurrent algorithms as viable alternatives. In this context, TM promises to provide safe, easy and intuitive constructs that simplify ways in which multiple threads can co-operate to solve a problem quickly in a shared-memory environment.
In this thesis I investigate ways to efficiently incorporate TM in current parallel architectures while keeping in mind the need to provide robust performance over the variety of transactional workloads that such architectures might be expected to run in the future. Software TM implementations impose a substantial performance penalty, making the design of high-performance hardware TM (HTM) implementations necessary. A multitude of HTM design points exist, and each has a sweet spot in terms of the behavioral characteristics of transactional applications. Yet no design performs best across the entire spectrum of transactional workloads. This presents a strong case for systems that incorporate flexibility with modest changes to existing parallel architectures. Such systems would allow key TM policies to be changed to suit the needs of workloads and to sidestep pathological conditions.
LV*, the first study included in this thesis, describes a novel and simple approach to building flexibility into a broadcast-based chip-multiprocessor (CMP) while allowing programmer control over key TM policies. ZEBRA, described in the second study, is an HTM design for distributed directory-based CMPs that adapts its behavior based on the characteristics of data accessed by transactions, providing significant performance gains over existing designs. Finally, the last included study shows how write-buffers, which are ubiquitous in modern microarchitectures, have the potential and versatility to provide substantial performance gains when used judiciously in HTM designs, by avoiding redundant actions at commit time and unnecessary cache line invalidations on abort.