Techniques to Reduce Inefficiencies in Hardware Transactional Memory Systems
Doctoral thesis, 2011

The recent trend toward multicore CPUs calls for major changes in software development. Traditional single-threaded applications can no longer obtain sustained performance gains from this new generation of CPUs, which consist of multiple processors (cores). Applications must be written in a parallel fashion to exploit the performance potential of these CPUs. Traditional lock-based parallel programming models are considered too difficult and error-prone for average programmers. A program can be trapped in deadlock or livelock through careless locking of shared resources. Furthermore, this style of coordination uses blocking synchronization, in which the execution of, e.g., a critical section is exclusive. This can cause serialization that would be unnecessary when there are no data races. Transactional memory has been proposed to simplify parallel programming and increase concurrency by using non-blocking synchronization. In transactional memory systems, multiple transactions (e.g., critical section invocations) from different threads can be executed speculatively in parallel. Data integrity, and hence program correctness, is maintained by isolating the speculative execution and committing the end result atomically. When two transactions conflict over shared data, only one of them is allowed to commit successfully. This thesis deals with transactional memory implemented in hardware (HTM). It identifies several inefficiencies of HTM systems that hurt performance and proposes novel solutions to them. In an HTM system that detects conflicts lazily, transactions from one thread can repeatedly squash a transaction from another thread, which can starve the latter. A novel solution that uses per-transaction squash counts is proposed to avoid starvation.
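The squash-count idea can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware scheme: the abstract does not say how the counts are used, so the policy here (a committing transaction yields when a conflicting transaction has been squashed more often than it has) is hypothetical, as are the names `Transaction`, `conflicts`, and `try_commit`.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    name: str
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)
    squash_count: int = 0  # how many times this transaction has been squashed


def conflicts(committer, other):
    # Lazy conflict detection: the committer's writes are compared, at commit
    # time, with the addresses the other transaction has read or written.
    return bool(committer.write_set & (other.read_set | other.write_set))


def try_commit(committer, in_flight):
    """Return (committed, squashed_transactions).

    Assumed policy: if any conflicting transaction has a higher squash count
    than the committer, the committer yields so that the starving transaction
    can finish first.
    """
    victims = [t for t in in_flight if t is not committer and conflicts(committer, t)]
    if any(v.squash_count > committer.squash_count for v in victims):
        return False, []
    for v in victims:
        v.squash_count += 1  # the victim is squashed and will re-execute
    return True, victims
```

In this model, a transaction that has already been squashed gains priority over a fresh committer that would squash it again, so it cannot be starved indefinitely by the same thread.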
At a data conflict, HTM systems squash the speculative execution and re-execute the transaction from the beginning, even though not all of the execution is unsafe. A scheme is proposed that judiciously takes intermediate checkpoints so that the safe part of the execution is not squashed. To isolate the speculative execution, a private buffer is used to store the speculative data. The drastic performance effect of speculative buffer overflow is identified, and a scheme is proposed that decouples the read set from the speculative buffer to reduce overflows. To adapt conflict resolution to application behavior, a flexible HTM infrastructure is proposed. To better understand the root causes of HTM inefficiencies, conflicts are quantified in different classes, and techniques are introduced to reduce conflicts in these classes.
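Decoupling the read set from the speculative buffer can be sketched in software as follows. This is a minimal model under assumptions, not the thesis's design: it assumes the read set is summarized in a Bloom filter (suggested only by the keyword list), that the bounded buffer holds written data only, and that the class names, sizes, and hash choice are all hypothetical.

```python
import hashlib


class BloomSignature:
    """Fixed-size summary of a set of addresses (false positives possible)."""

    def __init__(self, bits=256, hashes=2):
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # bit vector stored as an integer

    def _positions(self, addr):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def add(self, addr):
        for p in self._positions(addr):
            self.field |= 1 << p

    def may_contain(self, addr):
        return all((self.field >> p) & 1 for p in self._positions(addr))


class SpeculativeState:
    """Reads are tracked in the signature; only writes occupy buffer entries."""

    def __init__(self, buffer_lines=4):
        self.buffer_lines = buffer_lines
        self.write_buffer = {}  # address -> speculative value
        self.read_sig = BloomSignature()

    def read(self, addr, memory):
        self.read_sig.add(addr)  # consumes no buffer entry
        return self.write_buffer.get(addr, memory.get(addr, 0))

    def write(self, addr, value):
        if addr not in self.write_buffer and len(self.write_buffer) >= self.buffer_lines:
            raise OverflowError("speculative buffer overflow")
        self.write_buffer[addr] = value

    def conflicts_with(self, committed_writes):
        # Conservative check: may report a conflict that did not happen,
        # but never misses an address that was actually read.
        return any(self.read_sig.may_contain(a) for a in committed_writes)
```

In this model a transaction can read far more addresses than the buffer has entries without overflowing; the cost is that the signature may occasionally report a false conflict.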

intermediate checkpoint

conflicting address prediction

Bloom filter

conflict resolution


speculative buffer overflow

parallel programming

5C model for cache-misses

transactional memory

conflict classification


Sal EE
Opponent: Professor Ian Watson


Mridha Mohammad Waliullah

Chalmers, Computer Science and Engineering, Computer Engineering

Efficient Partial Roll-backing Mechanism for Transactional Memory Systems

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6590 (2008), p. 256-274

Journal article

Efficient Management of Speculative Data in Hardware Transactional Memory Systems

2008 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, IC-SAMOS 2008; Samos, Greece; 21 July 2008 through 24 July 2008; (2008), p. 158-164

Paper in proceedings

Simple Performance Optimization Techniques for Hardware Transactional Memory Systems

Proceedings of the Third Swedish Workshop on Multicore Computing; (2010)

Other conference contribution

Schemes for avoiding starvation in transactional memory systems

Concurrency Computation Practice and Experience; Vol. 21 (2009), p. 859-873

Journal article

LV*: A Low Complexity Lazy Versioning HTM Infrastructure

Proceedings - 2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, IC-SAMOS 2010; (2010), p. 231-240

Paper in proceedings




Information and Communication Technology



Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie (Doctoral theses at Chalmers University of Technology, new series): 3167

Technical report D - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University: 75
