Efficient Implementation of Concurrent Data Structures on Multi-core and Many-core Architectures

Bapi Chatterjee

Licentiate thesis, 2015

Synchronization of concurrent threads is the central problem in designing efficient concurrent data structures. Widely available compute systems are increasingly heterogeneous, combining multi-core Central Processing Units (CPUs) and many-core Graphics Processing Units (GPUs). This thesis contributes to research on efficient synchronization in concurrent data structures and is divided into two parts. The first part presents a novel design of a Set Abstract Data Type (ADT) based on an efficient lock-free Binary Search Tree (BST) with improved amortized bounds on the time complexity of the Set operations Add, Remove and Contains. The second part presents a comprehensive evaluation of concurrent Queue implementations on multi-core CPUs as well as many-core GPUs.

Efficient Lock-free BST - To the best of our knowledge, the lock-free BST presented in this thesis is the first to achieve an amortized complexity of O(H(n)+c) for all Set operations, where H(n) is the height of a BST on n nodes and c is the contention measure. The presented lock-free BST algorithm also offers improved disjoint-access parallelism compared with previously existing concurrent BST algorithms. The algorithm uses single-word compare-and-swap (CAS) primitives and is linearizable. We implemented the algorithm in Java and it shows good scalability.

Evaluation of concurrent data structures - We evaluated the performance of a number of concurrent FIFO Queue algorithms on multi-core CPUs and many-core GPUs. We studied the portability of existing concurrent Queue designs from CPUs to GPUs, which are inherently designed for SIMD programs. We observed that, in general, concurrent queues lend themselves to efficient implementation on GPUs with faster cache memory and better performance support for atomic synchronization primitives such as CAS. To the best of our knowledge, this is the first attempt to evaluate a concurrent data structure on GPUs.
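To make the role of single-word CAS concrete, the following is a minimal Java sketch of a lock-free, insert-only BST. It is not the algorithm of the thesis: it omits Remove entirely, so it needs no marking or helping, and it says nothing about the amortized bound or the disjoint-access parallelism discussed above. The class name `CasBstSketch` and its internal `Node` type are hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only: an insert-only lock-free BST.
// NOT the thesis's algorithm (no Remove, no helping, no complexity claims).
final class CasBstSketch {
    private static final class Node {
        final int key;
        // Each child link is a single word updated only via CAS.
        final AtomicReference<Node> left = new AtomicReference<>();
        final AtomicReference<Node> right = new AtomicReference<>();
        Node(int key) { this.key = key; }
    }

    private final AtomicReference<Node> root = new AtomicReference<>();

    // Returns true if the key was added, false if it was already present.
    boolean add(int key) {
        Node fresh = new Node(key);
        AtomicReference<Node> link = root;
        while (true) {
            Node cur = link.get();
            if (cur == null) {
                // Try to swing the empty link to the new node with one CAS.
                if (link.compareAndSet(null, fresh)) {
                    return true;
                }
                // CAS failed: another thread linked a node here first.
                // Links never revert to null, so re-reading this link is enough.
                continue;
            }
            if (key == cur.key) {
                return false;
            }
            link = key < cur.key ? cur.left : cur.right;
        }
    }

    // Read-only traversal; performs no synchronization beyond volatile reads.
    boolean contains(int key) {
        Node cur = root.get();
        while (cur != null) {
            if (key == cur.key) {
                return true;
            }
            cur = (key < cur.key ? cur.left : cur.right).get();
        }
        return false;
    }
}
```

In this simplified setting, a successful `compareAndSet` is the point at which an Add takes effect, and a failed one means only that some other thread made progress on the same link. Supporting Remove is what makes lock-free BSTs hard, and it is there that designs such as the one in this thesis differ.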

Lock-free

Lock-free Binary search tree

Synchronization Primitives

Concurrent Data Structures

Room EA, EDIT

Opponent: Dr. Neeraj Mittal, Department of Computer Science, The University of Texas at Dallas, USA

Author

Bapi Chatterjee

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

Other publications Research

Understanding the Performance of Concurrent Data Structures on Graphics Processors

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 7484 LNCS (2012), p. 883-894

Paper in proceeding

Efficient lock-free binary search trees

2014 ACM Symposium on Principles of Distributed Computing, PODC 2014; Paris, France; 15 July 2014 through 18 July 2014 (2014), p. 322-331

Paper in proceeding

Subject Categories (SSIF 2011)

Computer Engineering

Software Engineering

Information Science

Computer Science

Computer Systems

Areas of Advance

Information and Communication Technology

Publisher

Chalmers