Efficient concurrent data structure access parallelism techniques for increasing scalability
Doctoral thesis, 2023

Multi-core processors have revolutionised the way data structures are designed by bringing parallelism to mainstream computing. Key to exploiting hardware parallelism available in multi-core processors are concurrent data structures. However, some concurrent data structure abstractions are inherently sequential and incapable of harnessing the parallelism performance of multi-core processors. Designing and implementing concurrent data structures to harness hardware parallelism is challenging due to the requirement of correctness, efficiency and practicability under various application constraints. In this thesis, our research contribution is towards improving concurrent data structure access parallelism to increase data structure performance. We propose new design frameworks that improve access parallelism of already existing concurrent data structure designs. Also, we propose new concurrent data structure designs with significant performance improvements. To give an insight into the interplay between hardware and concurrent data structure access parallelism, we give a detailed analysis and model the performance scalability with varying parallelism.

In the first part of the thesis, we focus on data structure semantic relaxation. Relaxing the semantics of a data structure unveils a bigger design space that allows weaker synchronization and more useful parallelism. Investigating new data structure designs that can trade semantics for better performance in a monotonic way is a major challenge in the area. We address this challenge algorithmically in this part of the thesis. We present an efficient, lock-free, concurrent data structure design framework for out-of-order semantic relaxation. We introduce a new two-dimensional algorithmic design that uses multiple instances of a given data structure to improve access parallelism.
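As a rough illustration of out-of-order relaxation built from multiple instances of one data structure, the sketch below spreads a stack over several sub-stacks. It is not the two-dimensional framework of the thesis: it is lock-based rather than lock-free, it places no bound on how far out of order an item may be returned, and the names `RelaxedStack`, `width` and `seed` are hypothetical.

```python
import random
import threading


class RelaxedStack:
    """Stack relaxed across `width` sub-stacks (illustrative, lock-based)."""

    def __init__(self, width, seed=None):
        self._stacks = [[] for _ in range(width)]
        self._locks = [threading.Lock() for _ in range(width)]
        self._rng = random.Random(seed)

    def push(self, item):
        # Any sub-stack will do, so concurrent pushes rarely meet on one lock.
        i = self._rng.randrange(len(self._stacks))
        with self._locks[i]:
            self._stacks[i].append(item)

    def pop(self):
        # Start at a random sub-stack and take the first item found.
        # The result is the top of *some* sub-stack, not necessarily the
        # most recently pushed item overall: this is the relaxation.
        width = len(self._stacks)
        start = self._rng.randrange(width)
        for offset in range(width):
            i = (start + offset) % width
            with self._locks[i]:
                if self._stacks[i]:
                    return self._stacks[i].pop()
        # Every sub-stack looked empty when it was visited. Under concurrent
        # pushes this emptiness check is only approximate.
        return None
```

With `width=1` the structure behaves as an ordinary strict stack. Larger widths spread threads over more access points at the cost of weaker ordering, which is the trade-off that semantic relaxation exploits.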

In the second part of the thesis, we propose an efficient priority queue that improves access parallelism by reducing the number of synchronization points for each operation. Priority queues are fundamental abstract data types, often used to manage limited resources in parallel systems. Typical parallel priority queue implementations are based on heaps or skip lists. In recent literature, skip lists have been shown to be the most efficient design choice for implementing priority queues. Though numerous intricate implementations of skip-list-based queues have been proposed in the literature, their performance is constrained by the high number of global atomic updates per operation and the high memory consumption, both of which are proportional to the number of sub-lists in the queue. In this part of the thesis, we propose an alternative approach for designing lock-free linearizable priority queues that significantly improves memory efficiency and throughput by reducing the number of global atomic updates and the memory consumption compared to skip-list-based queues. To achieve this, our new design combines two structures, a search tree and a linked list, forming what we call a Tree Search List Queue (TSLQueue).

Subsequently, we analyse and introduce a model for lock-free concurrent data structure access parallelism. The major impediment to scaling concurrent data structures is memory contention when accessing shared data structure access points, which leads to thread serialisation and hinders parallelism. Aiming to address this challenge, a significant amount of work in the literature has proposed multi-access techniques that improve concurrent data structure parallelism. However, there is little work on analysing and modelling the execution behaviour of concurrent multi-access data structures, especially in a shared memory setting. In this part of the thesis, we analyse and model the general execution behaviour of concurrent multi-access data structures in the shared memory setting. We study and analyse the behaviour of the two popular random access patterns, shared (Remote) and exclusive (Local) access, and the behaviour of the two most commonly used atomic primitives for designing lock-free data structures: Compare-and-Swap and Fetch-and-Add.
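The difference between the two primitives can be sketched as follows. Python exposes no hardware atomic instructions, so the hypothetical `EmulatedAtomicInt` class below emulates atomicity with a lock. The example only illustrates the retry behaviour of each primitive and says nothing about the performance model developed in the thesis.

```python
import threading


class EmulatedAtomicInt:
    """Integer whose operations are made atomic by a lock (emulation only)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        # Succeeds only if no other thread changed the value since it was read.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def fetch_and_add(self, delta):
        # Always succeeds and returns the previous value.
        with self._lock:
            old = self._value
            self._value += delta
            return old


def increment_with_cas(counter):
    """Increment via a CAS retry loop; returns the number of failed attempts."""
    retries = 0
    while True:
        seen = counter.load()
        if counter.compare_and_swap(seen, seen + 1):
            return retries
        retries += 1  # another thread got in first, so read again and retry


def increment_with_faa(counter):
    """Increment via Fetch-and-Add; it never needs to retry."""
    counter.fetch_and_add(1)
    return 0


def run(increment, num_threads, per_thread):
    counter = EmulatedAtomicInt()
    retries = [0] * num_threads

    def worker(tid):
        for _ in range(per_thread):
            retries[tid] += increment(counter)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.load(), sum(retries)
```

Calling `run(increment_with_cas, 4, 1000)` and `run(increment_with_faa, 4, 1000)` both yield a final count of 4000. The Compare-and-Swap variant may report a non-zero retry total under contention, while the Fetch-and-Add variant always reports zero.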

FIFO queue

parallelism

performance modelling

design framework

semantic relaxation

Data structure

multi-access

lock free

performance analysis

counter

concurrency

search tree

stack

multi-core processor

priority queue

Room HA2, Johanneberg campus, Chalmers (https://maps.chalmers.se/#0bb94e4a-61cf-45e6-a197-260258e605ce)
Opponent: Prof. Peter Sanders, Karlsruhe Institute of Technology, Germany

Author

Adones Rukundo

Network and Systems

Monotonically relaxing concurrent data-structure semantics for increasing performance: An efficient 2D design framework

Leibniz International Proceedings in Informatics, LIPIcs; Vol. 146 (2019)

Paper in proceeding

TSLQueue: An Efficient Lock-Free Design for Priority Queues

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12820 LNCS (2021), p. 385-401

Paper in proceeding

Performance Analysis and Modelling of Concurrent Multi-access Data Structures

Annual ACM Symposium on Parallelism in Algorithms and Architectures; Vol. SPA 22 (2022), p. 333-344

Paper in proceeding

Concurrent computing refers to the process of breaking down larger jobs into smaller, often similar jobs that can be executed simultaneously by multiple processors (workers) communicating via shared resources. The primary goal of concurrent computing is to harness the available computation power of multiple processors for faster job processing.

Consider the job of loading a given number of boxes into a single truck at a warehouse using multiple workers. The challenge is how to count the loaded boxes so that the given number is not exceeded. A basic solution is to have a single list on which the workers tally the boxes they have loaded. Each worker has to check the tally for the current number of loaded boxes before loading a box onto the truck. If the tally is less than the required number, the worker increases the tally by one and loads another box onto the truck. Otherwise, the workers stop loading and the truck sets off. With a single tally list, the loading process will be slow, since the list can only be accessed by one worker at a time. This means that increasing the number of workers might not increase (scale) the loading speed, since the workers have to queue up (delay) to access the tally list.
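The single tally list corresponds to one shared counter that every thread must update. A minimal sketch follows, with a lock standing in for the rule that only one worker may use the list at a time; the function and parameter names are hypothetical.

```python
import threading


def load_truck_shared_tally(capacity, num_workers):
    """Load exactly `capacity` boxes while all workers share one tally."""
    tally = 0
    tally_lock = threading.Lock()        # only one worker at the list at a time
    loaded_per_worker = [0] * num_workers

    def worker(worker_id):
        nonlocal tally
        while True:
            with tally_lock:             # every worker queues up here
                if tally >= capacity:
                    return               # truck is full, stop loading
                tally += 1               # reserve the next box
            loaded_per_worker[worker_id] += 1  # load the box (private slot)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tally, loaded_per_worker
```

The count is always exact, but every box requires a visit to the same lock, so adding workers mostly adds waiting.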

To avoid queuing on a shared list, each worker can be assigned a specific number of boxes to load. This way, each worker tracks their own count without sharing a list. The loading process is complete once all the workers have loaded their specific number of boxes. In this case, the duration of the loading process is determined by the slowest worker, irrespective of how many fast workers are involved.
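The fixed-assignment approach can be sketched as a static split of the work, again with hypothetical names. No counter is shared, so there is no queuing, but a fast worker cannot take over boxes assigned to a slow one.

```python
import threading


def load_truck_static_quota(capacity, num_workers):
    """Split `capacity` boxes into fixed per-worker quotas decided up front."""
    base, extra = divmod(capacity, num_workers)
    # The first `extra` workers get one additional box so the quotas sum to capacity.
    quotas = [base + (1 if i < extra else 0) for i in range(num_workers)]
    loaded = [0] * num_workers

    def worker(worker_id):
        for _ in range(quotas[worker_id]):
            loaded[worker_id] += 1       # private count, nothing is shared

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                         # finishes only when the slowest worker does
    return quotas, loaded
```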

A more efficient way to solve this problem is to have multiple lists, each with a maximum tally threshold. Before loading a box, a worker selects a random list and tallies on it, as long as that list is below its threshold. Once all the lists have reached their thresholds, the loading process is complete and the truck can set off. Here, the loading process is not delayed by slow workers, since there is no limit on individual workers. Faster workers can load more boxes than slow workers. At the same time, workers do not have to queue up on a single shared list. As in concurrent computing, the primary goal here is to efficiently harness the workforce of multiple workers without losing count of the boxes being loaded.
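A minimal sketch of the multiple-list scheme is given below, assuming the capacity is split evenly into per-list thresholds and that a worker who finds its random list full simply tries the remaining lists in turn. Both choices, and all names, are assumptions made for illustration.

```python
import random
import threading


def load_truck_multi_tally(capacity, num_workers, num_lists):
    """Load `capacity` boxes using several tally lists, each with a threshold."""
    base, extra = divmod(capacity, num_lists)
    thresholds = [base + (1 if i < extra else 0) for i in range(num_lists)]
    tallies = [0] * num_lists
    locks = [threading.Lock() for _ in range(num_lists)]
    loaded = [0] * num_workers

    def try_tally(i):
        with locks[i]:                   # contention is limited to one list
            if tallies[i] < thresholds[i]:
                tallies[i] += 1
                return True
            return False

    def worker(worker_id):
        rng = random.Random(worker_id)   # per-worker generator, seeded for repeatability
        while True:
            start = rng.randrange(num_lists)
            for offset in range(num_lists):
                if try_tally((start + offset) % num_lists):
                    loaded[worker_id] += 1
                    break
            else:
                # A full list never becomes non-full again, so finding every
                # list full means the truck is loaded.
                return

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tallies, loaded
```

The total stays exact because the thresholds sum to `capacity`, workers are free to load as many boxes as they can manage, and two workers only wait on each other when they happen to pick the same list.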

Future factories in the Cloud (FiC)

Swedish Foundation for Strategic Research (SSF) (GMT14-0032), 2016-01-01 -- 2020-12-31.

Sweden-East Africa University Network knowledge development for sustainable development

The Swedish Foundation for International Cooperation in Research and Higher Education (STINT) (SG2021-8934), 2022-01-08 -- 2024-12-31.

Areas of Advance

Information and Communication Technology

Building Futures (2010-2018)

Driving Forces

Sustainable development

Innovation and entrepreneurship

Subject Categories

Computer and Information Science

Computer Science

Roots

Basic sciences

ISBN

978-91-7905-837-1

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5303

Publisher

Chalmers

Online

More information

Latest update

7/13/2023