On Design and Applications of Practical Concurrent Data Structures

Ivan Walulya

Doctoral thesis, 2018

The proliferation of multicore processors is having an enormous impact on software design and development. To exploit the parallelism available in multicores, we need to design and implement abstractions that programmers can use for general-purpose application development. A common abstraction for coordinated access to memory is a concurrent data structure. Concurrent data structures are challenging to design and implement because they must be correct, scalable, and practical under various application constraints. In this thesis, we contribute to the design of efficient concurrent data structures by proposing new design techniques and improvements to existing implementations. Additionally, we explore the use of concurrent data structures in demanding application contexts such as data stream processing.

In the first part of the thesis, we focus on data structures that are difficult to parallelize due to inherent sequential bottlenecks. We present a lock-free vector design that efficiently addresses synchronization bottlenecks by utilizing the combining technique. Typical combining techniques are blocking. Our design introduces combining without sacrificing non-blocking progress guarantees. We extend the vector to present a concurrent lock-free unbounded binary heap that implements a priority queue with mutable priorities.
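To make "a priority queue with mutable priorities" concrete, the following is a minimal sequential sketch of an array-based binary min-heap whose entries can be re-prioritized in place. It is not the lock-free algorithm of the thesis: there is no synchronization, combining, or helping here, and the names `MutableHeap`, `insert`, `change_priority`, and `pop_min` are hypothetical, chosen only for illustration.

```python
class MutableHeap:
    """Sequential binary min-heap on a growable array (illustrative only)."""

    def __init__(self):
        self._items = []  # heap array of [priority, key] pairs
        self._pos = {}    # key -> current index in the heap array

    def _swap(self, i, j):
        a = self._items
        a[i], a[j] = a[j], a[i]
        self._pos[a[i][1]] = i
        self._pos[a[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._items[i][0] >= self._items[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self._items)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._items[child][0] < self._items[smallest][0]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def insert(self, key, priority):
        if key in self._pos:
            raise KeyError(f"duplicate key: {key!r}")
        self._items.append([priority, key])
        self._pos[key] = len(self._items) - 1
        self._sift_up(len(self._items) - 1)

    def change_priority(self, key, priority):
        """Mutate the priority of an existing key and restore heap order."""
        i = self._pos[key]
        old = self._items[i][0]
        self._items[i][0] = priority
        if priority < old:
            self._sift_up(i)
        else:
            self._sift_down(i)

    def pop_min(self):
        if not self._items:
            raise IndexError("pop from empty heap")
        self._swap(0, len(self._items) - 1)
        priority, key = self._items.pop()
        del self._pos[key]
        if self._items:
            self._sift_down(0)
        return key, priority
```

In a concurrent setting, the root of the heap and the end of the array are the natural contention points, which is the kind of sequential bottleneck this part of the thesis targets.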

In the second part of the thesis, we shift our focus to concurrent search data structures. To offer strong progress guarantees, typical implementations of non-blocking search data structures employ a "helping" mechanism. However, helping may degrade performance. We propose help-optimality, which expresses optimization in the amortized step complexity of concurrent operations. To describe the concept, we revisit the lock-free designs of a linked list and a binary search tree and present improved algorithms. We design the algorithms without using any language- or platform-specific constructs: we do not use bit-stealing or runtime type introspection of objects. Thus, our algorithms are portable. We further delve into multi-dimensional data and similarity search. We present the first lock-free multi-dimensional data structure and a linearizable nearest neighbor search algorithm. Our algorithm for nearest neighbor search is generic and can be adapted to other data structures.
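As background for the nearest neighbor search mentioned above, here is a minimal sequential sketch of the textbook search on a k-d tree. It only illustrates the sequential problem; it is not the lock-free, linearizable algorithm of the thesis, and the names `Node`, `build`, and `nearest` are hypothetical.

```python
import math


class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point = point  # tuple of coordinates
        self.axis = axis    # splitting dimension at this node
        self.left = left    # points with coordinate <= point[axis]
        self.right = right  # points with coordinate >= point[axis]


def build(points, depth=0):
    """Build a balanced k-d tree by splitting on the median of each axis in turn."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return the stored point closest to target (Euclidean distance)."""
    if node is None:
        return best
    if best is None or math.dist(node.point, target) < math.dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best so far.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, best)
    return best
```

The difficulty in the concurrent case is that the search visits several nodes while other threads may insert or delete points, so the returned point must still correspond to a consistent state of the structure; the sketch above does not address that.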

In the last part of the thesis, we explore the use of concurrent data structures for deterministic stream processing. We propose solutions to two challenges prevalent in data stream processing: (1) efficient processing in the cloud as well as on edge devices and (2) deterministic data-parallel processing at high throughput and low latency. As a first step, we present a methodology for customizing streaming aggregation on low-power multicore embedded platforms. Then we introduce Viper, a communication module that can be integrated into stream processing engines to coordinate threads that analyze data in parallel.
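One common way to make parallel stream processing deterministic is to merge the input streams by timestamp and release a tuple only once no source can still deliver an earlier one. The sketch below illustrates that general idea in a sequential setting. It is an assumption made for illustration, not a description of Viper's interface or implementation, and `DeterministicMerger`, `add`, and `pop_ready` are hypothetical names.

```python
import heapq


class DeterministicMerger:
    """Merge timestamp-ordered sources into one order that does not depend
    on how arrivals from different sources interleave (illustrative only)."""

    def __init__(self, num_sources):
        self._latest = [None] * num_sources  # last timestamp seen per source
        self._count = [0] * num_sources      # per-source sequence numbers
        self._pending = []                   # heap of (ts, source, seq, payload)

    def add(self, source, timestamp, payload):
        """Each source must deliver non-decreasing timestamps."""
        last = self._latest[source]
        if last is not None and timestamp < last:
            raise ValueError("timestamps must be non-decreasing per source")
        self._latest[source] = timestamp
        entry = (timestamp, source, self._count[source], payload)
        self._count[source] += 1
        heapq.heappush(self._pending, entry)

    def pop_ready(self):
        """Return tuples strictly older than the slowest source's latest timestamp."""
        if any(t is None for t in self._latest):
            return []
        watermark = min(self._latest)
        ready = []
        # Strict comparison: a source may still deliver a tuple at the watermark.
        while self._pending and self._pending[0][0] < watermark:
            ts, source, _, payload = heapq.heappop(self._pending)
            ready.append((ts, source, payload))
        return ready
```

Ties on timestamps are broken by source index and then by arrival order within a source, so consumers see the same sequence regardless of thread scheduling. The price is latency: a tuple waits until every source has advanced past its timestamp.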

Keywords: combining, multicore, stream processing, atomicity, concurrent data structures, non-blocking, synchronization, lock-free, locking

ED, EDIT, Hörsalsvägen 11, Chalmers.

Opponent: Assoc. Prof. Danny Hendler, Department of Computer Science, Ben-Gurion University, Israel

Author

Ivan Walulya

Chalmers, Computer Science and Engineering, Networks and Systems

Scalable Lock-Free Vector with Combining

Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017 (2017), p. 917-926

Paper in proceedings

Ivan Walulya, Bapi Chatterjee, Ajoy K. Datta, Rashmi Niyoliya, and Philippas Tsigas. Concurrent lock-free unbounded priority queue with mutable priorities

Help-Optimal and Language-Portable Lock-Free Concurrent Data Structures

45th International Conference on Parallel Processing (ICPP), 2016, Vol. 2016 September (2016), p. 360-369

Paper in proceedings

Concurrent linearizable nearest neighbour search in lockfree-kd-Tree

ACM International Conference Proceeding Series, Vol. Part F133180 (2018)

Paper in proceedings

Customization methodology for implementation of streaming aggregation in embedded systems

Journal of Systems Architecture, Vol. 66-67 (2016), p. 48-60

Journal article

Viper: A module for communication-layer determinism and scaling in low-latency stream processing

Future Generation Computer Systems, Vol. 88 (2018), p. 297-308

Journal article

In recent years, multicore systems have become ubiquitous: processors in everything from ultra-low-power embedded systems to supercomputers contain multiple cores. This proliferation of multicore processors is having an enormous impact on how we design and implement software systems. Shared-memory multicore processors are systems on which multiple computation threads can execute concurrently with access to shared system resources. However, access to shared resources needs to be synchronized, which is generally a cause of significant difficulty in utilizing multicore systems.

To utilize these multicore processors efficiently, we need to design and implement concurrent programming abstractions that programmers at all levels of expertise can easily use for general-purpose application development. A common abstraction for synchronized access to shared data is a concurrent data structure. Concurrent data structures are challenging to design and implement because they must be correct, efficient, and practical under various application constraints.

In this thesis, we propose new techniques for designing efficient concurrent data structures and improvements to existing implementations. We explore design approaches that are easy to implement without concern for the programming language or deployment platform. Additionally, we explore how to utilize concurrent data structures in complex applications, especially those with stringent throughput and latency demands such as data stream processing.

Execution Models for Energy-Efficient Computing Systems (EXCESS)

European Commission (EU) (EC/FP7/611183), 2013-09-01 -- 2016-08-31.

Subject categories (SSIF 2011)

Computer Engineering

Computer Science

Computer Systems

ISBN

978-91-7597-815-4

Doctoral theses at Chalmers University of Technology. New series: 4496

Publisher

Chalmers

More information

Last updated

2018-10-29
