Hardware-Aware Algorithm Designs for Efficient Parallel and Distributed Processing

Charalampos Stylianopoulos

Doctoral thesis, 2020

The introduction and widespread adoption of the Internet of Things, together with emerging new industrial applications, bring new requirements in data processing. Specifically, the need for timely processing of data that arrives at high rates creates a challenge for the traditional cloud computing paradigm, where data collected at various sources is sent to the cloud for processing. As an approach to this challenge, processing algorithms and infrastructure are distributed from the cloud to multiple tiers of computing, closer to the sources of data. This creates a wide range of devices for algorithms to be deployed on and software designs to adapt to.

In this thesis, we investigate how hardware-aware algorithm designs on a variety of platforms lead to algorithm implementations that efficiently utilize the underlying resources. We design, implement and evaluate new techniques for representative applications that involve the whole spectrum of devices, from resource-constrained sensors in the field, to highly parallel servers. At each tier of processing capability, we identify key architectural features that are relevant for applications and propose designs that make use of these features to achieve high-rate, timely and energy-efficient processing.

In the first part of the thesis, we focus on high-end servers and use two main approaches to achieve high-throughput processing: vectorization and thread parallelism. We employ vectorization for pattern matching algorithms used in security applications. We show that rethinking the design of algorithms to better utilize the resources available on the platforms they are deployed on, such as vector processing units, can bring significant speedups in processing throughput. We then show how thread-aware data distribution and proper inter-thread synchronization enable scalability, in particular for the problem of high-rate network traffic monitoring. We design a parallelization scheme for sketch-based algorithms that summarize traffic information, which allows them to handle incoming data at high rates and to answer queries on that data efficiently, without overheads.
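
To illustrate what a sketch-based traffic summary looks like, the following is a minimal single-threaded Count-Min sketch. It is a generic textbook structure, not the parallel design developed in the thesis; the class name, the default sizes and the choice of a salted BLAKE2b hash are assumptions made only for this example.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency summary; estimates may overcount but never undercount."""

    def __init__(self, width=1024, depth=4):
        # width and depth are illustrative defaults, not values from the thesis.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Derive one hash function per row by salting the hash with the row number.
        digest = hashlib.blake2b(key.encode(), salt=row.to_bytes(8, "little")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def insert(self, key, count=1):
        # Add the count to one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def query(self, key):
        # The smallest counter is the one least inflated by hash collisions.
        return min(self.rows[row][self._index(row, key)] for row in range(self.depth))


sketch = CountMinSketch()
for _ in range(5):
    sketch.insert("10.0.0.1")
sketch.insert("10.0.0.2", count=2)
estimate = sketch.query("10.0.0.1")  # at least 5, the true count
```

Every insertion touches shared counters, which is why running such a structure from many threads at once raises the data distribution and synchronization questions discussed above.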

In the second part of the thesis, we target the intermediate tier of computing devices and focus on the typical examples of hardware that is found there. We show how single-board computers with embedded accelerators can be used to handle the computationally heavy part of applications and showcase it specifically for pattern matching for security-related processing. We further identify key hardware features that affect the performance of pattern matching algorithms on such devices, present a co-evaluation framework to compare algorithms, and design a new algorithm that efficiently utilizes the hardware features.

In the last part of the thesis, we shift the focus to the low-power, resource-constrained tier of processing devices. We target wireless sensor networks and study distributed data processing algorithms where the processing happens on the same devices that generate the data. Specifically, we focus on a continuous monitoring algorithm (geometric monitoring) that aims to minimize communication between nodes. By deploying that algorithm in realistic environments, we demonstrate that the interplay between the network protocol and the application plays an important role in this tier of devices. Based on that observation, we co-design a continuous monitoring application with a modern network stack and augment it further with an in-network aggregation technique. In this way, we show that awareness of the underlying network stack is important to realize the full potential of the continuous monitoring algorithm.
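
The local test that lets nodes stay silent in geometric monitoring can be sketched as follows. This is a minimal illustration of the commonly described formulation, not the implementation evaluated in the thesis: it assumes the monitored function is the Euclidean norm of the average node vector, and the function and parameter names are hypothetical.

```python
import math


def norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def local_constraint_violated(estimate, last_sent, current, threshold):
    """Return True if this node has to communicate.

    estimate  : global estimate vector shared by all nodes at the last sync
    last_sent : this node's local vector at the last sync
    current   : this node's local vector now
    threshold : value that the monitored function is compared against
    """
    # Drift vector: the shared estimate shifted by this node's local change.
    drift = [e + (c - s) for e, c, s in zip(estimate, current, last_sent)]
    # Ball whose diameter is the segment from the estimate to the drift vector.
    center = [(e + d) / 2 for e, d in zip(estimate, drift)]
    radius = norm([e - d for e, d in zip(estimate, drift)]) / 2
    # Assumption: f(x) = ||x||, so the extremes over the ball have a closed form.
    f_max = norm(center) + radius
    f_min = max(0.0, norm(center) - radius)
    # The node may stay silent only if the whole ball lies on one side of the threshold.
    return f_min <= threshold < f_max


quiet = local_constraint_violated([1.0, 0.0], [1.0, 0.0], [1.5, 0.0], threshold=2.0)
alarm = local_constraint_violated([1.0, 0.0], [1.0, 0.0], [4.0, 0.0], threshold=2.0)
```

In the first call the local change is small and the node sends nothing; in the second the ball crosses the threshold, so the node must transmit. How costly that transmission is depends on the network stack underneath, which is the interplay examined in this part of the thesis.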

The techniques and solutions presented in this thesis contribute to better utilization of hardware characteristics across a wide spectrum of platforms. We apply these techniques to problems that are representative examples of current and upcoming applications, and provide an outlook on emerging possibilities that can build on the results of the thesis.

resource-constrained

distributed processing

high-end

parallelism

hardware-aware

intermediate

online & Room 8103, EDIT Building, Hörsalsvägen 11

Opponent: Angelos Bilas, Department of Computer Science, University of Crete & FORTH, Greece

Online defence

Author

Charalampos Stylianopoulos

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

Other publications Research

Industry Paper: On the Performance of Commodity Hardware for Low Latency and Low Jitter Packet Processing

DEBS 2020 - Proceedings of the 14th ACM International Conference on Distributed and Event-Based Systems (2020), p. 177-182

Paper in proceeding

Delegation sketch: A parallel design with support for fast and accurate concurrent operations

Proceedings of the 15th European Conference on Computer Systems, EuroSys 2020 (2020)

Paper in proceeding

Multiple pattern matching for network security applications: Acceleration through vectorization (pre-print version)

Journal of Parallel and Distributed Computing, Vol. 137 (2020), p. 34-52

Journal article

Co-Evaluation of Pattern Matching Algorithms on IoT Devices with Embedded GPUs

ACM International Conference Proceeding Series, Vol. 2019-January (2019), p. 17-27

Paper in proceeding

Geometric Monitoring in Action: a Systems Perspective for the Internet of Things

Proceedings - Conference on Local Computer Networks, LCN, Vol. 2018-October (2018), p. 433-436

Paper in proceeding

Continuous Monitoring meets Synchronous Transmissions and In-Network Aggregation

Proceedings - 15th Annual International Conference on Distributed Computing in Sensor Systems, DCOSS 2019 (2019), p. 157-166

Paper in proceeding

CLort: High Throughput and Low Energy Network Intrusion Detection on IoT Devices with Embedded GPUs

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11252 LNCS (2018), p. 187-202

Paper in proceeding

Multiple Pattern Matching for Network Security Applications: Acceleration through Vectorization

Proceedings of the International Conference on Parallel Processing (2017), p. 472-482

Paper in proceeding

In recent years, emerging application domains, such as the Internet of Things and industrial automation, have brought new challenges in data processing. The traditional cloud computing paradigm, where data collected at various sources is sent to the cloud for processing, is not always able to handle the high rates at which data is generated or to deliver the timely processing required to extract value from that data.

As an approach to this challenge, data processing that was performed exclusively on the cloud is now distributed to multiple tiers of computing, closer to the sources of data. Processing now takes place on a wide range of devices, from high-end, powerful servers to small devices with limited resources. However, the wide differences in hardware characteristics found in these devices make it challenging to design processing algorithms that perform efficiently.

In this thesis, we investigate how hardware-aware algorithm designs on a variety of platforms lead to algorithm implementations that efficiently utilize the underlying resources. We design, implement and evaluate new techniques for representative applications that involve the whole spectrum of devices, from highly parallel servers to resource-constrained sensors. At each tier of processing capability, we identify key architectural features that are relevant for applications and propose designs that make use of these features to achieve high-rate, timely and energy-efficient processing.

RIOT: Resilient Internet of Things

Swedish Civil Contingencies Agency (MSB2018-12526), 2019-01-01 -- 2023-12-31.

Integrated cyber-physical solutions for intelligent distribution grid with high penetration of renewables (UNITED-GRID)

European Commission (EC) (EC/H2020/773717), 2017-11-01 -- 2020-04-30.

Resilient Information and Control Systems (RICS)

Swedish Civil Contingencies Agency (2015-828), 2015-09-01 -- 2020-08-31.

INDEED

Chalmers, 2016-01-01 -- 2020-12-31.

Subject Categories (SSIF 2011)

Computer Engineering

Computer Science

ISBN

978-91-7905-360-4

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 4827

Publisher

Chalmers

More information

Latest update

2/25/2022
