Efficient Data Streaming Analytic Designs for Parallel and Distributed Processing
Doctoral thesis, 2022

Today, ubiquitous sensing technologies enable the interconnection of physical objects as part of the Internet of Things (IoT) and generate massive data streams. In such scenarios, the demand for timely analysis has shifted data processing paradigms towards continuous, parallel, and multitier computing. However, these paradigms come with several challenges, especially regarding analysis speed, precision, costs, and deterministic execution. This thesis studies a number of such challenges to enable efficient continuous processing of streams of data in a decentralized and timely manner.

In the first part of the thesis, we investigate techniques aiming at speeding up the processing without a loss in precision. The focus is on continuous machine learning and data mining problems that commonly appear in IoT applications, in particular continuous clustering and monitoring, for which we present novel algorithms: (i) Lisco, a sequential algorithm to cluster data points collected by LiDAR (a distance sensor that creates a 3D mapping of the environment), (ii) p-Lisco, a parallel version of Lisco that enhances its pipeline and data parallelism, (iii) pi-Lisco, a parallel and incremental version that reuses information to prevent redundant computations, (iv) g-Lisco, a generalized version of Lisco to cluster any data with spatio-temporal locality by leveraging the implicit ordering of the data, and (v) Amble, a continuous monitoring solution for an industrial process.

In the second part, we investigate techniques to reduce the analysis costs in addition to speeding up the processing, while also supporting deterministic execution. The focus is on problems associated with the availability and utilization of computing resources, namely reducing the volumes of data, involving concurrent computing elements, and adjusting the level of concurrency. For that, we propose three frameworks: (i) DRIVEN, a framework to continuously compress the data and enable efficient transmission of the compact data in the processing pipeline, (ii) STRATUM, a framework to continuously pre-process the data before transferring it to upper tiers for further processing, and (iii) STRETCH, a framework to enable instantaneous elastic reconfigurations to adjust intra-node resources at runtime while ensuring determinism.

The algorithms and frameworks presented in this thesis contribute to efficient online processing of data streams while utilizing available resources. Using extensive evaluations, we show the efficiency and achievements of the proposed techniques for representative IoT applications that involve a wide spectrum of platforms, and illustrate that the performance of our work exceeds that of state-of-the-art techniques.

stream processing

continuous analysis

scalability

elasticity

HA2 Lecture Hall, Hörsalsvägen 4, Chalmers (Campus Johanneberg)
Opponent: Prof. Valeria Cardellini, University of Rome Tor Vergata, Italy

Author

Hannaneh Najdataei

Network and Systems

Continuous and parallel LiDAR point-cloud clustering

Proceedings - International Conference on Distributed Computing Systems, Vol. 2018-July (2018), p. 671-684

Paper in proceeding

pi-Lisco: parallel and incremental stream-based point-cloud clustering

Proceedings of the ACM Symposium on Applied Computing (2022), p. 460-469

Paper in proceeding

DRIVEN: A framework for efficient Data Retrieval and clustering in Vehicular Networks

Future Generation Computer Systems, Vol. 107 (2020), p. 1-17

Journal article

Adaptive Stream-based Shifting Bottleneck Detection in IoT-based Computing Architectures

IEEE International Conference on Emerging Technologies and Factory Automation, ETFA, Vol. 2019-September (2019), p. 993-1000

Paper in proceeding

STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing

IEEE Transactions on Parallel and Distributed Systems, Vol. 33 (2022), p. 4221-4238

Journal article

We are living in an increasingly connected world where everything of material significance is expected to be sensor-tagged and connected to the Internet in order to report its state in real time. These information-sensing Internet of Things (IoT) devices generate streams of data continuously (e.g. a modern vehicle is equipped with many sensors that continuously collect data from its surroundings), which require efficient analytics to uncover new insights and offer value. Such analytics are often expected to perform one-pass continuous analysis on data in motion and quickly generate streams of results (e.g. in an obstacle avoidance scenario, the vehicle needs to continuously process the incoming data and take corresponding actions in time).

This thesis proposes analytical tools for efficient stream processing by focusing on challenging aspects such as performance, precision, and utilization of available resources. For that, we investigate techniques aiming at speeding up the processing without a loss in precision (e.g. accurately detecting objects surrounding the vehicle in a timely manner). Moreover, we provide frameworks that reduce the analysis costs and allow further optimization in addition to speeding up the processing (e.g. employing computing resources over the Internet to help process vehicles’ data). The proposed novel techniques and frameworks contribute to efficient online processing of data streams. We evaluate our work on representative real-world IoT applications and show the improvements and achievements of the proposed solutions compared to the state of the art.

Future factories in the Cloud (FiC)

Swedish Foundation for Strategic Research (SSF) (GMT14-0032), 2016-01-01 -- 2020-12-31.

Areas of Advance

Information and Communication Technology

Production

Energy

Subject Categories

Software Engineering

Computer Science

Computer Systems

ISBN

978-91-7905-706-0

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5172

Publisher

Chalmers

Online

More information

Latest update

9/2/2022