Efficient Data Streaming Analytic Designs for Parallel and Distributed Processing
Doctoral thesis, 2022
In the first part of the thesis, we investigate techniques that speed up processing without a loss in precision. The focus is on continuous machine learning and data mining problems that commonly appear in IoT applications, in particular continuous clustering and monitoring, for which we present novel algorithms: (i) Lisco, a sequential algorithm to cluster data points collected by LiDAR (a distance sensor that creates a 3D mapping of the environment), (ii) p-Lisco, the parallel version of Lisco, which enhances its pipeline- and data-parallelism, (iii) pi-Lisco, the parallel and incremental version, which reuses information to prevent redundant computations, (iv) g-Lisco, a generalized version of Lisco that clusters any data with spatio-temporal locality by leveraging the implicit ordering of the data, and (v) Amble, a continuous monitoring solution in an industrial process.
In the second part, we investigate techniques to reduce the analysis costs in addition to speeding up the processing, while also supporting deterministic execution. The focus is on problems associated with the availability and utilization of computing resources, namely reducing the volumes of data, involving concurrent computing elements, and adjusting the level of concurrency. For that, we propose three frameworks: (i) DRIVEN, a framework to continuously compress the data and enable efficient transmission of the compact data in the processing pipeline, (ii) STRATUM, a framework to continuously pre-process the data before transferring it to upper tiers for further processing, and (iii) STRETCH, a framework to enable instantaneous elastic reconfigurations to adjust intra-node resources at runtime while ensuring determinism.
The algorithms and frameworks presented in this thesis contribute to efficient online processing of data streams while utilizing available resources. Using extensive evaluations, we show the efficiency and achievements of the proposed techniques for representative IoT applications that involve a wide spectrum of platforms, and illustrate that the performance of our work exceeds that of state-of-the-art techniques.
stream processing
continuous analysis
scalability
elasticity
Author
Hannaneh Najdataei
Network and Systems
Continuous and parallel LiDAR point-cloud clustering
Proceedings - International Conference on Distributed Computing Systems, Vol. 2018-July (2018), p. 671-684
Paper in proceeding
pi-Lisco: parallel and incremental stream-based point-cloud clustering
Proceedings of the ACM Symposium on Applied Computing (2022), p. 460-469
Paper in proceeding
DRIVEN: A framework for efficient Data Retrieval and clustering in Vehicular Networks
Future Generation Computer Systems, Vol. 107 (2020), p. 1-17
Journal article
Adaptive Stream-based Shifting Bottleneck Detection in IoT-based Computing Architectures
IEEE International Conference on Emerging Technologies and Factory Automation, ETFA, Vol. 2019-September (2019), p. 993-1000
Paper in proceeding
STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing
IEEE Transactions on Parallel and Distributed Systems, Vol. 33 (2022), p. 4221-4238
Journal article
This thesis proposes analytical tools for efficient stream processing by focusing on challenging aspects such as performance, precision, and utilization of available resources. For that, we investigate techniques that speed up the processing without a loss in precision (e.g., accurately detecting objects surrounding a vehicle in a timely manner). Moreover, we provide frameworks that reduce the analysis costs and allow more optimization in addition to speeding up the processing (e.g., employing computing resources over the Internet to help process vehicles' data). The proposed novel techniques and frameworks contribute to efficient online processing of data streams. We evaluate our work on representative real-world IoT applications and show the improvements and achievements of the proposed solutions compared to the state of the art.
Future factories in the Cloud (FiC)
Swedish Foundation for Strategic Research (SSF) (GMT14-0032), 2016-01-01 -- 2020-12-31.
Areas of Advance
Information and Communication Technology
Production
Energy
Subject Categories
Software Engineering
Computer Science
Computer Systems
ISBN
978-91-7905-706-0
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5172
Publisher
Chalmers
HA2 Lecture Hall, Hörsalsvägen 4, Chalmers (Campus Johanneberg)
Opponent: Prof. Valeria Cardellini, University of Rome Tor Vergata, Italy