Efficient Data Streaming Analytic Designs for Parallel and Distributed Processing
Doctoral thesis, 2022
In the first part of the thesis, we investigate techniques that speed up processing without a loss in precision. The focus is on continuous machine learning and data mining problems that commonly appear in IoT applications, in particular continuous clustering and monitoring, for which we present novel algorithms: (i) Lisco, a sequential algorithm to cluster data points collected by LiDAR (a distance sensor that creates a 3D mapping of the environment), (ii) p-Lisco, a parallel version of Lisco that enhances its pipeline- and data-parallelism, (iii) pi-Lisco, a parallel and incremental version that reuses information to prevent redundant computations, (iv) g-Lisco, a generalized version of Lisco that clusters any data with spatio-temporal locality by leveraging the implicit ordering of the data, and (v) Amble, a continuous monitoring solution for an industrial process.
In the second part, we investigate techniques that reduce analysis costs, in addition to speeding up processing, while also supporting deterministic execution. The focus is on problems associated with the availability and utilization of computing resources, namely reducing data volumes, involving concurrent computing elements, and adjusting the level of concurrency. For that, we propose three frameworks: (i) DRIVEN, a framework to continuously compress the data and enable efficient transmission of the compact data in the processing pipeline, (ii) STRATUM, a framework to continuously pre-process the data before transferring it to upper tiers for further processing, and (iii) STRETCH, a framework to enable instantaneous elastic reconfigurations that adjust intra-node resources at runtime while ensuring determinism.
The algorithms and frameworks presented in this thesis contribute to efficient online processing of data streams while utilizing available resources. Using extensive evaluations, we show the efficiency and achievements of the proposed techniques for representative IoT applications that involve a wide spectrum of platforms, and illustrate that the performance of our work exceeds that of state-of-the-art techniques.
stream processing
continuous analysis
scalability
elasticity
Author
Hannaneh Najdataei
Networks and Systems
Continuous and parallel LiDAR point-cloud clustering
Proceedings - International Conference on Distributed Computing Systems, Vol. 2018-July (2018), p. 671-684
Paper in proceedings
pi-Lisco: parallel and incremental stream-based point-cloud clustering
Proceedings of the ACM Symposium on Applied Computing (2022), p. 460-469
Paper in proceedings
DRIVEN: A framework for efficient Data Retrieval and clustering in Vehicular Networks
Future Generation Computer Systems, Vol. 107 (2020), p. 1-17
Journal article
Adaptive Stream-based Shifting Bottleneck Detection in IoT-based Computing Architectures
IEEE International Conference on Emerging Technologies and Factory Automation, ETFA, Vol. 2019-September (2019), p. 993-1000
Paper in proceedings
STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing
IEEE Transactions on Parallel and Distributed Systems, Vol. 33 (2022), p. 4221-4238
Journal article
This thesis proposes analytical tools for efficient stream processing, focusing on challenging aspects such as performance, precision, and utilization of available resources. To that end, we investigate techniques that speed up processing without a loss in precision (e.g., accurately detecting objects surrounding a vehicle in a timely manner). Moreover, we provide frameworks that reduce analysis costs and allow further optimization in addition to speeding up processing (e.g., employing computing resources over the Internet to help process vehicles' data). The proposed techniques and frameworks contribute to efficient online processing of data streams. We evaluate our work on representative real-world IoT applications and show the improvements of the proposed solutions over the state of the art.
Cloud-based products and production (FiC)
Swedish Foundation for Strategic Research (SSF) (GMT14-0032), 2016-01-01 -- 2020-12-31.
Areas of Advance
Information and Communication Technology
Production
Energy
Subject Categories
Software Engineering
Computer Science
Computer Systems
ISBN
978-91-7905-706-0
Doctoral theses at Chalmers University of Technology. New series: 5172
Publisher
Chalmers
HA2 Lecture Hall, Hörsalsvägen 4, Chalmers (Campus Johanneberg)
Opponent: Prof. Valeria Cardellini, University of Rome Tor Vergata, Italy