Parallel Data Streaming Analytics in the Context of Internet of Things
Licentiate thesis, 2019
In this work, we contribute to the design of computational infrastructures and the development of tools to address these challenges. We focus on two problem domains. First, we target continuous data analysis, with a particular focus on data clustering as a significant representative problem, to extract information from massive data generated by high-rate sensors. We propose Lisco, a single-pass continuous Euclidean distance-based clustering algorithm that exploits the inherent ordering of spatial and temporal data, and its parallel counterpart, P-Lisco, which enhances pipeline- and data-parallelism. These algorithms deliver results with high throughput and low latency by pushing the processing closer to the data sources. Moreover, we provide a framework, DRIVEN, that performs a continuous bounded-error approximation to compress the data volume and then transmits the compressed data to the next layers of the IoT architecture, where it is clustered in a continuous fashion using a generalized form of Lisco. Compressing the data at retrieval speeds up transmission while preserving clustering quality very close to that obtained with raw data transmission. In the second domain, we target elasticity in data streaming, that is, adapting the use of computational resources at runtime to fluctuations in the data rate. For that, we provide a stream processing framework, STRETCH, and introduce the concept of virtual shared-nothing parallelization, which is able to adapt the resources, maximize throughput, minimize latency, and preserve determinism. Thorough experimental evaluations on architectures representative of high-end servers and of resource-constrained embedded devices indicate the scalability benefits of all proposed algorithms.
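To make the notion of Euclidean distance-based clustering concrete, the following is a minimal batch sketch in Python. It is not Lisco: it does not process data in a single pass, does not exploit any ordering of the input, and compares all pairs of points. The function name `euclidean_clusters`, the threshold parameter `eps`, and the sample points are assumptions introduced only for illustration.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def euclidean_clusters(points, eps):
    """Group point indices so that points within eps of each other,
    directly or through a chain of neighbours, share a cluster.

    Naive all-pairs baseline for illustration only (hypothetical helper,
    not the algorithm proposed in the thesis).
    """
    parent = list(range(len(points)))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the sets of every pair of points closer than eps.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    # Collect indices by root to obtain the clusters.
    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Illustrative data: three points along a line and two points far away.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.3, 5.0)]
result = euclidean_clusters(points, eps=0.6)
```

With `eps=0.6`, the first three points are chained into one cluster even though the first and third are 1.0 apart, and the last two points form a second cluster. The quadratic pairwise comparison is the cost that a single-pass method exploiting input ordering would aim to avoid.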
clustering
stream/continuous data processing
edge computing
Internet of Things
parallelism
elasticity
data analysis
fog computing
scalability
Author
Hannaneh Najdataei
Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)
Continuous and parallel LiDAR point-cloud clustering
Proceedings - International Conference on Distributed Computing Systems, Vol. 2018-July (2018), p. 671-684
Paper in proceeding
Bastian Havers, Romaric Duvignau, Hannaneh Najdataei, Vincenzo Gulisano, Ashok Chaitanya Koppisetty, Marina Papatriantafilou, DRIVEN: a framework for efficient data retrieval and clustering in vehicular networks
Hannaneh Najdataei, Yiannis Nikolakopoulos, Marina Papatriantafilou, Philippas Tsigas, Vincenzo Gulisano, STRETCH: Scalable and Elastic Deterministic Streaming Analysis with Virtual Shared-Nothing Parallelism
Future factories in the Cloud (FiC)
Swedish Foundation for Strategic Research (SSF) (GMT14-0032), 2016-01-01 -- 2020-12-31.
Subject Categories
Computer Engineering
Other Computer and Information Science
Publisher
Chalmers
EB, Hörsalsvägen 11, Campus Johanneberg, Chalmers
Opponent: Prof. Christoph Kessler, Linköping University, Sweden