Parallel Data Streaming Analytics in the Context of Internet of Things
Licentiate thesis, 2019

We are living in an increasingly connected world, where ubiquitous sensing technologies enable the interconnection of physical objects as part of the Internet of Things (IoT) and provide a continuous, massive amount of data. As this growth soars, benefits and challenges come together, which requires the development of the right tools to extract valuable information from data. For that, new techniques (e.g. data stream processing) have emerged to perform continuous single-pass analysis and enhance parallelism. However, employing such techniques is not a trivial task due to challenges such as partial knowledge of the data and the trade-off between parallelism and consistency. Moreover, depending on the source, data volumes may fluctuate over time, which requires the degree of parallelism to be adapted at runtime.
In this work, we contribute to the design of computational infrastructures and the development of tools to address these challenges. In this regard, we focus on two problem domains. First, we target continuous data analysis, focusing in particular on data clustering, a significant representative problem, to extract information from the massive data generated by high-rate sensors. We propose Lisco, a single-pass continuous Euclidean distance-based clustering algorithm that exploits the inherent ordering of spatial and temporal data, and its parallel counterpart, P-Lisco, which enhances pipeline and data parallelism. These algorithms provide high throughput of results with low latency by pushing the processing closer to the data sources. Moreover, we provide a framework, DRIVEN, that performs a continuous bounded-error approximation to compress the volumes of data and then transmits the compressed data to the next layers of the IoT architecture, where it is clustered continuously using a generalized form of Lisco. Compressing the data at retrieval speeds up its transmission while preserving clustering quality very similar to that obtained with raw data transmission. In the second domain, we target elasticity in data streaming, i.e. adapting the use of computational resources at runtime to fluctuations in the data rate. For that, we provide a stream processing framework, STRETCH, and introduce the concept of virtual shared-nothing parallelization, which is able to adapt the resources, maximize throughput, keep latency low, and preserve determinism. Thorough experimental evaluations on architectures representative of high-end servers and of resource-constrained embedded devices indicate the scalability benefits of all proposed algorithms.
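The general idea of single-pass, ordering-aware Euclidean clustering can be illustrated with a minimal Python sketch. It is a simplified stand-in, not Lisco itself: it assumes points arrive sorted along one coordinate (a hypothetical proxy for the acquisition ordering of, e.g., LiDAR data), links any two points within Euclidean distance `eps`, and reports the connected components as clusters. Because of the ordering, each new point only needs to be compared against a sliding window of earlier points whose sort key lies within `eps`, so every point is visited once. The function name `ordered_euclidean_clustering` and the parameter `eps` are hypothetical.

```python
from collections import deque
import math


def ordered_euclidean_clustering(points, eps):
    """Cluster points that arrive in non-decreasing order of their first coordinate.

    Hypothetical illustration (not Lisco): two points are linked if their
    Euclidean distance is <= eps, and clusters are the connected components of
    that graph. Since input is sorted on x, a point can only be within eps of
    earlier points whose x lies in [x - eps, x], so only that window is searched.
    """
    parent = list(range(len(points)))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as root, so each root is its cluster's minimum.
            parent[max(ri, rj)] = min(ri, rj)

    window = deque()  # indices of earlier points whose x is within eps of the current x
    for i, p in enumerate(points):
        # Evict points that can no longer be within eps along the sort key.
        while window and p[0] - points[window[0]][0] > eps:
            window.popleft()
        for j in window:
            if math.dist(p, points[j]) <= eps:
                union(i, j)
        window.append(i)

    # Label each point with the smallest index in its cluster.
    return [find(i) for i in range(len(points))]
```

For example, `ordered_euclidean_clustering([(0, 0), (0.5, 0), (1.0, 0), (5, 0), (5.4, 0.1), (10, 0)], 0.6)` returns `[0, 0, 0, 3, 3, 5]`: the first three points form one cluster through chaining, even though the first and third are 1.0 apart. A streaming variant would additionally emit each cluster as soon as the window has moved past it, which this sketch omits for brevity.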

clustering

stream/continuous data processing

edge computing

Internet of Things

parallelism

elasticity

data analysis

fog computing

scalability

EB, Hörsalsvägen 11, Campus Johanneberg, Chalmers
Opponent: Prof. Christoph Kessler, Linköping University, Sweden

Author

Hannaneh Najdataei

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

Continuous and parallel LiDAR point-cloud clustering

Proceedings - International Conference on Distributed Computing Systems, Vol. 2018-July (2018), pp. 671-684

Paper in proceeding

Bastian Havers, Romaric Duvignau, Hannaneh Najdataei, Vincenzo Gulisano, Ashok Chaitanya Koppisetty, Marina Papatriantafilou, DRIVEN: a framework for efficient data retrieval and clustering in vehicular networks

Hannaneh Najdataei, Yiannis Nikolakopoulos, Marina Papatriantafilou, Philippas Tsigas, Vincenzo Gulisano, STRETCH: Scalable and Elastic Deterministic Streaming Analysis with Virtual Shared-Nothing Parallelism

Future factories in the Cloud (FiC)

Swedish Foundation for Strategic Research (SSF) (GMT14-0032), 2016-01-01 -- 2020-12-31.

Subject Categories (SSIF 2011)

Computer Engineering

Other Computer and Information Science

Publisher

Chalmers

Latest update

4/26/2019