Efficient Approximate Big Data Clustering: Distributed and Parallel Algorithms in the Spectrum of IoT Architectures
Licentiate thesis, 2019

Clustering, the task of grouping together similar items, is a frequently used method for processing data, with numerous applications. Clustering the data generated by sensors in the Internet of Things, for instance, can be useful for monitoring and making control decisions. For example, a cyber-physical environment can be monitored by one or more 3D laser-based sensors to detect the objects in that environment and avoid critical situations, e.g., collisions.

With advances in IoT-based systems, the volume of data produced by sensors, which typically operate at high rates, has become immense. For example, a 3D laser-based sensor with a spinning head can produce hundreds of thousands of points per second. Clustering such a large volume of data with conventional clustering methods takes too long, violating the time-sensitivity requirements of applications that rely on the outcome of the clustering. For example, collisions in a cyber-physical environment must be prevented as quickly as possible.

The thesis contributes efficient clustering methods for distributed and parallel computing architectures, which are representative of the processing environments in IoT-based systems. To that end, the thesis proposes MAD-C (short for Multi-stage Approximate Distributed Cluster-Combining) and PARMA-CC (short for Parallel Multiphase Approximate Cluster Combining). MAD-C is a method for distributed approximate data clustering. It employs an approximation-based data synopsis that drastically lowers the communication bandwidth required among the distributed nodes and achieves multiplicative savings in computation time, compared to a baseline that centrally gathers and clusters the data. PARMA-CC is a method for parallel approximate data clustering on multi-cores. Also employing an approximation-based data synopsis, PARMA-CC achieves scalability on multi-cores by increasing the synergy between the work-sharing procedure and the data structures, which facilitates highly parallel execution of threads. The thesis evaluates MAD-C and PARMA-CC both analytically and empirically.
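The general idea of combining clusters through synopses, rather than shipping raw points to one place, can be illustrated with a small sketch. This is not the MAD-C or PARMA-CC algorithm: the abstract does not say what form the synopsis takes, so the sketch assumes a bounding-circle summary (centroid, radius, point count) in 2D, and every name in it (`Synopsis`, `local_clusters`, `summarize`, `combine`, `eps`) is hypothetical.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Synopsis:
    """Assumed cluster summary: centroid, enclosing radius, point count."""
    cx: float
    cy: float
    radius: float
    count: int


def local_clusters(points: list[Point], eps: float) -> list[list[Point]]:
    """Naive single-linkage grouping run on one node's own points."""
    clusters: list[list[Point]] = []
    for p in points:
        near = [c for c in clusters if any(math.dist(p, q) <= eps for q in c)]
        merged = [p]
        for c in near:
            merged.extend(c)
        # Drop the clusters that were absorbed (compare by identity).
        clusters = [c for c in clusters if all(c is not n for n in near)]
        clusters.append(merged)
    return clusters


def summarize(cluster: list[Point]) -> Synopsis:
    """Replace a cluster's points by a circle that encloses all of them."""
    cx = sum(x for x, _ in cluster) / len(cluster)
    cy = sum(y for _, y in cluster) / len(cluster)
    radius = max(math.dist((cx, cy), p) for p in cluster)
    return Synopsis(cx, cy, radius, len(cluster))


def fuse(group: list[Synopsis]) -> Synopsis:
    """Merge several synopses into one circle that encloses all of them."""
    n = sum(s.count for s in group)
    cx = sum(s.cx * s.count for s in group) / n
    cy = sum(s.cy * s.count for s in group) / n
    radius = max(math.dist((cx, cy), (s.cx, s.cy)) + s.radius for s in group)
    return Synopsis(cx, cy, radius, n)


def combine(synopses: list[Synopsis], eps: float) -> list[Synopsis]:
    """Fuse synopses whose circles lie within eps of each other (union-find)."""
    parent = list(range(len(synopses)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, a in enumerate(synopses):
        for j in range(i + 1, len(synopses)):
            b = synopses[j]
            gap = math.dist((a.cx, a.cy), (b.cx, b.cy)) - a.radius - b.radius
            if gap <= eps:
                parent[find(i)] = find(j)

    groups: dict[int, list[Synopsis]] = {}
    for i, s in enumerate(synopses):
        groups.setdefault(find(i), []).append(s)
    return [fuse(g) for g in groups.values()]
```

In this sketch each node would call `local_clusters` and `summarize` on its own points and send only the resulting `Synopsis` objects, so the data exchanged grows with the number of clusters rather than the number of points. The result is approximate because a circle can cover empty space that the underlying points do not.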

Distributed and parallel processing

Clustering

Approximation-based synopsis

EL41
Opponent: Prof. Ralf Klasing from University of Bordeaux, France

Author

Amir Keramatian

Chalmers, Computer Science and Engineering, Networks and Systems

MAD-C: Multi-stage Approximate Distributed Cluster-Combining for Obstacle Detection and Localization

Lecture Notes in Computer Science, Vol. 11339 (2019), p. 312-324

Paper in proceedings

Cloud-based products and production (FiC)

Swedish Foundation for Strategic Research (SSF), 2016-01-01 -- 2020-12-31.

Subject categories

Computer Engineering

Media Engineering

Computer Science

Areas of Advance

Information and Communication Technology

Publisher

Chalmers University of Technology

More information

Last updated

2019-12-27