Efficient Approximate Big Data Clustering: Distributed and Parallel Algorithms in the Spectrum of IoT Architectures
Licentiate thesis, 2019

Clustering, the task of grouping together similar items, is a frequently used method for processing data, with numerous applications. Clustering the data generated by sensors in the Internet of Things, for instance, can be useful for monitoring and making control decisions. For example, a cyber-physical environment can be monitored by one or more 3D laser-based sensors to detect the objects in that environment and avoid critical situations, e.g. collisions.

With the advancements in IoT-based systems, the volume of data produced by sensors, which typically operate at high rates, has become immense. For example, a 3D laser-based sensor with a spinning head can produce hundreds of thousands of points per second. Clustering such a large volume of data with conventional clustering methods takes too long, violating the time-sensitivity requirements of applications that rely on the outcome of the clustering. For example, an imminent collision in a cyber-physical environment must be detected quickly enough to be prevented.

The thesis contributes efficient clustering methods for distributed and parallel computing architectures, representative of the processing environments in IoT-based systems. To that end, the thesis proposes MAD-C (abbreviating Multi-stage Approximate Distributed Cluster-Combining) and PARMA-CC (abbreviating Parallel Multiphase Approximate Cluster Combining). MAD-C is a method for distributed approximate data clustering. It employs an approximation-based data synopsis that drastically lowers the required communication bandwidth among the distributed nodes and achieves multiplicative savings in computation time, compared to a baseline that centrally gathers and clusters the data. PARMA-CC is a method for parallel approximate data clustering on multi-cores. Also employing an approximation-based data synopsis, PARMA-CC achieves scalability on multi-cores by increasing the synergy between the work-sharing procedure and the data structures, which facilitates highly parallel execution of threads. The thesis provides analytical and empirical evaluation for MAD-C and PARMA-CC.
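The general idea of combining clusters through compact synopses can be illustrated with a minimal Python sketch. It is not the MAD-C or PARMA-CC algorithm: the abstract does not specify the form of the synopsis, so the bounding sphere used here (centroid, radius, point count) is an assumed stand-in, and the names `Synopsis`, `summarize` and `combine` are hypothetical. Each node is assumed to have clustered its own points already; only the synopses are exchanged, and synopses whose spheres overlap are treated as parts of the same object.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Synopsis:
    """Assumed stand-in synopsis: a bounding sphere around one local cluster."""
    center: Point
    radius: float
    count: int


def summarize(points: Sequence[Point]) -> Synopsis:
    """Reduce a local cluster of n points to a few numbers (center, radius, count)."""
    n = len(points)
    dim = len(points[0])
    center = tuple(sum(p[i] for p in points) / n for i in range(dim))
    radius = max(dist(center, p) for p in points)
    return Synopsis(center, radius, n)


def combine(synopses: Sequence[Synopsis], slack: float = 0.0) -> List[List[int]]:
    """Group the indices of synopses whose spheres overlap, using union-find."""
    parent = list(range(len(synopses)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, a in enumerate(synopses):
        for j in range(i + 1, len(synopses)):
            b = synopses[j]
            if dist(a.center, b.center) <= a.radius + b.radius + slack:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(synopses)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two hypothetical nodes, each holding its own local clusters of 3D points.
node_a = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
          [(10.0, 10.0, 0.0), (11.0, 10.0, 0.0)]]
node_b = [[(0.5, 0.0, 0.0), (2.5, 0.0, 0.0)]]

synopses = [summarize(c) for node in (node_a, node_b) for c in node]
groups = combine(synopses)
```

In this toy input, the first cluster of `node_a` and the cluster of `node_b` overlap and end up in one group, while the second cluster of `node_a` stays on its own. Each synopsis costs a handful of numbers regardless of how many points it covers, which is the kind of bandwidth reduction the abstract refers to; the price is that the combined result is approximate.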

Approximation-based synopsis

Clustering

Distributed and parallel processing

EL41
Opponent: Prof. Ralf Klasing from University of Bordeaux, France

Author

Amir Keramatian

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

MAD-C: Multi-stage Approximate Distributed Cluster-Combining for Obstacle Detection and Localization

Lecture Notes in Computer Science, Vol. 11339 (2019), p. 312-324

Paper in proceeding

Future factories in the Cloud (FiC)

Swedish Foundation for Strategic Research (SSF) (GMT14-0032), 2016-01-01 -- 2020-12-31.

Subject Categories (SSIF 2011)

Computer Engineering

Media Engineering

Computer Science

Areas of Advance

Information and Communication Technology

Publisher

Chalmers

More information

Latest update

4/11/2022