Efficient Approximate Big Data Clustering: Distributed and Parallel Algorithms in the Spectrum of IoT Architectures
Licentiate thesis, 2019
With the advancements in IoT-based systems, the volume of data produced by sensors, typically high-rate ones, has become immense. For example, a 3D laser-based sensor with a spinning head can produce hundreds of thousands of points per second. Clustering such a large volume of data with conventional clustering methods takes too long, violating the time-sensitivity requirements of applications that rely on the outcome of the clustering. For example, collisions in a cyber-physical environment must be detected and prevented as quickly as possible.
The thesis contributes efficient clustering methods for distributed and parallel computing architectures, which are representative of the processing environments in IoT-based systems. To that end, the thesis proposes MAD-C (Multi-stage Approximate Distributed Cluster-Combining) and PARMA-CC (Parallel Multiphase Approximate Cluster Combining). MAD-C is a method for distributed approximate data clustering. It employs an approximation-based data synopsis that drastically lowers the communication bandwidth required among the distributed nodes and achieves multiplicative savings in computation time, compared to a baseline that centrally gathers and clusters the data. PARMA-CC is a method for parallel approximate data clustering on multi-cores. Also employing an approximation-based data synopsis, PARMA-CC achieves scalability on multi-cores by increasing the synergy between the work-sharing procedure and the data structures, which facilitates highly parallel execution of threads. The thesis provides analytical and empirical evaluations of MAD-C and PARMA-CC.
Approximation-based synopsis
Clustering
Distributed and parallel processing
Author
Amir Keramatian
Chalmers, Computer Science and Engineering, Networks and Systems
MAD-C: Multi-stage Approximate Distributed Cluster-Combining for Obstacle Detection and Localization
Lecture Notes in Computer Science, Vol. 11339 (2019), p. 312-324
Paper in proceedings
Cloud-based products and production (FiC)
Swedish Foundation for Strategic Research (SSF) (GMT14-0032), 2016-01-01 -- 2020-12-31.
Subject categories
Computer Engineering
Media Engineering
Computer Science
Areas of Advance
Information and Communication Technology
Publisher
Chalmers
EL41
Opponent: Prof. Ralf Klasing from University of Bordeaux, France