PARMA-CC: A Family of Parallel Multiphase Approximate Cluster Combining Algorithms
Journal article, 2023
We show that PARMA-CC algorithms yield equivalent clustering outcomes despite their different approaches. Furthermore, we show that certain PARMA-CC algorithms can achieve higher efficiency with respect to certain properties of the data to be clustered. Generally speaking, in PARMA-CC algorithms, parallel threads compute summaries associated with clusters of data (sub)sets. As the threads concurrently combine the summaries, they construct a comprehensive summary of the sets of clusters. By approximating a cluster with its respective geometrical summaries, PARMA-CC algorithms scale well with increased data volumes, and, by computing and efficiently combining the summaries in parallel, they enable latency improvements. PARMA-CC algorithms utilize special data structures that enable parallelism through in-place data processing. As we show in our analysis and evaluation, PARMA-CC algorithms can complement and outperform well-established methods, with significantly better scalability, while still providing highly accurate results in a variety of data sets, even with skewed data distributions, which cause the traditional approaches to exhibit their worst-case behaviour.
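The summarize-then-combine pattern described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PARMA-CC algorithms themselves: axis-aligned bounding boxes stand in for the geometrical summaries, the threshold `eps` and the helper names `near`, `union`, `combine`, and `cluster_parallel` are hypothetical, and the final combination runs sequentially, whereas the abstract describes threads combining summaries concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def near(a, b, eps):
    """True if boxes a and b (xmin, ymin, xmax, ymax) are within eps on both axes."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return dx <= eps and dy <= eps


def union(a, b):
    """Smallest box covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def combine(boxes, eps):
    """Merge nearby boxes repeatedly until no two remaining boxes are near each other."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        out = []
        for b in boxes:
            for i, o in enumerate(out):
                if near(b, o, eps):
                    out[i] = union(b, o)  # absorb b into an existing summary
                    changed = True
                    break
            else:
                out.append(b)  # b starts a new summary
        boxes = out
    return boxes


def cluster_parallel(points, eps, n_threads=4):
    """Summarize each data subset in its own thread, then combine the summaries."""
    chunks = [points[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Each thread turns its points into degenerate boxes and merges them locally.
        partial = list(pool.map(
            lambda chunk: combine([(x, y, x, y) for x, y in chunk], eps), chunks))
    # Assumption: a single sequential pass combines the per-thread summaries.
    return combine([b for boxes in partial for b in boxes], eps)


points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.5),
          (10.0, 10.0), (10.5, 10.0), (11.0, 10.5)]
summaries = cluster_parallel(points, eps=1.0, n_threads=2)
```

Because each cluster is represented only by its box, the combining step works on a handful of summaries rather than on all points, which is the source of the scalability the abstract refers to. In CPython the threads here illustrate the structure rather than a real speedup.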
Parallel Clustering
Synchronization
Data Structures
Approximation
Authors
Amir Keramatian
Network and Systems
Vincenzo Massimiliano Gulisano
Network and Systems
Marina Papatriantafilou
Network and Systems
Philippas Tsigas
Network and Systems
Journal of Parallel and Distributed Computing
0743-7315 (ISSN) 1096-0848 (eISSN)
Vol. 177, pp. 68-88
HARE: Self-deploying and Adaptive Data Streaming Analytics in Fog Architectures
Swedish Research Council (VR) (2016-03800), 2017-01-01 -- 2020-12-31.
Future factories in the Cloud (FiC)
Swedish Foundation for Strategic Research (SSF) (GMT14-0032), 2016-01-01 -- 2020-12-31.
Subject Categories
Computer Engineering
Media Engineering
Computer Science
Computer Systems
Areas of Advance
Information and Communication Technology
Production
Driving Forces
Sustainable development
DOI
10.1016/j.jpdc.2023.02.001