S-RASTER: contraction clustering for evolving data streams
Journal article, 2020

Contraction Clustering (RASTER) is a single-pass algorithm for density-based clustering of 2D data. It can process arbitrary amounts of data in linear time and constant memory, quickly identifying approximate clusters, and it scales well across multiple CPU cores. RASTER is very competitive in performance with standard clustering algorithms, at the cost of decreased precision. However, RASTER is limited to batch processing and cannot identify clusters that exist only temporarily. In contrast, S-RASTER adapts RASTER to the stream processing paradigm and can identify clusters in evolving data streams. It retains the main benefits of its parent algorithm, i.e., single-pass linear time cost and constant memory requirements for each discrete time step within a sliding window. The sliding window is efficiently pruned, and clustering is still performed in linear time. Like RASTER, S-RASTER trades an often negligible amount of precision for speed. Our evaluation shows that competing algorithms are at least 50% slower. Furthermore, S-RASTER shows good qualitative results based on standard metrics. It is very well suited to real-world scenarios where clustering happens not continually but only periodically.
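
The abstract does not spell out the mechanics, so the following is only an illustrative sketch of grid-based density clustering over a sliding window, not the authors' implementation. It assumes that 2D points are mapped to grid tiles by scaling and flooring their coordinates, that a tile counts as dense once it holds at least `threshold` points, and that neighbouring dense tiles form one cluster. The names `tile_of` and `SlidingWindowGrid` and the parameters `window`, `threshold`, and `precision` are hypothetical.

```python
import math
from collections import Counter, deque


def tile_of(x, y, precision=1):
    """Map a 2D point to a grid tile by scaling and flooring (assumed scheme)."""
    scale = 10 ** precision
    return (math.floor(x * scale), math.floor(y * scale))


class SlidingWindowGrid:
    """Per-tile point counts over the most recent `window` time steps."""

    def __init__(self, window, threshold, precision=1):
        self.window = window        # number of time steps retained
        self.threshold = threshold  # minimum points for a tile to count as dense
        self.precision = precision
        self.counts = Counter()     # tile -> points currently inside the window
        self.batches = deque()      # (time_step, per-tile counts of that step)

    def add(self, t, points):
        """Ingest the points of time step t in one pass, then prune old steps."""
        batch = Counter(tile_of(x, y, self.precision) for x, y in points)
        self.batches.append((t, batch))
        self.counts.update(batch)
        self._prune(t)

    def _prune(self, now):
        """Remove the contribution of time steps that left the window."""
        while self.batches and self.batches[0][0] <= now - self.window:
            _, old = self.batches.popleft()
            self.counts.subtract(old)
            for tile in old:
                if self.counts[tile] <= 0:
                    del self.counts[tile]

    def clusters(self):
        """Group dense tiles that touch (8-neighbourhood) into clusters."""
        dense = {tile for tile, c in self.counts.items() if c >= self.threshold}
        result = []
        while dense:
            stack = [dense.pop()]
            cluster = set(stack)
            while stack:
                cx, cy = stack.pop()
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        neighbour = (cx + dx, cy + dy)
                        if neighbour in dense:
                            dense.remove(neighbour)
                            cluster.add(neighbour)
                            stack.append(neighbour)
            result.append(cluster)
        return result
```

In this sketch, `add` is called once per time step and `clusters` only when a result is needed, which mirrors the periodic clustering scenario the abstract mentions. Each point is touched once on ingestion, and the memory held depends on the number of occupied tiles in the window rather than on the number of points.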

Keywords

Clustering

Big data analytics

Unsupervised learning

Big data

Machine learning

Stream processing

Authors

Gregor Ulm

Fraunhofer-Chalmers Centre

Fraunhofer Center for Machine Learning

Simon Smith

Fraunhofer Center for Machine Learning

Fraunhofer-Chalmers Centre

Adrian Nilsson

Fraunhofer Center for Machine Learning

Fraunhofer-Chalmers Centre

Emil Gustavsson

Fraunhofer-Chalmers Centre

Fraunhofer Center for Machine Learning

Mats Jirstrand

Fraunhofer Center for Machine Learning

Fraunhofer-Chalmers Centre

Journal of Big Data

2196-1115 (eISSN)

Vol. 7, Issue 1, Article 62

Subject Categories (SSIF 2011)

Computer Engineering

Information Science

Computer Systems

DOI

10.1186/s40537-020-00336-3

More information

Latest update

3/23/2021