Stream Aggregation with Compressed Sliding Windows
Journal article, 2023

High-performance stream aggregation is critical for many emerging applications that analyze massive volumes of data. When the aggregation functions cannot be computed incrementally, incoming data must be stored in a sliding window during processing. Updating the window with new incoming values and reading it to feed the aggregation functions are the two primary steps in stream aggregation. Although window updates can be supported efficiently using multi-level queues, frequent window aggregations remain a performance bottleneck because they put tremendous pressure on memory bandwidth and capacity. This article addresses this problem by enhancing StreamZip, a dataflow stream aggregation engine that is able to compress the sliding windows. StreamZip deals with a number of data and control dependency challenges to integrate a compressor in the stream aggregation pipeline and alleviate the memory pressure posed by frequent aggregations. In addition, StreamZip incorporates a caching mechanism for dealing with skewed key distributions in the incoming data stream. In doing so, StreamZip offers higher throughput as well as larger effective window capacity to support larger problems. StreamZip supports diverse compression algorithms, offering both lossless and lossy compression for integers as well as floating-point numbers. Compared to designs without compression, StreamZip's lossless and lossy designs achieve up to 7.5× and 22× higher throughput, while improving the effective memory capacity by up to 5× and 23×, respectively.
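The two steps named in the abstract, updating the window and reading it to feed a non-incremental aggregation function, can be illustrated with a small software sketch. This is not StreamZip's design, which is a dataflow engine; it only shows the general idea of keeping window contents compressed and decompressing them when an aggregation is requested. The class name `CompressedSlidingWindow`, its parameters, the block size, and the choice of delta encoding followed by `zlib` are all assumptions made for illustration, and the sketch handles 64-bit integers with lossless compression only.

```python
import struct
import zlib
from collections import deque
from statistics import median


class CompressedSlidingWindow:
    """Hypothetical count-based sliding window; full blocks are kept compressed."""

    def __init__(self, capacity, block_size=4):
        self.capacity = capacity      # maximum number of values in the window
        self.block_size = block_size  # values per compressed block (assumed)
        self.blocks = deque()         # compressed full blocks, oldest first
        self.tail = []                # newest values, not yet compressed
        self.head_skip = 0            # values already evicted from blocks[0]
        self.count = 0                # live values currently in the window

    def _pack(self, values):
        # Delta-encode, then compress the packed 64-bit deltas (lossless).
        deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
        return zlib.compress(struct.pack(f"<{len(deltas)}q", *deltas))

    def _unpack(self, blob):
        raw = zlib.decompress(blob)
        deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
        values, acc = [], 0
        for d in deltas:
            acc += d
            values.append(acc)
        return values

    def update(self, value):
        """Window update: append the new value and evict the oldest if full."""
        self.tail.append(value)
        self.count += 1
        if len(self.tail) == self.block_size:
            self.blocks.append(self._pack(self.tail))
            self.tail = []
        if self.count > self.capacity:
            self._evict_oldest()

    def _evict_oldest(self):
        self.count -= 1
        if self.blocks:
            # Mark one more value of the oldest block as expired; drop the
            # block once all of its values have expired.
            self.head_skip += 1
            if self.head_skip == self.block_size:
                self.blocks.popleft()
                self.head_skip = 0
        else:
            self.tail.pop(0)

    def values(self):
        """Decompress the whole window, oldest value first."""
        out = []
        for i, blob in enumerate(self.blocks):
            vals = self._unpack(blob)
            out.extend(vals[self.head_skip:] if i == 0 else vals)
        out.extend(self.tail)
        return out

    def aggregate(self, fn=median):
        """Window read: feed every live value to a non-incremental function."""
        return fn(self.values())


if __name__ == "__main__":
    window = CompressedSlidingWindow(capacity=5, block_size=4)
    for v in range(1, 11):
        window.update(v)
    print(window.aggregate())  # median of the last five values, 6..10
```

Every call to `aggregate` reads the entire window, which is why frequent aggregations stress memory bandwidth; storing blocks in compressed form reduces the bytes that must be held and read, at the cost of decompression work.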

stream processing

Compression

sliding windows

aggregation

dataflow

Authors

Prajith Ramakrishnan Geethakumari

Chalmers, Computer Science and Engineering, Computer Engineering

Ioannis Sourdis

Chalmers, Computer Science and Engineering, Computer Engineering

ACM Transactions on Reconfigurable Technology and Systems

1936-7406 (ISSN) 1936-7414 (eISSN)

Vol. 16, No. 3, 37

Subject categories

Computer Engineering

Computer Science

Computer Systems

DOI

10.1145/3590774

More information

Last updated

2023-09-07