Efficient Data Streaming Multiway Aggregation through Concurrent Algorithmic Designs and New Abstract Data Types
Journal article, 2017

Data streaming relies on continuous queries to process unbounded streams of data in a real-time fashion. It is commonly demanding in computation capacity, given that the relevant applications involve very large volumes of data. Data structures act as articulation points and maintain the state of data streaming operators, potentially supporting high parallelism and balancing the work among them. Prompted by this fact, in this work we study and analyze parallelization needs of these articulation points, focusing on the problem of streaming multiway aggregation, where large data volumes are received from multiple input streams. The analysis of the parallelization needs, as well as of the use and limitations of existing aggregate designs and their data structures, leads us to identify needs for appropriate shared objects that can achieve low-latency and high-throughput multiway aggregation. We present the requirements of such objects as abstract data types and we provide efficient lock-free linearizable algorithmic implementations of them, along with new multiway aggregate algorithmic designs that leverage them, supporting both deterministic order-sensitive and order-insensitive aggregate functions. Furthermore, we point out future directions that open through these contributions. The article includes an extensive experimental study, based on a variety of continuous aggregation queries on two large datasets extracted from SoundCloud, a music social network, and from a Smart Grid network. In all the experiments, the proposed data structures and the enhanced aggregate operators improved the processing performance significantly, up to one order of magnitude, in terms of both throughput and latency, over the commonly used techniques based on queues.

lock-free synchronization

data structures

Data streaming

Authors

Vincenzo Massimiliano Gulisano

Chalmers, Computer Science and Engineering, Networks and Systems

Ioannis Nikolakopoulos

Chalmers, Computer Science and Engineering, Networks and Systems

Daniel Cederman

Chalmers, Computer Science and Engineering, Networks and Systems

Marina Papatriantafilou

Chalmers, Computer Science and Engineering, Networks and Systems

Philippas Tsigas

Chalmers, Computer Science and Engineering, Networks and Systems

ACM Transactions on Parallel Computing

2329-4949 (ISSN) 2329-4957 (eISSN)

Vol. 4, Issue 2, UNSP 11

Subject Categories

Other Computer and Information Science

Bioinformatics (Computational Biology)

Media Engineering

DOI

10.1145/3131272

More information

Last updated

2023-03-21