ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
Journal article, 2021

The inherently large and varying volumes of information generated in large-scale systems demand near real-time processing of data streams. In this context, data streaming is imperative for data-intensive processing infrastructures. Stream joins, the streaming counterpart of database joins, compare tuples coming from different streams and constitute one of the most important and expensive data streaming operators. Algorithmic implementations of stream joins have to be capable of efficiently processing bursty and rate-varying data streams in a deterministic and skew-resilient fashion. To leverage the design of modern multicore architectures, scalability and parallelism must also be addressed in the algorithmic design. In this paper we present ScaleJoin, an algorithmic construction for deterministic and parallel stream joins that guarantees all the above properties, thus filling a gap in the existing state of the art. Key to the novelty of ScaleJoin is the ScaleGate data structure and its lock-free implementation. ScaleGate facilitates concurrent data exchange and balances independent actions among processing threads, enabling fine-grained parallelism and deterministic processing. It allows ScaleJoin to run on an arbitrary number of processing threads, evenly sharing the overall comparisons run in parallel and achieving disjoint, skew-resilient processing with high throughput and low latency.
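To make the operator described in the abstract concrete, the following is a minimal single-threaded sketch of a time-window stream join. It is not the ScaleJoin algorithm and does not model ScaleGate or its lock-free implementation; the function name `window_join`, the `(timestamp, value)` tuple layout, and the `window` and `predicate` parameters are assumptions made for illustration. It only shows the two ideas the abstract names: comparing tuples from different streams, and producing the same output regardless of arrival interleaving by processing tuples in a single timestamp order.

```python
from collections import deque
from heapq import merge


def window_join(left, right, window, predicate):
    """Join two timestamp-sorted streams of (ts, value) tuples.

    A pair is emitted when the two timestamps differ by at most `window`
    and `predicate(left_value, right_value)` is true.
    """
    # Tag each tuple with its side (0 = left, 1 = right) and merge both
    # streams into one timestamp-ordered sequence. Processing in this
    # order makes the result independent of how the inputs interleave.
    tagged_left = ((ts, 0, v) for ts, v in left)
    tagged_right = ((ts, 1, v) for ts, v in right)

    windows = (deque(), deque())  # tuples currently stored, per side
    results = []
    for ts, side, value in merge(tagged_left, tagged_right):
        other = windows[1 - side]
        # Purge opposite-side tuples that have fallen out of the window.
        while other and other[0][0] < ts - window:
            other.popleft()
        # Compare the incoming tuple with every stored opposite-side tuple.
        for _, other_value in other:
            l, r = (value, other_value) if side == 0 else (other_value, value)
            if predicate(l, r):
                results.append((ts, l, r))
        # Store the incoming tuple so later opposite-side tuples can match it.
        windows[side].append((ts, value))
    return results


left_stream = [(1, 10), (3, 20), (8, 30)]
right_stream = [(2, 10), (4, 25), (9, 30)]
matches = window_join(left_stream, right_stream, window=2,
                      predicate=lambda a, b: a == b)
```

The inner comparison loop is the expensive part of the operator: each incoming tuple is checked against every stored tuple of the opposite stream. That is the work a parallel design would need to divide among threads.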

Algorithm design and analysis

Big data

Data structures

Parallel processing

Authors

Vincenzo Massimiliano Gulisano

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

Ioannis Nikolakopoulos

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

Marina Papatriantafilou

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

Philippas Tsigas

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

IEEE Transactions on Big Data

2332-7790 (eISSN)

Vol. 7, Issue 2, pp. 299-312, Article no. 7731236

Areas of Advance

Information and Communication Technology

Subject Categories

Computer Science

DOI

10.1109/TBDATA.2016.2624274

More information

Latest update

4/21/2023