QPOPSS: Query and Parallelism Optimized Space-Saving for finding frequent stream elements
Journal article, 2025

The frequent elements problem, a key component in demanding stream-data analytics, involves selecting elements whose occurrence exceeds a user-specified threshold. Fast, memory-efficient ϵ-approximate synopsis algorithms select all frequent elements but may overestimate them depending on ϵ (user-defined parameter). Evolving applications demand performance only achievable by parallelization. However, algorithmic guarantees concerning concurrent updates and queries have been overlooked. We propose Query and Parallelism Optimized Space-Saving (QPOPSS), providing concurrency guarantees. A cornerstone of the design is a new approach for the main data structure for the Space-Saving algorithm, enabling support of very fast queries. QPOPSS combines minimal overlap with concurrent updates, distributing work and using fine-grained thread synchronization to achieve high throughput, accuracy, and low memory use. Our analysis shows space and approximation bounds under various concurrency and data distribution conditions. Our empirical evaluation relative to representative state-of-the-art methods reveals that QPOPSS's multithreaded throughput scales linearly while maintaining the highest accuracy, with orders of magnitude smaller memory footprint.
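For readers unfamiliar with the underlying technique, the following is a minimal sketch of the classic sequential Space-Saving algorithm that the abstract builds on. It is not the QPOPSS data structure or its concurrent design, which this record does not describe. The class name `SpaceSaving`, its methods, and the toy stream are illustrative assumptions; the sketch only shows how an ϵ-approximate synopsis with about 1/ϵ counters reports every frequent element while possibly overestimating counts.

```python
import math


class SpaceSaving:
    """Sequential Space-Saving sketch (illustrative, not QPOPSS itself)."""

    def __init__(self, epsilon):
        # Assumption: use ceil(1/epsilon) counters, the usual sizing that
        # bounds each count's overestimation by epsilon * n.
        self.capacity = math.ceil(1 / epsilon)
        self.counts = {}   # monitored item -> estimated count
        self.errors = {}   # monitored item -> maximum possible overestimation
        self.n = 0         # number of stream items seen so far

    def update(self, item):
        self.n += 1
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count (plus one) and records it as its possible error.
            # This linear scan is the simple, slow way to find the minimum.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def query(self, phi):
        # Report items whose estimated count exceeds phi * n.
        threshold = phi * self.n
        return {x: c for x, c in self.counts.items() if c > threshold}


# Toy stream: 20 distinct noise items, then "a" 50 times and "b" 30 times.
stream = [f"x{i}" for i in range(20)] + ["a"] * 50 + ["b"] * 30
sketch = SpaceSaving(epsilon=0.25)
for element in stream:
    sketch.update(element)
frequent = sketch.query(phi=0.2)
print(frequent)
```

In this sketch the estimated count of a monitored element is never below its true count and exceeds it by at most n divided by the number of counters, which is why elements above the threshold are always reported but counts can be too high.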

Authors

Victor Jarlow

University of Gothenburg

AstaZero AB

Charalampos Stylianopoulos

Networks and Systems

emnify

Marina Papatriantafilou

Networks and Systems

Journal of Parallel and Distributed Computing

0743-7315 (ISSN) 1096-0848 (eISSN)

Vol. 204 105134

VR EPITOME - Summarization and structuring of continuous data in concurrent processing pipelines

Swedish Research Council (VR) (2021-05424), 2022-01-01 -- 2025-12-31.

Relaxed Semantics Across the Data Analytics Stack (RELAX)

European Commission (EU) (EC/H2020/101072456), 2023-03-01 -- 2027-02-28.

Subject categories (SSIF 2025)

Communication Systems

Computer Science

DOI

10.1016/j.jpdc.2025.105134

More information

Last updated

2025-07-01