Poster: Cuckoo Heavy Keeper and the balancing act of maintaining heavy hitters in stream processing
Poster (conference), 2025
This research introduces two main contributions:
Cuckoo Heavy Keeper (CHK): a fast, accurate, and space-efficient heavy-hitter detection algorithm that delivers orders of magnitude better throughput and accuracy than state-of-the-art methods, even under tight memory constraints and on low-skew data. It inverts the conventional data flow by introducing a "lobby", a lightweight filter in which items must first prove their significance before being promoted to the heavy part, the component that tracks heavy-hitter candidates. This inverted design enables optimizations that conventional approaches cannot exploit, such as applying hash-collision resolution only to heavy-hitter candidates and precomputing expensive operations.
Parallel Processing Framework: a flexible framework for parallelizing any heavy-hitter detection algorithm without requiring the underlying algorithm to be mergeable (that is, to support combining independently built summaries). Notably, our parallel algorithms operate as a wrapper around any sequential heavy-hitter algorithm: developers can replace a HeavyHitterAlgorithm class or instance with ParallelWrapper<HeavyHitterAlgorithm> to parallelize it immediately, with minimal changes to existing systems. The framework offers two variants, the insertion-optimized mCHK-I and the query-optimized mCHK-Q; it scales nearly linearly with the number of threads and processes billions of updates per second with query latency below 150 μs, at a modest 2.1 GHz clock rate.
This makes Cuckoo Heavy Keeper and its parallel variants valuable both as standalone algorithmic designs and as integrable building blocks within databases, stream processing engines, and data analytics frameworks.
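The lobby-then-promote flow of CHK can be illustrated with a small C++ sketch. It is not the authors' implementation, which is available in their GitHub repository; the class name LobbySketch, its methods, and every design detail here are assumptions made for illustration. Each bucket holds one heavy slot and one lobby slot; a newcomer that finds the lobby occupied by another key decrements the resident's count (a deterministic stand-in for whatever lobby policy the paper uses); a key is promoted once its lobby count reaches a threshold; and promoted entries are placed with cuckoo-style relocation between two candidate buckets, a choice suggested by the algorithm's name rather than taken from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "lobby + heavy part" table. NOT the authors' CHK code: the
// bucket layout, lobby rule, and relocation policy are assumptions.
class LobbySketch {
 public:
  LobbySketch(std::size_t buckets, std::uint32_t promote_threshold)
      : table_(buckets), mask_(buckets - 1), threshold_(promote_threshold) {
    assert(buckets > 0 && (buckets & (buckets - 1)) == 0);  // power of two
    assert(promote_threshold >= 1);
  }

  void insert(const std::string& key) {
    const auto [b1, fp] = locate(key);
    // Fast path: a promoted key is counted directly in the heavy part.
    if (const std::size_t h = find_heavy(b1, fp); h != table_.size()) {
      ++table_[h].heavy_count;
      return;
    }
    // Otherwise the key must earn promotion in its home bucket's lobby.
    Bucket& home = table_[b1];
    if (home.lobby_count == 0) home.lobby_fp = fp;  // claim a free lobby slot
    if (home.lobby_fp != fp) {  // assumed rule: a newcomer weakens the resident
      --home.lobby_count;
      return;
    }
    if (++home.lobby_count >= threshold_) {  // promote, then free the lobby
      promote(b1, fp, home.lobby_count);
      home.lobby_fp = home.lobby_count = 0;
    }
  }

  std::uint32_t estimate(const std::string& key) const {
    const auto [b1, fp] = locate(key);
    if (const std::size_t h = find_heavy(b1, fp); h != table_.size())
      return table_[h].heavy_count;
    const Bucket& home = table_[b1];
    return (home.lobby_count > 0 && home.lobby_fp == fp) ? home.lobby_count : 0;
  }

  bool is_heavy(const std::string& key) const {
    const auto [b1, fp] = locate(key);
    return find_heavy(b1, fp) != table_.size();
  }

 private:
  struct Bucket {
    std::uint32_t heavy_fp = 0, heavy_count = 0;  // heavy_count == 0: empty
    std::uint32_t lobby_fp = 0, lobby_count = 0;  // lobby_count == 0: empty
  };
  static constexpr int kMaxKicks = 8;  // relocation budget (assumed)
  static std::uint64_t mix(std::uint64_t x) {  // splitmix64 finalizer
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
  }
  // Home bucket plus a nonzero 32-bit fingerprint, both from one hash.
  std::pair<std::size_t, std::uint32_t> locate(const std::string& key) const {
    const std::uint64_t h = mix(std::hash<std::string>{}(key));
    const auto fp = static_cast<std::uint32_t>(h >> 32);
    return {static_cast<std::size_t>(h) & mask_, fp == 0 ? 1u : fp};
  }
  // Partial-key cuckoo trick: the XOR makes alt_index its own inverse.
  std::size_t alt_index(std::size_t b, std::uint32_t fp) const {
    return (b ^ static_cast<std::size_t>(mix(fp))) & mask_;
  }
  // Bucket whose heavy slot holds fp (either candidate), or table_.size().
  std::size_t find_heavy(std::size_t b1, std::uint32_t fp) const {
    for (const std::size_t b : {b1, alt_index(b1, fp)})
      if (table_[b].heavy_count > 0 && table_[b].heavy_fp == fp) return b;
    return table_.size();
  }
  // Insert into the heavy part: use a free candidate slot, else evict
  // residents to their alternate buckets a bounded number of times.
  void promote(std::size_t b, std::uint32_t fp, std::uint32_t count) {
    if (table_[b].heavy_count != 0 && table_[alt_index(b, fp)].heavy_count == 0)
      b = alt_index(b, fp);
    for (int kick = 0; kick <= kMaxKicks; ++kick) {
      Bucket& slot = table_[b];
      if (slot.heavy_count == 0) { slot.heavy_fp = fp; slot.heavy_count = count; return; }
      std::swap(fp, slot.heavy_fp);  // evict the resident and carry it...
      std::swap(count, slot.heavy_count);
      b = alt_index(b, fp);  // ...to its other candidate bucket
    }
    // Simplification: an entry still homeless after the budget is dropped.
  }
  std::vector<Bucket> table_;
  std::size_t mask_;
  std::uint32_t threshold_;
};
```

In an otherwise empty table with a promotion threshold of 4, a key inserted ten times is promoted on its fourth arrival and counted in the heavy part from then on, while a key seen only three times stays in the lobby. Only promoted keys ever trigger relocation, which reflects the idea of applying collision resolution selectively to heavy-hitter candidates.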
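The drop-in pattern described for the parallel framework, where ParallelWrapper<HeavyHitterAlgorithm> takes the place of a sequential class, can be sketched in C++ as follows. This is a hedged illustration, not the paper's framework: only the name ParallelWrapper comes from the abstract, while its body is an assumption. The sketch assumes the inner algorithm exposes insert(key) and estimate(key) on string keys, uses a hypothetical toy exact counter (ExactCounter) as the inner algorithm, and swaps in a plain technique, hash partitioning of keys across mutex-protected shards, which needs no merging because each key is counted by exactly one shard. The actual mCHK-I and mCHK-Q variants may be organized quite differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Toy sequential algorithm (exact counting), used only to show the interface
// this sketch assumes: insert(key) and estimate(key). Hypothetical.
class ExactCounter {
 public:
  void insert(const std::string& key) { ++counts_[key]; }
  std::uint64_t estimate(const std::string& key) const {
    const auto it = counts_.find(key);
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  std::unordered_map<std::string, std::uint64_t> counts_;
};

// Hypothetical wrapper, NOT the paper's mCHK-I/mCHK-Q design. Keys are
// hash-partitioned across independent sequential instances, so each key is
// owned by exactly one instance and a query never merges summaries.
template <typename HeavyHitterAlgorithm>
class ParallelWrapper {
 public:
  // Extra arguments are forwarded to every shard's inner algorithm.
  template <typename... Args>
  explicit ParallelWrapper(std::size_t shards, const Args&... args) {
    assert(shards > 0);
    for (std::size_t i = 0; i < shards; ++i)
      shards_.push_back(std::make_unique<Shard>(args...));
  }

  // One shard per hardware thread is a reasonable default (assumption).
  static std::size_t default_shards() {
    return std::max(1u, std::thread::hardware_concurrency());
  }

  // Safe to call from many threads; only the owning shard is locked.
  void insert(const std::string& key) {
    Shard& s = owner(key);
    std::lock_guard<std::mutex> lock(s.mutex);
    s.algo.insert(key);
  }

  auto estimate(const std::string& key) const {
    Shard& s = owner(key);
    std::lock_guard<std::mutex> lock(s.mutex);
    return s.algo.estimate(key);
  }

 private:
  struct Shard {
    template <typename... Args>
    explicit Shard(const Args&... args) : algo(args...) {}
    std::mutex mutex;
    HeavyHitterAlgorithm algo;
  };

  // The shard that owns a key is fixed by the key's hash.
  Shard& owner(const std::string& key) const {
    return *shards_[std::hash<std::string>{}(key) % shards_.size()];
  }

  std::vector<std::unique_ptr<Shard>> shards_;
};
```

Because the wrapper exposes the same insert and estimate calls as the class it wraps, calling code changes only the type, for example ParallelWrapper<ExactCounter> pw(ParallelWrapper<ExactCounter>::default_shards()). Per-shard locking keeps the sketch short, but it also means every update to a single very hot key serializes on one mutex, a bottleneck a production design would need to avoid.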
Article: https://www.vldb.org/pvldb/vol18/p3149-ngo.pdf
Artifact: https://doi.org/10.5281/zenodo.15593109
GitHub repository: https://github.com/vinhqngo5/Cuckoo_Heavy_Keeper
Data Structures and Algorithms
Parallel Computing
Data Summarization
Authors
Quang Vinh Ngo
Chalmers, Computer Science and Engineering, Computer and Network Systems
Marina Papatriantafilou
Chalmers, Computer Science and Engineering, Computer and Network Systems
London, United Kingdom
Subject categories (SSIF 2025)
Computer Sciences
DOI
10.5281/zenodo.16950399