Poster: Cuckoo Heavy Keeper and the balancing act of maintaining heavy hitters in stream processing
Poster (conference), 2025

This repository contains the poster for the paper "Cuckoo Heavy Keeper and the balancing act of maintaining heavy hitters in stream processing". The work was presented at the 51st International Conference on Very Large Data Bases (VLDB 2025).

This research introduces two main contributions:
Cuckoo Heavy Keeper (CHK): a fast, accurate, and space-efficient heavy-hitter detection algorithm that delivers orders of magnitude better throughput and accuracy than state-of-the-art methods, even under tight memory constraints and on low-skew data. It rethinks the conventional data flow by introducing a "lobby", a lightweight filter in which items must first prove their significance before being promoted to the heavy part. This inverted process unlocks algorithmic synergies that conventional approaches cannot exploit, such as applying hash collision resolution only to heavy-hitter candidates and pre-calculating expensive operations.
Parallel Processing Framework: a flexible framework designed to parallelize any heavy-hitter detection algorithm without requiring mergeability. The parallel algorithms operate as a wrapper around a sequential heavy-hitter algorithm: developers can replace any HeavyHitterAlgorithm class or instance with ParallelWrapper<HeavyHitterAlgorithm> to parallelize it with minimal changes to existing systems. The framework offers two optimization variants (insertion-optimized mCHK-I and query-optimized mCHK-Q), achieves near-linear scaling with thread count, and processes billions of updates per second with very low query latency (<150 μs) at a modest 2.1 GHz clock rate.
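To make the "lobby" idea from the first contribution concrete, here is a deliberately simplified C++ sketch. It is not the CHK algorithm from the paper: it uses no cuckoo hashing and no pre-calculation, and its heavy part is an unbounded map. The class name `LobbyFilter`, the parameter `promote_at`, and the decay-on-collision rule are all assumptions made for illustration. It shows only the data flow: an item is counted in a small lobby slot first and enters the heavy part only once its lobby counter reaches a threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Toy illustration of a "lobby before heavy part" data flow.
// Hypothetical design, NOT the paper's algorithm.
class LobbyFilter {
public:
    LobbyFilter(std::size_t lobby_slots, std::uint32_t promote_at)
        : lobby_(lobby_slots), promote_at_(promote_at) {
        assert(lobby_slots > 0 && promote_at > 0);
    }

    void insert(std::uint64_t key) {
        // Items that were already promoted are counted in the heavy part.
        auto heavy = heavy_.find(key);
        if (heavy != heavy_.end()) {
            ++heavy->second;
            return;
        }

        // Everything else goes through exactly one lobby slot.
        Slot& slot = lobby_[std::hash<std::uint64_t>{}(key) % lobby_.size()];
        if (slot.count == 0) {           // empty slot: claim it
            slot.key = key;
            slot.count = 1;
        } else if (slot.key == key) {    // same item: accumulate evidence
            ++slot.count;
        } else {                         // collision: weaken the occupant (assumed rule)
            --slot.count;
            return;
        }

        // Promote once the item has proven itself; free the lobby slot.
        if (slot.count >= promote_at_) {
            heavy_[key] = slot.count;
            slot = Slot{};
        }
    }

    bool is_heavy(std::uint64_t key) const { return heavy_.count(key) != 0; }

    // Returns the count for promoted items, 0 for items still in the lobby.
    std::uint64_t estimate(std::uint64_t key) const {
        auto it = heavy_.find(key);
        return it == heavy_.end() ? 0 : it->second;
    }

private:
    struct Slot {
        std::uint64_t key = 0;
        std::uint32_t count = 0;
    };
    std::vector<Slot> lobby_;
    std::uint32_t promote_at_;
    std::unordered_map<std::uint64_t, std::uint64_t> heavy_;
};
```

With `LobbyFilter f(8, 3)`, an item inserted three times in a row is promoted on the third insertion, while an item seen only once stays in the lobby and never touches the heavy part. That is the sense in which the lobby acts as a filter for infrequent items.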
This makes Cuckoo Heavy Keeper and its parallel variants valuable both as standalone algorithmic designs and as integrable building blocks within databases, stream processing engines, and data analytics frameworks.
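The drop-in wrapper pattern described for the parallel framework can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not the repository's `ParallelWrapper` and not the mCHK-I or mCHK-Q designs: the names `ShardedWrapper` and `ExactCounter`, the `insert`/`estimate` interface, and the technique itself (hash-sharding keys across private, mutex-protected instances) are all hypothetical. The one property it shares with the description above is that instances are never merged, so the wrapped algorithm does not need to be mergeable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a sequential heavy-hitter algorithm.
// It counts exactly; a real sketch would use bounded memory.
class ExactCounter {
public:
    void insert(std::uint64_t key) { ++counts_[key]; }
    std::uint64_t estimate(std::uint64_t key) const {
        auto it = counts_.find(key);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> counts_;
};

// Hypothetical wrapper: each key is owned by exactly one shard, so a query
// reads a single instance and no merge step is ever required.
template <typename Algo>
class ShardedWrapper {
public:
    explicit ShardedWrapper(std::size_t shards) {
        assert(shards > 0);
        for (std::size_t i = 0; i < shards; ++i)
            shards_.push_back(std::make_unique<Shard>());
    }

    // Safe to call from several threads; only the owning shard is locked.
    void insert(std::uint64_t key) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mu);
        s.algo.insert(key);
    }

    std::uint64_t estimate(std::uint64_t key) const {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mu);
        return s.algo.estimate(key);
    }

private:
    struct Shard {
        std::mutex mu;
        Algo algo;
    };

    Shard& shard_for(std::uint64_t key) const {
        return *shards_[std::hash<std::uint64_t>{}(key) % shards_.size()];
    }

    std::vector<std::unique_ptr<Shard>> shards_;
};
```

Code that previously held an `ExactCounter` would instead hold a `ShardedWrapper<ExactCounter>` and keep calling `insert` and `estimate` unchanged. Per-shard locking is chosen here only for brevity; it is unlikely to reach the throughput reported above, which depends on the paper's own designs.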

Article: https://www.vldb.org/pvldb/vol18/p3149-ngo.pdf
Artifact: https://doi.org/10.5281/zenodo.15593109
GitHub: https://github.com/vinhqngo5/Cuckoo_Heavy_Keeper

Data Structures and Algorithms

Parallel Computing

Data Summarization

Authors

Quang Vinh Ngo

Chalmers, Computer Science and Engineering, Computer and Network Systems

Marina Papatriantafilou

Chalmers, Computer Science and Engineering, Computer and Network Systems

The 51st International Conference on Very Large Data Bases (VLDB 2025)
London, United Kingdom

Subject categories (SSIF 2025)

Computer Sciences

DOI

10.5281/zenodo.16950399

More information

Created

2026-04-13