Efficient, Adaptable, and Scalable Synopses for Data-Intensive Systems
Licentiate thesis, 2026
Targeting these challenges, this thesis studies two core summarization primitives, heavy-hitter detection and frequency estimation, and contributes as follows. Chapter A analyzes the trade-offs among throughput, memory usage, and accuracy in heavy-hitter detection algorithms; the insights led to the design of the Cuckoo Heavy Keeper (CHK) algorithm, which introduces a process for distinguishing frequent from infrequent items that unlocks synergies inaccessible to conventional approaches, such as reduced per-item instruction cost and improved cache behavior. Chapter A also introduces a categorization of parallelization approaches and the multi-CHK (mCHK) framework, which can parallelize any sequential heavy-hitter algorithm, with support for concurrent updates and queries. Chapter B identifies three properties that target the above challenges: resizability (adjusting memory at runtime), enhanced mergeability (combining differently-sized summaries), and partitionability (splitting state for elastic scaling and load rebalancing). Building on these properties, Chapter B proposes ReSketch, a frequency estimation sketch design that achieves all three while maintaining a beneficial memory-to-accuracy ratio, together with the instance provenance DAG, which tracks how approximation bounds evolve through arbitrary sequences of these operations. Together, these results provide complementary building blocks for efficient, adaptable, and scalable summarization in modern data-intensive systems.
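The two primitives named in the abstract, frequency estimation and heavy-hitter detection, together with the basic idea of merging summaries, can be illustrated with a textbook Count-Min sketch. The sketch below is only an illustration under stated assumptions: it is not CHK, mCHK, or ReSketch, and the names `CountMinSketch`, `merge`, and `heavy_hitters` are hypothetical and chosen for this example. Its `merge` requires equal dimensions, which is the restriction that the "enhanced mergeability" property described above is meant to lift.

```python
import hashlib


class CountMinSketch:
    """Textbook Count-Min sketch (illustrative; not the thesis's ReSketch or CHK)."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash function per row, obtained by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item: str, count: int = 1) -> None:
        # Add the count to one counter in every row.
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum never underestimates.
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))

    def merge(self, other: "CountMinSketch") -> "CountMinSketch":
        # Classic merge: element-wise addition, valid only for equal dimensions.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("this simple merge requires equal width and depth")
        merged = CountMinSketch(self.width, self.depth)
        for r in range(self.depth):
            merged.rows[r] = [a + b for a, b in zip(self.rows[r], other.rows[r])]
        return merged


def heavy_hitters(sketch: CountMinSketch, candidates, threshold: int) -> set:
    # Report candidates whose estimated count reaches the threshold.
    # False positives are possible; false negatives are not.
    return {c for c in candidates if sketch.estimate(c) >= threshold}


# Usage: two partial streams summarized separately, then combined.
left, right = CountMinSketch(64, 4), CountMinSketch(64, 4)
for _ in range(50):
    left.update("a")
for _ in range(30):
    right.update("a")
right.update("b", 5)
combined = left.merge(right)
```

Because the estimate is an upper bound on the true count, `heavy_hitters(combined, ["a", "b"], 40)` is guaranteed to contain `"a"`; whether an infrequent item is also reported depends on hash collisions and on the chosen width and depth.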
Scalability
Efficiency
Synopsis
Data-Intensive Systems
Adaptability
Concurrency & Parallelism
Data Summarization
Author
Quang Vinh Ngo
Chalmers, Computer Science and Engineering, Computer and Network Systems
Cuckoo Heavy Keeper and the balancing act of maintaining heavy hitters in stream processing
Proceedings of the VLDB Endowment, Vol. 18 (2025), p. 3149-3161
Paper in proceeding
ReSketch: A Mergeable, Partitionable, and Resizable Sketch
Relaxed Semantics Across the Data Analytics Stack (RELAX-DN)
European Commission (EU) (EC/HE/101072456), 2023-03-01 -- 2027-03-01.
Subject Categories (SSIF 2025)
Computer Sciences
Computer Engineering
Computer Systems
Technical report L - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University
Publisher
Chalmers
Room ED, The EDIT building, Chalmers University of Technology (Campus Johanneberg)
Opponent: Prof. Papapetrou Odysseas, Eindhoven University of Technology, The Netherlands