Hash-Based Designs for FPGA-Accelerated Parallel Stream Processing with Guaranteed Throughput
Licentiate thesis, 2026

Modern stream-processing systems are often expected to sustain line-rate throughput even under skewed or adversarial input distributions.
On FPGAs, this requires processing multiple stream elements per cycle, but many stream-processing workloads contain ordering constraints, shared state, or other structures that make brute-force parallelization either expensive or unable to sustain throughput for all inputs.
This thesis examines how such workloads can be parallelized more efficiently while preserving guaranteed throughput.
It focuses on hash-based FPGA designs for two groups of stream-processing applications.
The first is processing across disjoint logical substreams, such as stream elements grouped by key in analytical stream processing or packet flows in network functions, where each substream has its own state and ordering constraints.
The second is processing a single ordered stream, where each element of the stream strictly follows and depends on the one before.
One example is pattern matching, where high throughput requires unrolling the computation across multiple stream offsets per cycle.
In both cases, the challenge is to increase throughput without replicating resources, i.e. state memories or processing logic, in proportion to the desired parallelism, and without allowing data skew or resource access conflicts to compromise worst-case throughput.
Three works are included in this thesis, each presenting a reconfigurable accelerator.
Across all three accelerators, hashing is used to enable multiple operations per cycle while avoiding or reducing resource replication and preserving worst-case throughput.
First, Multi Hash Table is a multi-banked hash table for sliding-window stream aggregation that uses dynamic address mappings and access merging to reroute accesses that would otherwise often conflict and to combine compatible updates, guaranteeing $N$ parallel read-modify-write operations per cycle without replicating the table contents.
It reaches 1.2 billion tuples per second, a 7.5x improvement over a single-tuple-per-cycle baseline.
Second, HydraHT extends the approach to stateful packet processing with a larger DRAM-backed state table, using iterative handling of state-table misses and improved buffering and batching to preserve throughput despite longer state-access latency.
It implements a timeout-based UDP firewall that supports 32 million flows, achieving peak throughput of 720 million packets per second (Mpps) and worst-case throughput of 415 Mpps.
The third design targets static pattern matching at multiple bytes per cycle and avoids per-offset replication by grouping mutually exclusive pattern-offset pairs into shared hash-based matchers.
It reaches 103.4 Gbps for 4711 static SNORT intrusion-detection patterns using a stride of 64 bytes.
Together, these results show that deterministic high-throughput stream processing on FPGAs can be achieved by carefully controlling how stream workloads are mapped to parallel resources.
Load balancing, batching, buffering, and application-specific grouping make it possible to preserve throughput across input distributions, while avoiding the memory and logic costs of straightforward replication.
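
As a rough illustration of the access merging mentioned for Multi Hash Table, the sketch below combines same-key updates that arrive in one cycle before they reach the banks. It is a software analogy under stated assumptions, not the thesis's hardware design: the function names, the additive update, and the static `key % num_banks` mapping are all hypothetical (the thesis describes dynamic address mappings instead).

```python
def merge_accesses(batch):
    """Combine updates that target the same key into a single update.

    batch: list of (key, delta) pairs assumed to arrive in the same cycle.
    Returns (key, summed_delta) pairs in first-arrival order.
    """
    merged = {}
    for key, delta in batch:
        # Additive aggregation is an assumption; any associative update would do.
        merged[key] = merged.get(key, 0) + delta
    return list(merged.items())


def bank_of(key, num_banks):
    # Hypothetical static mapping, used here only to count bank conflicts.
    return key % num_banks


def max_bank_load(accesses, num_banks):
    """Largest number of accesses that land on one bank in this cycle."""
    load = [0] * num_banks
    for key, _ in accesses:
        load[bank_of(key, num_banks)] += 1
    return max(load)


# A skewed batch: three of four updates hit the same key.
skewed_batch = [(7, 1), (7, 1), (7, 1), (2, 1)]
merged = merge_accesses(skewed_batch)

print(max_bank_load(skewed_batch, 4))  # per-bank load without merging
print(max_bank_load(merged, 4))        # per-bank load after merging
```

Merging only removes conflicts between updates to the same key. Distinct keys that map to the same bank would still collide in this sketch, which is the case that rerouting through a different address mapping would have to handle.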

Stream Processing

Reconfigurable Computing

Worst-Case Throughput

FPGA

EF
Opponent: Associate Professor Artur Podobas, KTH Royal Institute of Technology, Sweden

Author

Magnus Östgren

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

A Parallel Hash Table for Streaming Applications

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT (2024), p. 297-308

Paper in proceeding

Östgren, M., Sourdis, I. HydraHT: Guaranteed High-throughput Stateful Packet Processing

Östgren, M., Unterberger, A.-M., Sourdis, I. 100 Gbps Hash-Based Reconfigurable Pattern Matching

Principer för beräknande minnesenheter (PRIDE)

Swedish Foundation for Strategic Research (SSF) (DnrCHI19-0048), 2021-01-01 -- 2025-12-31.

Subject Categories (SSIF 2025)

Computer Engineering

Algorithms

Telecommunications

Computer Systems

Publisher

Chalmers


More information

Latest update

5/20/2026