Reconfigurable-Hardware Accelerated Stream Aggregation
Doctoral thesis, 2022
This thesis accelerates stream aggregation in two ways: first, by building FPGA-based accelerators, and second, by alleviating the memory pressure posed by single-window stream aggregation. The first part of this thesis presents accelerators for both windowing policies, namely tuple-based and time-based, using Maxeler's DataFlow Engines (DFEs), which receive incoming data directly from the network and have direct access to off-chip DRAM. Compared to a state-of-the-art stream processing software system, the DFEs offer 1-2 orders of magnitude higher processing throughput and 4 orders of magnitude lower latency.
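As a point of reference for the two windowing policies, the following minimal software sketch keeps a tuple-based window of the last `size` values and a time-based window of values whose timestamps fall within the last `span` time units. It is not the DFE implementation, and the names `TupleWindow`, `TimeWindow`, `update` and `aggregate` are hypothetical. Each aggregation re-reads the entire window, which is the access pattern that becomes memory-bound when the aggregation function cannot be computed incrementally.

```python
from collections import deque


class TupleWindow:
    """Tuple-based sliding window (hypothetical sketch): keeps the most recent `size` values."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest value automatically when full.
        self.values = deque(maxlen=size)

    def update(self, value):
        self.values.append(value)

    def aggregate(self, fn):
        # Reads the entire window on every call.
        return fn(list(self.values))


class TimeWindow:
    """Time-based sliding window (hypothetical sketch): keeps values with timestamps in (newest - span, newest]."""

    def __init__(self, span):
        self.span = span
        self.items = deque()  # (timestamp, value) pairs; timestamps assumed non-decreasing

    def update(self, timestamp, value):
        self.items.append((timestamp, value))
        # Evict tuples that fell out of the time interval.
        while self.items and self.items[0][0] <= timestamp - self.span:
            self.items.popleft()

    def aggregate(self, fn):
        return fn([value for _, value in self.items])


if __name__ == "__main__":
    tw = TupleWindow(size=3)
    for v in [1, 2, 3, 4, 5]:
        tw.update(v)
    print(tw.aggregate(sum), tw.aggregate(max))  # 12 5 (window holds 3, 4, 5)

    ts = TimeWindow(span=10)
    for t, v in [(0, 1), (5, 2), (10, 3), (16, 4)]:
        ts.update(t, v)
    print(ts.aggregate(sum))  # 7 (only t=10 and t=16 remain)
```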
The remainder of this thesis focuses on alleviating the memory pressure caused by the various steps of single-window stream aggregation. The two primary steps in stream aggregation are updating the window with new incoming values and reading the window to feed the aggregation functions. The high bandwidth of on-chip SRAM enables line-rate processing, but only for small problem sizes because of its limited capacity. The larger capacity of off-chip DRAM supports larger problems, but its lower bandwidth limits performance. To bridge this gap, this thesis introduces a specialized memory hierarchy for stream aggregation. It employs Multi-Level Queues (MLQs) that span multiple memory levels with different characteristics, offering both high bandwidth and high capacity. As a result, larger stream aggregation problems can be supported at line rate, outperforming existing competing solutions. Compared to designs with only on-chip memory, our approach supports problems 4 orders of magnitude larger. Compared to designs that use only DRAM, our design achieves up to 8x higher throughput.
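The general idea of a queue that spans a small, fast memory and a large, slow one can be illustrated with the following hypothetical Python model. It is not the MLQ design from the thesis: `TwoLevelQueue`, the `block` parameter and the transfer counters are assumptions made for illustration. New values collect in a fast tail buffer and are spilled to the slow level only in whole blocks, while the oldest values are refilled into a fast head buffer one block at a time, so the slow level sees burst transfers instead of per-element accesses.

```python
from collections import deque


class TwoLevelQueue:
    """FIFO split across a small 'fast' level and a large 'slow' level (hypothetical model).

    `head` and `tail` stand in for on-chip SRAM buffers; `slow` stands in for
    off-chip DRAM, which is only accessed in whole blocks of `block` elements.
    """

    def __init__(self, block):
        self.block = block
        self.head = deque()   # oldest elements, ready to be dequeued (fast level)
        self.slow = deque()   # full blocks of `block` elements (slow level)
        self.tail = []        # newest elements, still being filled (fast level)
        self.size = 0
        self.slow_writes = 0  # block transfers fast -> slow
        self.slow_reads = 0   # block transfers slow -> fast

    def push(self, value):
        self.tail.append(value)
        self.size += 1
        if len(self.tail) == self.block:
            # Spill a full block in one burst instead of one element at a time.
            self.slow.append(self.tail)
            self.tail = []
            self.slow_writes += 1

    def pop(self):
        if not self.head:
            if self.slow:
                # Refill the fast head buffer with the oldest block in one burst.
                self.head = deque(self.slow.popleft())
                self.slow_reads += 1
            elif self.tail:
                # Slow level is empty, so the tail already holds the oldest data.
                self.head = deque(self.tail)
                self.tail = []
            else:
                raise IndexError("pop from empty queue")
        self.size -= 1
        return self.head.popleft()

    def __len__(self):
        return self.size


if __name__ == "__main__":
    # Sliding window of 100 values over a stream of 1000 values, 8-element blocks.
    window = TwoLevelQueue(block=8)
    evicted = []
    for v in range(1000):
        window.push(v)           # window update: enqueue the newest value
        if len(window) > 100:
            evicted.append(window.pop())  # evict the oldest value
    print(evicted == list(range(900)))             # True (FIFO order preserved)
    print(window.slow_writes, window.slow_reads)   # 125 113 block transfers
```

In this model the slow level handles 125 block writes and 113 block reads rather than 1000 element writes and 900 element reads; a real memory hierarchy would additionally have to overlap these transfers with processing to sustain line rate.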
Finally, this thesis aims to alleviate the memory pressure caused by the window-aggregation step. Although window updates can be supported efficiently using MLQs, frequent window aggregations remain a performance bottleneck. This thesis addresses this problem by introducing StreamZip, a dataflow stream aggregation engine that compresses the sliding windows. StreamZip resolves a number of data and control dependency challenges in order to integrate a compressor into the stream aggregation pipeline and alleviate the memory pressure posed by frequent aggregations. In doing so, StreamZip offers higher throughput as well as a larger effective window capacity, supporting larger problems. StreamZip supports diverse compression algorithms, offering both lossless and lossy compression for fixed-point as well as floating-point numbers. Compared to designs using MLQs, the lossless and lossy StreamZip designs achieve up to 7.5x and 22x higher throughput, while improving the effective memory capacity by up to 5x and 23x, respectively.
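To illustrate why compressing the window increases both the effective capacity and the number of window values delivered per memory access, here is a minimal sketch of one generic scheme: delta encoding with zigzag mapping and variable-length (varint) byte packing of fixed-point integers, made lossy by optionally discarding low-order bits. This is an assumption for illustration, not one of StreamZip's actual compression algorithms, and it does not cover floating-point data; `compress_block`, `decompress_block` and `drop_bits` are hypothetical names.

```python
def compress_block(values, drop_bits=0):
    """Encode integers (e.g., fixed-point values) as zigzag varint deltas.

    drop_bits > 0 makes the encoding lossy by discarding that many low-order bits.
    """
    out = bytearray()
    prev = 0
    for v in values:
        q = v >> drop_bits                   # quantize (floor); no-op when drop_bits == 0
        d = q - prev                         # small deltas for slowly varying streams
        prev = q
        u = 2 * d if d >= 0 else -2 * d - 1  # zigzag: 0, -1, 1, -2 -> 0, 1, 2, 3
        while u >= 0x80:                     # varint: 7 payload bits per byte
            out.append((u & 0x7F) | 0x80)
            u >>= 7
        out.append(u)
    return bytes(out)


def decompress_block(data, drop_bits=0):
    """Invert compress_block; drop_bits must match the value used to compress."""
    values, prev, u, shift = [], 0, 0, 0
    for byte in data:
        u |= (byte & 0x7F) << shift
        if byte & 0x80:                      # continuation bit set: more bytes follow
            shift += 7
            continue
        d = u >> 1 if u % 2 == 0 else -((u + 1) >> 1)  # undo zigzag
        prev += d
        values.append(prev << drop_bits)
        u, shift = 0, 0
    return values


if __name__ == "__main__":
    # Hypothetical fixed-point window contents.
    window = [1000 + 50 * (i % 4) for i in range(64)]

    lossless = compress_block(window)
    assert decompress_block(lossless) == window

    lossy = compress_block(window, drop_bits=3)
    approx = decompress_block(lossy, drop_bits=3)

    print(4 * len(window), len(lossless), len(lossy))  # 256 80 65 (bytes; raw assumes 32-bit values)
    print(sum(window), sum(approx))                    # 68800 68608
```

The aggregation function then runs on the decompressed values. In the lossy case, each reconstructed value is between 0 and `2**drop_bits - 1` below the original, trading accuracy for a smaller encoded window.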
Stream
Reconfigurable Computing
Aggregation
Dataflow
FPGA
Memory Hierarchy
Compression
Author
Prajith Ramakrishnan Geethakumari
Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)
Single Window Stream Aggregation using Reconfigurable Hardware
2017 International Conference on Field-Programmable Technology (ICFPT), 2017, pp. 112-119
Paper in proceeding
Time-SWAD: A dataflow engine for time-based single window stream aggregation
Proceedings - 2019 International Conference on Field-Programmable Technology (ICFPT 2019), Vol. 2019-December, 2019, pp. 72-80
Paper in proceeding
A Specialized Memory Hierarchy for Stream Aggregation
2021 31st International Conference on Field-Programmable Logic and Applications (FPL 2021), 2021, pp. 204-210
Paper in proceeding
StreamZip: Compressed Sliding-Windows for Stream Aggregation
2021 International Conference on Field-Programmable Technology (ICFPT 2021), 2021, pp. 203-211
Paper in proceeding
While performance improvements in general-purpose computing platforms cannot keep pace with the ever-increasing generation rates of real-life data streams, reconfigurable logic in FPGAs offers several unique strengths for building faster solutions. When attached directly to the network, FPGAs can provide line-rate processing throughput with very low latency, because data no longer needs to traverse the various software layers of the networking and application stack, as it does in general-purpose computers. With direct memory connectivity and custom accelerators mapped onto the reconfigurable fabric of the FPGA, unnecessary data movement across the memory hierarchy can be further minimized. Moreover, computations on FPGAs tend to be more energy-efficient because of the reduced data movement and the relatively low clock frequencies at which FPGAs operate. This thesis aims to utilize these strengths of FPGAs and proposes novel accelerators and memory management techniques for stream aggregation using reconfigurable hardware.
A Novel, Comprehensible, Ultra-Fast, Security-Aware CPS Simulator (COSSIM)
European Commission (EC) (EC/H2020/644042), 2014-01-01 -- 2018-12-31.
ScalaNetS: Skalbara nätverks- och dataströmsberäkningar (Scalable Network and Data Stream Computations)
Swedish Research Council (VR) (Dnr 2016-05231), 2017-01-01 -- 2020-12-31.
Areas of Advance
Information and Communication Technology
Subject Categories (SSIF 2011)
Electrical Engineering, Electronic Engineering, Information Engineering
ISBN
978-91-7905-610-0
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5076 (Doctoral theses at Chalmers University of Technology. New series: 5076)
Publisher
Chalmers