Efficient Processing of Compact and Heterogeneous Deep Neural Networks
Licentiate thesis, 2024
The constant evolution of state-of-the-art DNNs, and their use in domains with constantly changing algorithms and standards, motivates deploying them on flexible hardware, that is, hardware that can be programmed or reconfigured to support such variations. Moreover, the massive parallelism present in these DNNs suggests that such hardware should support parallelism. Field-Programmable Gate Arrays (FPGAs) and General-Purpose Graphics Processing Units (GPGPUs) are two widely used devices that offer flexibility and support parallelism. This thesis presents hardware and software designs, i.e., accelerators and library routines, that enable efficient processing of compact DNNs and their constituent operators on FPGAs and GPGPUs.
The first contribution of the thesis is the Fixed Budget Hybrid CNN Accelerator (FiBHA). FiBHA is a hybrid architecture that combines dedicated and reusable processing engines in a way that strikes a balance between capturing DNN model heterogeneity and being resource-aware. The second contribution is Fused Convolutional Modules (FCMs), a set of GPU kernels that fuse various combinations of two core operators used in compact vision models, including convolutional neural networks (CNNs) and vision transformers (ViTs). These operators are depthwise (DW) and pointwise (PW) convolutions. FCMs alleviate these operators' performance bottlenecks, leading to low-latency and energy-efficient execution.
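To make the two fused operators concrete, the following is a minimal NumPy sketch of depthwise and pointwise convolution as they appear in compact vision models; the function names, shapes, and naive loop nests are illustrative reference semantics, not the thesis's GPU kernels (which fuse the two stages to avoid materializing the intermediate result in global memory).

```python
import numpy as np

def depthwise_conv(x, w):
    """Depthwise convolution: each input channel is convolved with its
    own k x k filter; there is no cross-channel mixing.
    x: (C, H, W), w: (C, k, k), valid padding, stride 1."""
    C, H, W = x.shape
    _, k, _ = w.shape
    out_h, out_w = H - k + 1, W - k + 1
    out = np.zeros((C, out_h, out_w))
    for c in range(C):
        for i in range(out_h):
            for j in range(out_w):
                out[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * w[c])
    return out

def pointwise_conv(x, w):
    """Pointwise (1x1) convolution: mixes channels at each spatial
    position. x: (C, H, W), w: (C_out, C) -> (C_out, H, W)."""
    return np.tensordot(w, x, axes=([1], [0]))

# Composing the two reference kernels; a fused DW+PW kernel computes
# the same result without writing the DW output back to memory.
x = np.random.rand(4, 8, 8)       # 4 channels, 8x8 feature map
dw_w = np.random.rand(4, 3, 3)    # one 3x3 filter per channel
pw_w = np.random.rand(16, 4)      # 1x1 conv expanding 4 -> 16 channels
y = pointwise_conv(depthwise_conv(x, dw_w), pw_w)
```

The DW stage is memory-bound (little arithmetic per byte) and the PW stage dominates the FLOPs, which is why fusing them is attractive on GPUs.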
FiBHA improves throughput by up to 4x and 2.5x compared to prior art, achieves up to 2x better resource utilization, and improves energy efficiency by up to 28%. FCMs achieve up to 3.7x speedup over the layer-by-layer implementations of standard DL libraries, and up to 1.8x end-to-end speedup over a state-of-the-art DL compiler. Moreover, FCM-based implementations consume as little as 34% of the energy per inference compared to those of a DL compiler.
Layer Fusion
GPU
Inter-Layer Pipelining
Deep Learning Accelerators
Deep Neural Networks (DNNs)
FPGA
Author
Fareed Mohammad Qararyah
Chalmers, Computer Science and Engineering, Computer Engineering
An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs
Transactions on Architecture and Code Optimization, Vol. 21 (2024)
Journal article
FiBHA: Fixed Budget Hybrid CNN Accelerator
Proceedings - Symposium on Computer Architecture and High Performance Computing (2022), p. 180-190
Paper in proceedings
FCMs: Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs
Very Efficient Deep Learning in IoT (VEDLIoT)
European Commission (EU) (EC/H2020/957197), 2020-11-01 -- 2023-10-31.
Subject categories
Computer Engineering
Computer Science
Computer Systems
Publisher
Chalmers
HC2
Opponent: Prof. Christos Savvas Bouganis, Imperial College London, UK