Efficient Processing of Compact and Heterogeneous Deep Neural Networks
Licentiate thesis, 2024
The constant evolution of state-of-the-art DNNs and their use in domains with constantly changing algorithms and standards motivate deploying them on flexible hardware, that is, hardware that can be programmed or reconfigured to support such variations. Moreover, the massive parallelism present in these DNNs suggests that such hardware should support parallelism. Field Programmable Gate Arrays (FPGAs) and General-Purpose Graphics Processing Units (GPGPUs) are two widely used devices that offer flexibility and support parallelism. This thesis presents hardware and software designs, i.e., accelerators and library routines, that enable efficient processing of compact DNNs and their constituent operators on FPGAs and GPGPUs.
The first contribution of the thesis is the Fixed Budget Hybrid CNN Accelerator (FiBHA). FiBHA is a hybrid architecture that combines dedicated and reusable processing engines in a way that strikes a balance between capturing DNN model heterogeneity and being resource-aware. The second contribution is Fused Convolutional Modules (FCMs), a set of GPU kernels that fuse various combinations of two core operators used in compact vision models, including convolutional neural networks (CNNs) and vision transformers (ViTs). These operators are depthwise (DW) and pointwise (PW) convolutions. FCMs alleviate these operators' performance bottlenecks, leading to low-latency and energy-efficient execution.
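To make the DW/PW fusion idea concrete, the following is a minimal NumPy sketch (not the thesis's actual GPU kernels): a depthwise convolution applies one small filter per channel, a pointwise (1x1) convolution then mixes channels, and a fused version computes both per output pixel so the full DW intermediate tensor is never materialized. In real FCMs this buffering would happen in fast on-chip GPU memory; the function and variable names here are illustrative assumptions.

```python
import numpy as np

def depthwise_conv(x, w):
    # x: (C, H, W), w: (C, k, k); each channel is convolved with its own filter
    C, H, W = x.shape
    k = w.shape[1]
    out = np.zeros((C, H - k + 1, W - k + 1))
    for c in range(C):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * w[c])
    return out

def pointwise_conv(x, w):
    # x: (C, H, W), w: (C_out, C); a 1x1 convolution that mixes channels
    return np.einsum('oc,chw->ohw', w, x)

def fused_dw_pw(x, w_dw, w_pw):
    # Fused DW+PW: for each output pixel, the DW result lives only in a
    # small per-pixel buffer, so no full intermediate tensor is written out
    C, H, W = x.shape
    k = w_dw.shape[1]
    Ho, Wo = H - k + 1, W - k + 1
    out = np.zeros((w_pw.shape[0], Ho, Wo))
    for i in range(Ho):
        for j in range(Wo):
            dw_pix = np.array([np.sum(x[c, i:i + k, j:j + k] * w_dw[c])
                               for c in range(C)])
            out[:, i, j] = w_pw @ dw_pix
    return out
```

The fused variant produces the same result as running the two operators back to back; the benefit on a GPU comes from avoiding the round trip of the DW output through global memory, which is what makes these memory-bound operators slow when executed layer by layer.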
FiBHA improves throughput by up to 4x and 2.5x compared to the prior art. It achieves up to 2x improvement in resource utilization. Moreover, it improves energy efficiency by up to 28%. FCMs achieve up to 3.7x speedup over the layer-by-layer implementations of standard DL libraries, and up to 1.8x speedup in end-to-end implementations compared to a state-of-the-art DL compiler. Moreover, FCMs-based implementations consume as little as 34% of the energy per inference compared to those of a DL compiler.
Layer Fusion
GPU
Inter-Layer Pipelining
Deep Learning Accelerators
Deep Neural Networks (DNNs)
FPGA
Author
Fareed Mohammad Qararyah
Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)
An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs
Transactions on Architecture and Code Optimization, Vol. 21 (2024)
Journal article
FiBHA: Fixed Budget Hybrid CNN Accelerator
Proceedings - Symposium on Computer Architecture and High Performance Computing (2022), p. 180-190
Paper in proceeding
FCMs: Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs
Very Efficient Deep Learning in IOT (VEDLIoT)
European Commission (EC) (EC/H2020/957197), 2020-11-01 -- 2023-10-31.
Subject Categories
Computer Engineering
Computer Science
Computer Systems
Publisher
Chalmers
HC2
Opponent: Prof. Christos Savvas Bouganis, Imperial College London, UK