DNNOPT: A Framework for Efficiently Selecting On-chip Memory Loop Optimizations of DNN Accelerators
Paper in proceedings, 2024

Deep neural network (DNN) accelerators suffer from poor utilization of on-chip memory, which potentially reduces performance and energy efficiency. Loop reordering and blocking are used to improve on-chip memory utilization in DNN accelerators. However, existing optimization frameworks are inefficient, either because of the prohibitive time complexity of searching the entire search space or because of a sub-optimal choice of optimizations. This paper proposes DNNOPT, a hardware/software framework for optimally selecting loop order and blocking factors, applying loop reordering and blocking in isolation or in combination. DNNOPT uses the proposed Early Exit and Strided Search strategies to prune the search space, together with simple analytical models of data reuse to evaluate each optimization point efficiently and accurately. Overall, DNNOPT reduces the search space by more than two orders of magnitude and improves performance, energy efficiency, and time to solution for convolutional neural network (CNN) and Transformer applications by, on average, 1.8×, 50%, and 226×, respectively, compared to state-of-the-art frameworks.
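The abstract refers to loop reordering and blocking of DNN loop nests. The sketch below is a minimal illustration of what blocking (tiling) a convolution loop nest looks like; the layer shape, tile sizes (TK, TH, TW), and loop order are hypothetical examples chosen for readability, not the optimization points DNNOPT would select.

    # Minimal sketch: loop blocking of a convolution loop nest.
    # Layer dimensions and blocking factors are illustrative assumptions.
    import numpy as np

    # Hypothetical small layer: K output channels, C input channels,
    # H x W output, R x S filter (stride 1, no padding).
    K, C, H, W, R, S = 8, 4, 6, 6, 3, 3
    TK, TH, TW = 4, 3, 3          # example blocking factors for K, H, W

    inp = np.random.rand(C, H + R - 1, W + S - 1)
    wgt = np.random.rand(K, C, R, S)

    def conv_naive():
        # Reference loop nest without blocking.
        out = np.zeros((K, H, W))
        for k in range(K):
            for h in range(H):
                for w in range(W):
                    for c in range(C):
                        for r in range(R):
                            for s in range(S):
                                out[k, h, w] += inp[c, h + r, w + s] * wgt[k, c, r, s]
        return out

    def conv_blocked():
        # Outer loops iterate over tiles; inner loops stay within one tile so
        # the working set (a TK x TH x TW output block plus the inputs and
        # weights it needs) can fit in on-chip memory. Reordering the tile
        # loops changes which operand is reused most across iterations.
        out = np.zeros((K, H, W))
        for k0 in range(0, K, TK):
            for h0 in range(0, H, TH):
                for w0 in range(0, W, TW):
                    for k in range(k0, min(k0 + TK, K)):
                        for h in range(h0, min(h0 + TH, H)):
                            for w in range(w0, min(w0 + TW, W)):
                                for c in range(C):
                                    for r in range(R):
                                        for s in range(S):
                                            out[k, h, w] += inp[c, h + r, w + s] * wgt[k, c, r, s]
        return out

    # The blocked nest computes the same result as the naive one.
    assert np.allclose(conv_naive(), conv_blocked())
    print("blocked convolution matches the naive loop nest")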

On-chip Memory Management

Loop Reordering

Energy Efficient DNN Acceleration

Reuse Distance

Loop Blocking

DNN acceleration

Authors

Piyumal Ranawaka

Chalmers, Computer Science and Engineering, Computer Engineering

Muhammad Waqar Azhar

Chalmers, Computer Science and Engineering, Computer Engineering

Per Stenström

Chalmers, Computer Science and Engineering, Computer and Network Systems

Proceedings of the 21st ACM International Conference on Computing Frontiers, CF 2024

126-137 (pages)
9798400705977 (ISBN)

21st ACM International Conference on Computing Frontiers, CF 2024
Ischia, Italy

Subject categories

Computer Systems

DOI

10.1145/3649153.3649196

More information

Last updated

2024-08-07