DNNOPT: A Framework for Efficiently Selecting On-chip Memory Loop Optimizations of DNN Accelerators
Paper in proceeding, 2024

Deep neural network (DNN) accelerators suffer from poor utilization of on-chip memory, which can reduce performance and energy efficiency. Loop reordering and blocking are used to improve on-chip memory utilization in DNN accelerators. However, existing optimization frameworks are inefficient, either because searching the entire search space has a prohibitive time complexity or because they make a sub-optimal choice of optimizations. This paper proposes DNNOPT, a hardware/software framework for optimally selecting loop orders and blocking factors, with loop reordering and blocking applied in isolation or in combination. DNNOPT uses the proposed Early exit and Strided search strategies to prune the search space, and simple analytical models of data reuse to evaluate each optimization point efficiently and accurately. Overall, DNNOPT reduces the search space by more than two orders of magnitude and, for convolutional neural network (CNN) and Transformer applications, improves performance, energy efficiency, and time to solution by 1.8×, 50%, and 226× on average, respectively, compared to state-of-the-art frameworks.
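To make the idea of selecting blocking factors with an analytical data-reuse model concrete, the following is a minimal sketch for a blocked matrix multiplication C[M,N] += A[M,K] * B[K,N]. It is not DNNOPT's model or search strategy: the traffic formula is a generic textbook estimate that assumes the output tile stays on chip while the reduction loop runs, the search is plain exhaustive enumeration without Early exit or Strided search, and all function names are hypothetical.

```python
from itertools import product


def divisors(n):
    """All positive divisors of n, used as candidate blocking factors."""
    return [d for d in range(1, n + 1) if n % d == 0]


def footprint(tm, tn, tk):
    """On-chip words needed to hold one tile each of A, B and C."""
    return tm * tk + tk * tn + tm * tn


def offchip_traffic(M, N, K, tm, tn):
    """Estimated off-chip words moved (assumed model, output-stationary).

    A is re-read once per column block of C, B once per row block of C,
    and C is written back once.
    """
    a_loads = M * K * (N // tn)
    b_loads = K * N * (M // tm)
    c_stores = M * N
    return a_loads + b_loads + c_stores


def best_blocking(M, N, K, capacity):
    """Exhaustively pick (tm, tn, tk) minimizing estimated traffic.

    Returns (traffic, (tm, tn, tk)), or None if no tiling fits on chip.
    """
    best = None
    for tm, tn, tk in product(divisors(M), divisors(N), divisors(K)):
        if footprint(tm, tn, tk) > capacity:
            continue  # tile set does not fit in the on-chip buffer
        cost = offchip_traffic(M, N, K, tm, tn)
        if best is None or cost < best[0]:
            best = (cost, (tm, tn, tk))
    return best
```

Calling `best_blocking(64, 64, 64, capacity=2048)` enumerates every divisor triple. This exhaustive enumeration is the kind of cost that search-space pruning aims to avoid, since the number of candidates grows quickly once several loops and loop orders are considered together.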

On-chip Memory Management

Loop Re-Order

Energy Efficient DNN Acceleration

Reuse Distance

Loop Blocking

DNN acceleration

Authors

Piyumal Ranawaka

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Muhammad Waqar Azhar

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Per Stenström

Chalmers, Computer Science and Engineering (Chalmers), Computer and Network Systems

Proceedings of the 21st ACM International Conference on Computing Frontiers, CF 2024

126-137
9798400705977 (ISBN)

21st ACM International Conference on Computing Frontiers, CF 2024
Ischia, Italy

Subject Categories (SSIF 2011)

Computer Systems

DOI

10.1145/3649153.3649196

More information

Latest update

8/7/2024