Scratchpad Memory Management for Deep Learning Accelerators
Paper in proceedings, 2024
The success of Artificial Intelligence (AI) applications is driven by efficient hardware accelerators. Recent trends show a rapid increase in application demands, which in most cases surpass the resources available in the accelerators. As such, the efficient management of these limited resources becomes a critical factor in achieving high performance. In this work we focus on the management of the available on-chip memory resources for Deep Learning (DL) accelerators. While most state-of-the-art accelerators have static buffer separation for different data types, we observe that the heterogeneity of recent DL models demands more flexible solutions. We propose using all on-chip scratchpad memory, including the space reserved for double buffering, in a unified way. To exploit that space efficiently, we propose a memory management technique that can apply different policies to best meet the demands of each execution phase. When the available memory is smaller than the requirements, the memory manager can use the available space either to improve data reuse or to prefetch data ahead of time. Comparing our approach against a baseline accelerator shows that flexible management of the scratchpad memory reduces off-chip memory accesses by up to 80%, or latency by up to 56%.
Keywords: Memory Management, Deep Learning Accelerators, Scratchpad
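
The abstract describes choosing, per execution phase, whether spare scratchpad capacity goes to data reuse or to prefetching ahead. The following is a minimal sketch of such a per-phase policy selection, assuming a simple cost model; all names, fields, and heuristics (PhaseDemand, plan_scratchpad, the benefit estimates) are illustrative assumptions, not the paper's actual interface.

from dataclasses import dataclass

@dataclass
class PhaseDemand:
    working_set_bytes: int   # bytes needed to keep the phase's tiles resident
    min_tile_bytes: int      # smallest tile that still allows forward progress
    reuse_benefit: float     # estimated off-chip accesses saved per extra byte
    prefetch_benefit: float  # estimated stall cycles hidden per extra byte

def plan_scratchpad(total_bytes: int, phase: PhaseDemand) -> dict:
    """Decide how one execution phase splits the unified scratchpad."""
    if phase.working_set_bytes <= total_bytes:
        # Working set fits: keep it all on chip and spend leftover capacity
        # on a prefetch (double) buffer for the upcoming data.
        reuse = phase.working_set_bytes
        prefetch = total_bytes - reuse
    else:
        # Memory is insufficient: give the space beyond the minimum tile to
        # whichever policy the cost model predicts helps this phase more.
        spare = total_bytes - phase.min_tile_bytes
        if phase.reuse_benefit >= phase.prefetch_benefit:
            reuse, prefetch = phase.min_tile_bytes + spare, 0
        else:
            reuse, prefetch = phase.min_tile_bytes, spare
    return {"reuse_bytes": reuse, "prefetch_bytes": prefetch}

# Example: a reuse-heavy phase whose working set does not fit in a 1 MiB scratchpad.
print(plan_scratchpad(1 << 20, PhaseDemand(2 << 20, 256 << 10, 3.0, 1.5)))

In this sketch the same physical scratchpad serves both roles, mirroring the unified-buffer idea: the split is recomputed per phase rather than fixed at design time.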