BATCH-DNN: Adaptive and Dynamic Batching for Multi-DNN Accelerators
Paper in proceedings, 2026

Multi-DNN accelerators enable the simultaneous execution of multiple DNN workloads, improving performance by overlapping their computations and memory accesses. However, on-chip memory must accommodate the footprint of all workloads. Batching lets DNN inferences that use the same model share weights, which improves weight reuse and reduces off-chip access costs over a batch. Conventional batching determines the batch size statically, leading to stalls when there is not enough on-chip memory available at runtime. This paper introduces BATCH-DNN, a dynamic method that adapts the batch size, layer by layer, to the available on-chip memory. It employs two techniques: adaptive cascaded sub-batching and adaptive sub-batch merging. Offline profiling establishes each layer's memory footprint, while a run-time adjustment derives the maximum batch size for each layer from the available on-chip memory. BATCH-DNN can improve the utilization of accelerator compute fabrics by 60%, which increases throughput by up to 27%, and by 6% on average, for batched multi-DNN workloads.
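The mechanism described in the abstract can be illustrated with a minimal sketch. The Python below is not the paper's implementation: all names (Layer, act_bytes_per_input, sram_free) are hypothetical, and it only mimics the idea of deriving a per-layer maximum sub-batch size from an offline-profiled footprint and the on-chip memory available at run time.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Layer:
    name: str
    act_bytes_per_input: int  # activation footprint per batched input (hypothetical offline profile)
    weight_bytes: int         # model weights, shared across the whole sub-batch

def max_sub_batch(layer: Layer, sram_free: int) -> int:
    # Largest batch whose activations fit beside the shared weights on chip.
    budget = sram_free - layer.weight_bytes
    return max(1, budget // layer.act_bytes_per_input) if budget > 0 else 1

def run_batch(layers: List[Layer], batch: int,
              sram_free: Callable[[Layer], int]) -> None:
    # Sketch of adaptive cascaded sub-batching: split the batch at layers
    # whose footprint does not fit; sub-batch merging falls out whenever a
    # later layer's cap covers the whole batch again.
    for layer in layers:
        cap = max_sub_batch(layer, sram_free(layer))
        pending = batch
        while pending > 0:
            sub = min(pending, cap)
            print(f"{layer.name}: sub-batch of {sub} (layer cap {cap})")
            pending -= sub

# Example: conv1 forces a batch of 8 to split into 3+3+2; conv2's smaller
# activation footprint lets the sub-batches merge back into one batch of 8.
net = [Layer("conv1", 512 * 1024, 256 * 1024),
       Layer("conv2", 128 * 1024, 512 * 1024)]
run_batch(net, batch=8, sram_free=lambda _: 2 * 1024 * 1024)

In this toy run, the split happens only where the profiled footprint exceeds the free memory, so later, lighter layers recover the full batch and its weight reuse; this mirrors the layer-by-layer adaptation the abstract describes, under the stated assumptions.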

On-chip memory management

Multi-DNN accelerator

Batching

Authors

Piyumal Ranawaka

Chalmers, Computer Science and Engineering, Computer Engineering

University of Gothenburg

Per Stenström

Chalmers, Computer Science and Engineering, Computer Engineering

University of Gothenburg

Lecture Notes in Computer Science

0302-9743 (ISSN) 1611-3349 (eISSN)

Vol. 15901 LNCS, pp. 103-117
978-3-031-99856-0 (ISBN)

31st International Conference on Parallel and Distributed Computing, Euro-Par 2025
Dresden, Germany

Subject categories (SSIF 2025)

Computer Sciences

Computer Systems

DOI

10.1007/978-3-031-99857-7_8

More information

Last updated

2025-09-05