MCExplorer: Exploring the Design Space of Multiple Compute-Engine Deep Learning Accelerators
Journal article, 2025

Model-aware Deep Learning (DL) accelerators surpass generic ones in terms of performance and efficiency. These model-aware accelerators typically comprise multiple dedicated Compute Engines (CEs) to handle the varying computational characteristics of the operations within a DL model. Multiple-CE accelerators usually target Field-Programmable Gate Arrays (FPGAs), as FPGAs' reconfigurability enables tailoring the CE architectures to the varying computational characteristics of the model operations. The continuous evolution of DL models and their use in application domains with diverse optimization objectives, including low latency, high throughput, and energy efficiency, makes it challenging to identify highly optimized multiple-CE accelerator architectures. The design space of multiple-CE accelerators is vast, and the state of the art explores only limited parts of this space, which hinders the identification of accelerators with high performance and efficiency.

To address this challenge, we propose MCExplorer, a framework for exploring the design space of FPGA-based multiple-CE accelerators. MCExplorer comprises a set of single- and multi-objective optimization heuristics that target throughput, latency, energy efficiency, and trade-offs among these metrics. Given a DL model, a hardware resource budget, and one or more objectives, MCExplorer searches for optimized multiple-CE accelerator architectures. MCExplorer explores a space beyond that explored in the literature by not restricting the inter-CE arrangements of the accelerator and by exploring distinct configurations of individual CEs. We evaluate MCExplorer with various DL models and hardware resource budgets. The evaluation shows that by exploring a search space beyond that in the literature, MCExplorer identifies highly optimized multiple-CE accelerators. These accelerators achieve up to 2.8× throughput improvement, 2.1× speedup, and 45% energy reduction compared to the state of the art. Moreover, the evaluation demonstrates that broad space exploration is key to identifying multiple-CE accelerators with the best performance-efficiency trade-offs. The MCExplorer code is available at https://github.com/fqararyah/MCExpl
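The abstract's multi-objective search over accelerator configurations can be illustrated with a generic Pareto-front sketch. This is not MCExplorer's published algorithm: the design encoding (number of CEs, per-CE resource share) and the latency/energy cost models below are simplified placeholders invented for illustration only; the real framework uses model- and FPGA-specific heuristics.

```python
from itertools import product

# Hypothetical stand-ins for MCExplorer's cost models (placeholders only):
# a design is (number of CEs, fraction of the DSP budget they use together).
def estimate(design):
    num_ces, dsp_share = design
    latency = 100.0 / (num_ces * dsp_share)     # placeholder latency model
    energy = 10.0 * num_ces + 50.0 * dsp_share  # placeholder energy model
    return (latency, energy)

def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(designs):
    """Keep the designs whose (latency, energy) vector no other design dominates."""
    costs = {d: estimate(d) for d in designs}
    return [d for d in designs
            if not any(dominates(costs[o], costs[d]) for o in designs if o != d)]

# A toy design space: 1-4 CEs, each configuration using 25-100% of the budget.
space = list(product([1, 2, 3, 4], [0.25, 0.5, 0.75, 1.0]))
front = pareto_front(space)
```

Single-objective search is the degenerate case: with one metric, the Pareto front collapses to the single design minimizing that metric under the resource budget.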

multiple compute-engine accelerators

deep learning accelerators

design space exploration (DSE)

field-programmable gate arrays (FPGAs)

Authors

Fareed Mohammad Qararyah

Chalmers, Computer Science and Engineering, Computer Engineering

Mohammad Ali Maleki

Chalmers, Computer Science and Engineering, Computer Engineering

Pedro Petersen Moura Trancoso

Chalmers, Computer Science and Engineering, Computer Engineering

ACM Transactions on Architecture and Code Optimization

1544-3566 (ISSN) 1544-3973 (eISSN)

Vol. 22, Issue 4

Very Efficient Deep Learning in IoT (VEDLIoT)

European Commission (EU) (EC/H2020/957197), 2020-11-01 -- 2023-10-31.

EPI SGA2

European Commission (EU) (101036168), 2022-01-01 -- 2024-12-31.

Principles for Computing Memory Devices (PRIDE)

Swedish Foundation for Strategic Research (SSF) (Dnr CHI19-0048), 2021-01-01 -- 2025-12-31.

Areas of Advance

Information and Communication Technology

Subject categories (SSIF 2025)

Other Engineering and Technologies

Electrical and Electronic Engineering

DOI

10.1145/3774913

More information

Created

2025-11-12