MCExplorer: Exploring the Design Space of Multiple Compute-Engine Deep Learning Accelerators
Journal article, 2025

Model-aware Deep Learning (DL) accelerators surpass generic ones in performance and efficiency. These model-aware accelerators typically comprise multiple dedicated Compute Engines (CEs) to handle the varying computational characteristics of the operations within a DL model. Multiple-CE accelerators usually target Field-Programmable Gate Arrays (FPGAs), as FPGA reconfigurability enables tailoring each CE's architecture to the operations it handles. The continuous evolution of DL models and their use in application domains with diverse optimization objectives, including low latency, high throughput, and energy efficiency, make it challenging to identify highly optimized multiple-CE accelerator architectures. The design space of multiple-CE accelerators is vast, and the state of the art explores only limited parts of it, which hinders the identification of accelerators with high performance and efficiency.

To address this challenge, we propose MCExplorer, a framework for exploring the design space of FPGA-based multiple-CE accelerators. MCExplorer comprises a set of single- and multi-objective optimization heuristics that target throughput, latency, energy efficiency, and trade-offs among these metrics. Given a DL model, a hardware resource budget, and one or more objectives, MCExplorer searches for optimized multiple-CE accelerator architectures. It explores a space beyond that covered in the literature by not restricting the inter-CE arrangement and by considering distinct configurations of individual CEs. We evaluate MCExplorer with various DL models and hardware resource budgets. The evaluation shows that, by exploring this broader search space, MCExplorer identifies highly optimized multiple-CE accelerators that achieve up to 2.8× higher throughput, a 2.1× speedup, and a 45% energy reduction compared to the state of the art. Moreover, the evaluation demonstrates that broad space exploration is key to identifying multiple-CE accelerators with the best performance-efficiency trade-offs. The MCExplorer code is available at https://github.com/fqararyah/MCExpl
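As a rough illustration of the kind of search MCExplorer automates, the Python sketch below enumerates candidate CE counts and per-CE resource splits under a DSP budget, scores each candidate with a toy analytical cost model, and keeps the Pareto front over latency and energy. All names (Candidate, estimate_metrics), the budget, and the cost model are illustrative assumptions made up for this sketch; they are not MCExplorer's actual heuristics or API.

from dataclasses import dataclass
from itertools import product
from typing import Iterable, Tuple

TOTAL_DSP = 2048  # hypothetical FPGA resource budget (DSP slices)

@dataclass(frozen=True)
class Candidate:
    num_ces: int                # number of compute engines
    dsp_split: Tuple[int, ...]  # DSP slices allocated to each CE

def enumerate_candidates(max_ces: int, step: int = 256) -> Iterable[Candidate]:
    """Enumerate CE counts and per-CE resource splits within the budget."""
    for n in range(1, max_ces + 1):
        for split in product(range(step, TOTAL_DSP + 1, step), repeat=n):
            if sum(split) <= TOTAL_DSP:
                yield Candidate(n, split)

def estimate_metrics(c: Candidate) -> Tuple[float, float]:
    """Toy cost model: specialized CEs use resources more effectively
    (lower latency), but each extra CE adds an energy overhead."""
    speedup = 1.0 + 0.25 * (c.num_ces - 1)       # assumed specialization gain
    latency = 1.0 / (sum(c.dsp_split) * speedup)
    energy = 0.5 * c.num_ces + 1e-4 * sum(c.dsp_split)
    return latency, energy

def pareto_front(candidates):
    """Keep candidates that no other candidate dominates in both metrics."""
    scored = [(c, estimate_metrics(c)) for c in candidates]
    return [(c, m) for c, m in scored
            if not any(om != m and all(o <= v for o, v in zip(om, m))
                       for _, om in scored)]

if __name__ == "__main__":
    for cand, (lat, en) in pareto_front(enumerate_candidates(max_ces=3)):
        print(f"{cand.num_ces} CE(s), split={cand.dsp_split}: "
              f"latency={lat:.6f}, energy={en:.2f}")

Exhaustive enumeration like this scales poorly as the number of CEs and the resource-allocation granularity grow, which is why MCExplorer relies on optimization heuristics rather than brute-force search.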

Field-programmable gate arrays (FPGAs)

Multiple compute-engine accelerators

Design space exploration (DSE)

Deep learning accelerators

Authors

Fareed Mohammad Qararyah

University of Gothenburg

Chalmers University of Technology, Computer Science and Engineering, Computer Engineering

Mohammad Ali Maleki

University of Gothenburg

Chalmers University of Technology, Computer Science and Engineering, Computer Engineering

Pedro Petersen Moura Trancoso

Chalmers University of Technology, Computer Science and Engineering, Computer Engineering

University of Gothenburg

ACM Transactions on Architecture and Code Optimization

1544-3566 (ISSN) 1544-3973 (eISSN)

Vol. 22, No. 4

Principles for Computational Memory Devices (PRIDE)

Swedish Foundation for Strategic Research (SSF) (Dnr CHI19-0048), 2021-01-01 -- 2025-12-31.

EPI SGA2

European Commission (EC) (101036168), 2022-01-01 -- 2024-12-31.

Very Efficient Deep Learning in IoT (VEDLIoT)

European Commission (EC) (EC/H2020/957197), 2020-11-01 -- 2023-10-31.

Areas of Advance

Information and Communication Technology

Subject Categories (SSIF 2025)

Other Engineering and Technologies

Electrical Engineering, Electronic Engineering, Information Engineering

DOI

10.1145/3774913

More information

Latest update

1/7/2026