VSA: A Hybrid Vector-Systolic Architecture
Paper in proceedings, 2022

To deliver high performance efficiently, modern processors include dedicated hardware to accelerate specific application domains. For example, several recent processors include dedicated Machine Learning (ML) accelerators. However, while dedicated hardware improves efficiency compared to general-purpose CPUs, it also requires more area, which makes it infeasible for smaller devices. In such settings, it is therefore desirable to reuse existing hardware for different functionalities. In this work, we explore reusing the components of a Vector Processing Unit (VPU) to offer the functionality of a Systolic Array (SA) for General Matrix Multiplication (GEMM), a kernel extensively used in machine learning, big data, and scientific computing. This hybrid Vector-Systolic Architecture (VSA) can thus support Single Instruction Multiple Data (SIMD) instruction extensions with the VPU functionality and efficiently compute GEMM with the SA functionality. We present an implementation of VSA as a RISC-V co-processor that adds a minimal hardware overhead of less than 0.1% compared to a baseline RISC-V implementation with a VPU. In our evaluation using different Deep Neural Network (DNN) models, VSA shows a speedup of up to 3.5x and a reduction in energy consumption of up to 70%.
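For readers unfamiliar with the SA dataflow named in the abstract, the following is a minimal software sketch of how a systolic array can compute GEMM. It is an illustration only: the output-stationary dataflow, the function name `systolic_gemm`, and the cycle-level model are assumptions chosen for exposition and are not taken from the VSA design described in the paper.

```python
def systolic_gemm(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.

    Hypothetical illustration: processing element (PE) (i, j) keeps the
    partial sum C[i][j]; A operands flow left-to-right and B operands
    flow top-to-bottom, one hop per cycle.
    """
    n, k = len(A), len(A[0])
    m = len(B[0])
    C = [[0] * m for _ in range(n)]
    # Operand registers: the value each PE holds in the current cycle.
    a_reg = [[0] * m for _ in range(n)]
    b_reg = [[0] * m for _ in range(n)]

    # The last useful operand pair reaches PE (n-1, m-1) at cycle n+m+k-3.
    for t in range(n + m + k - 2):
        # Shift A operands one PE to the right (descending j keeps old values).
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        # Shift B operands one PE downwards.
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Inject skewed inputs at the array boundary: row i of A is
        # delayed by i cycles, column j of B is delayed by j cycles.
        for i in range(n):
            s = t - i
            a_reg[i][0] = A[i][s] if 0 <= s < k else 0
        for j in range(m):
            s = t - j
            b_reg[0][j] = B[s][j] if 0 <= s < k else 0
        # Every PE performs one multiply-accumulate per cycle.
        for i in range(n):
            for j in range(m):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C


if __name__ == "__main__":
    print(systolic_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In this model the input skew ensures that PE (i, j) sees the operands A[i][s] and B[s][j] in the same cycle, so each PE accumulates exactly one dot product without any global data movement.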

DNN

Vector Unit

Systolic Array

Machine Learning

GEMM

SIMD

Authors

Mateo Vázquez Maceiras

Chalmers, Computer Science and Engineering, Computer Engineering

Muhammad Waqar Azhar

Chalmers, Computer Science and Engineering, Computer Engineering

Pedro Petersen Moura Trancoso

Chalmers, Computer Science and Engineering, Computer Engineering

Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors

10636404 (ISSN)

368-376
9781665461863 (ISBN)

2022 IEEE 40th International Conference on Computer Design (ICCD)
Lake Tahoe, USA

Very Efficient Deep Learning in IoT (VEDLIoT)

European Commission (EU) (EC/H2020/957197), 2020-11-01 -- 2023-10-31.

European, extendable, energy-efficient, energetic, embedded, extensible, Processor Ecosystem (eProcessor)

European Commission (EU) (EC/H2020/956702), 2021-01-01 -- 2024-06-30.

Subject categories

Computer Engineering

Computer Science

Computer Systems

Driving forces

Sustainable development

DOI

10.1109/ICCD56317.2022.00061

More information

Last updated

2023-03-21