VSA: A Hybrid Vector-Systolic Architecture
Paper in proceeding, 2022

To deliver high performance efficiently, modern processors include dedicated hardware that accelerates specific application domains. For example, several recent processors include dedicated Machine Learning (ML) accelerators. However, while dedicated hardware improves efficiency compared to general-purpose CPUs, it also requires additional area, making it infeasible for smaller devices. In those setups, it becomes desirable to explore ways of reusing existing hardware for different functionalities. In this work, we explore reusing the components of a Vector Processing Unit (VPU) to offer the functionality of a Systolic Array (SA) for General Matrix Multiplication (GEMM), a kernel used extensively in machine learning, big data, and scientific computing. This hybrid Vector-Systolic Architecture (VSA) can thus support Single Instruction Multiple Data (SIMD) instruction extensions through its VPU functionality and compute GEMM efficiently through its SA functionality. We present an implementation of VSA as a RISC-V co-processor that adds a hardware overhead of less than 0.1% compared to a baseline RISC-V implementation with a VPU. In our evaluation using different Deep Neural Network (DNN) models, VSA shows a speedup of up to 3.5x and a reduction in energy consumption of up to 70%.
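For readers unfamiliar with how a systolic array computes GEMM, the following is a minimal software sketch of a generic output-stationary systolic array. It is not the VSA datapath: the abstract does not state which dataflow VSA uses, so the output-stationary choice, the function name `systolic_gemm`, and the register names are all assumptions made for illustration. Each processing element (PE) keeps one element of C, receives a value of A from its left neighbour and a value of B from the neighbour above, multiplies and accumulates them, and passes both values on in the next cycle.

```python
def systolic_gemm(A, B):
    """Simulate a generic output-stationary systolic array computing C = A x B.

    Illustrative sketch only (hypothetical helper, not the VSA design).
    A is n x k, B is k x m, both given as lists of lists.
    Returns the product C and the number of simulated cycles.
    """
    n, k = len(A), len(A[0])
    m = len(B[0])
    assert len(B) == k, "inner dimensions must match"

    C = [[0] * m for _ in range(n)]      # accumulator held in each PE
    a_reg = [[0] * m for _ in range(n)]  # A values travelling left to right
    b_reg = [[0] * m for _ in range(n)]  # B values travelling top to bottom

    # The last partial product reaches PE (n-1, m-1) at cycle n + m + k - 3.
    cycles = n + m + k - 2
    for t in range(cycles):
        # Visit PEs from bottom-right to top-left so that each PE reads its
        # neighbours' values from the previous cycle before they are updated.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    s = t - i  # row i of A is delayed by i cycles (input skew)
                    a_in = A[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    s = t - j  # column j of B is delayed by j cycles
                    b_in = B[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
                C[i][j] += a_in * b_in   # multiply-accumulate in place
    return C, cycles
```

In this sketch, the skewed inputs ensure that PE (i, j) sees A[i][s] and B[s][j] in the same cycle, so every operand is fetched once at the array edge and then reused as it moves through the array.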

DNN

Vector Unit

Systolic Array

Machine Learning

GEMM

SIMD

Author

Mateo Vázquez Maceiras

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Muhammad Waqar Azhar

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Pedro Petersen Moura Trancoso

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors

1063-6404 (ISSN)

Vol. 2022-October, pp. 368-376
9781665461863 (ISBN)

2022 IEEE 40th International Conference on Computer Design (ICCD)
Lake Tahoe, USA

Very Efficient Deep Learning in IOT (VEDLIoT)

European Commission (EC) (EC/H2020/957197), 2020-11-01 -- 2023-10-31.

European, extendable, energy-efficient, energetic, embedded, extensible, Processor Ecosystem (eProcessor)

European Commission (EC) (EC/H2020/956702), 2021-01-01 -- 2024-06-30.

Subject Categories

Computer Engineering

Computer Science

Computer Systems

Driving Forces

Sustainable development

DOI

10.1109/ICCD56317.2022.00061

More information

Latest update

7/17/2024