Extending Vector Processing Units for Enhanced Linear Algebra Performance
Licentiate thesis, 2024
Linear algebra kernels are among the most widely used kernels today. They have been used in many domains for decades, and they are now also at the core of Machine Learning (ML) applications, one of the domains whose performance and energy requirements are growing fastest. Consequently, there is strong interest in computing these kernels faster and more efficiently. Vector Processing Units (VPUs) map well to these kernels, but they do not offer the same performance and efficiency as custom accelerators.
This thesis presents two extensions that enhance linear algebra kernels on VPUs. The first extension adds Systolic Array (SA) functionality to VPUs for more efficient computation of General Matrix-Matrix Multiplication (GEMM). This is done by remapping the functional units of the VPU from a 1D to a 2D array. The thesis also analyzes the implications of this new SA-like functionality, proposing new memory instructions to support it and an analysis that dynamically selects the functionality that maximizes resource utilization. The second extension is a memory extension that provides VPUs with index-matching functionality for sparse linear algebra operations. It transforms the index-matching problem into a hash-lookup problem and implements the lookup in hardware using cache-like techniques. The two extensions achieve speedups of up to 4.22x and 3.19x, respectively.
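The abstract does not describe the hardware design in detail, so the following Python sketch only illustrates the general idea of recasting index matching as hash lookup in a cache-like table, using a sparse-sparse dot product as the example operation. The class `SetAssocIndexTable`, the modulo set indexing, the parameters `num_sets` and `ways`, and the behavior on a full set are all hypothetical assumptions made for illustration, not the design proposed in the thesis.

```python
from typing import List, Optional, Tuple

# Hypothetical compressed sparse vector: (indices, values), unique indices.
SparseVec = Tuple[List[int], List[float]]


class SetAssocIndexTable:
    """Hypothetical cache-like index table (illustration only).

    Each index is mapped to a set by `index % num_sets`; the full index is
    stored as a tag and compared against every way of that set on lookup.
    """

    def __init__(self, num_sets: int = 8, ways: int = 4) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.sets: List[List[Tuple[int, float]]] = [[] for _ in range(num_sets)]

    def insert(self, index: int, value: float) -> None:
        entries = self.sets[index % self.num_sets]
        if len(entries) >= self.ways:
            # Assumption: a full set is simply reported as an error; a real
            # design needs a conflict/overflow policy, not modeled here.
            raise OverflowError(f"set {index % self.num_sets} is full")
        entries.append((index, value))

    def lookup(self, index: int) -> Optional[float]:
        # Tag comparison across the ways of one set, as in a cache lookup.
        for tag, value in self.sets[index % self.num_sets]:
            if tag == index:
                return value
        return None


def sparse_dot(a: SparseVec, b: SparseVec, num_sets: int = 8, ways: int = 4) -> float:
    """Sparse-sparse dot product with index matching done as hash lookups."""
    table = SetAssocIndexTable(num_sets, ways)
    for idx, val in zip(*b):  # build phase: insert one operand
        table.insert(idx, val)
    total = 0.0
    for idx, val in zip(*a):  # probe phase: look up each index of the other
        match = table.lookup(idx)
        if match is not None:  # a hit means the two indices match
            total += val * match
    return total


a = ([0, 3, 7], [1.0, 2.0, 3.0])
b = ([3, 5, 7], [4.0, 5.0, 6.0])
print(sparse_dot(a, b))  # matches at indices 3 and 7: 2*4 + 3*6 = 26.0
```

Compared with a merge-based intersection, which requires both index lists to be sorted and advances two pointers in lockstep, each probe here is an independent lookup; this independence is one plausible reason why a hash-based formulation suits vector hardware, though the sketch makes no claim about how the thesis exploits it.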
Keywords: Vector, Sparse, Dense, Linear Algebra, SIMD, ISA extension
Author
Mateo Vázquez Maceiras
Chalmers, Computer Science and Engineering, Computer Engineering
VSA: A Hybrid Vector-Systolic Architecture
Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors, Vol. 2022-October (2022), pp. 368-376
Paper in proceedings
Exploiting the Potential of Flexible Processing Units
Proceedings - Symposium on Computer Architecture and High Performance Computing (2023), pp. 34-45
Paper in proceedings
Mateo Vázquez, Muhammad Waqar Azhar, Mohammad Ali Maleki, Pedro Petersen Moura Trancoso. Scalable Hardware Hash for Index-Matching in Vector Architectures.
Very Efficient Deep Learning in IOT (VEDLIoT)
European Commission (EU) (EC/H2020/957197), 2020-11-01 -- 2023-10-31.
European, extendable, energy-efficient, energetic, embedded, extensible, Processor Ecosystem (eProcessor)
European Commission (EU) (EC/H2020/956702), 2021-01-01 -- 2024-06-30.
Areas of Advance
Information and Communication Technology
Subject Categories
Computer Science
Publisher
Chalmers
Room EE, EDIT Building, Chalmers
Opponent: Antonio González, Universitat Politècnica de Catalunya, Spain