Approximation and Compression Techniques to Enhance Performance of Graphics Processing Units

Alexandra Angerd

Doctoral thesis, 2020

A key challenge in modern computing systems is to access data fast enough to fully utilize the computing elements on the chip. In Graphics Processing Units (GPUs), performance is often constrained by register file size, memory bandwidth, and the capacity of the main memory. One important technique towards alleviating this challenge is data compression. By reducing the amount of data that needs to be communicated or stored, memory resources crucial for performance can be efficiently utilized.

This thesis provides a set of approximation and compression techniques for GPUs, with the goal of efficiently utilizing the computational fabric and thereby increasing performance. The thesis shows that these techniques can substantially lower the amount of information the system has to process, and are thus important tools in the process of meeting challenges in memory utilization.

This thesis makes contributions within three areas: controlled floating-point precision reduction, lossless and lossy memory compression, and distributed training of neural networks. In the first area, the thesis shows that through automated and controlled floating-point approximation, the register file can be more efficiently utilized. This is achieved through a framework which establishes a cross-layer connection between the application and the microarchitecture layer, and a novel register file organization capable of leveraging low-precision floating-point values and narrow integers for increased capacity and performance.

Within the area of compression, this thesis aims to increase the effective bandwidth of GPUs by presenting a lossless and lossy memory compression algorithm that reduces the amount of transferred data. In contrast to state-of-the-art compression techniques such as Base-Delta-Immediate and Bitplane Compression, which use intra-block bases for compression, the proposed algorithm leverages multiple global base values to reach a higher compression ratio. The algorithm includes an optional approximation step for floating-point values, which offers a higher compression ratio at a given, low error rate.
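The idea of encoding values as small deltas from bases shared across all blocks can be illustrated with a minimal sketch. This is not the algorithm proposed in the thesis. The base selection rule, the function names, the one-bit tag per value, and the 32-bit word size are all assumptions made for illustration only.

```python
from typing import List, Tuple

Encoded = Tuple[str, int, int]  # ("delta", base_index, delta) or ("raw", value, 0)


def pick_global_bases(values: List[int], num_bases: int) -> List[int]:
    """Hypothetical base selection: evenly spaced picks from the sorted values."""
    ordered = sorted(values)
    step = len(ordered) / num_bases
    return [ordered[int(i * step + step / 2)] for i in range(num_bases)]


def encode(values: List[int], bases: List[int], delta_bits: int) -> List[Encoded]:
    """Store each value as (nearest base index, signed delta) when the delta fits."""
    lo = -(1 << (delta_bits - 1))
    hi = (1 << (delta_bits - 1)) - 1
    out: List[Encoded] = []
    for v in values:
        idx = min(range(len(bases)), key=lambda i: abs(v - bases[i]))
        delta = v - bases[idx]
        if lo <= delta <= hi:
            out.append(("delta", idx, delta))
        else:
            out.append(("raw", v, 0))  # fall back to the uncompressed word
    return out


def decode(encoded: List[Encoded], bases: List[int]) -> List[int]:
    """Lossless reconstruction: base + delta, or the raw word."""
    return [bases[a] + b if kind == "delta" else a for kind, a, b in encoded]


def compressed_bits(encoded: List[Encoded], num_bases: int,
                    delta_bits: int, word_bits: int = 32) -> int:
    """Size estimate: 1 tag bit per value plus either index+delta or a raw word."""
    index_bits = max(1, (num_bases - 1).bit_length())
    total = 0
    for kind, _, _ in encoded:
        total += 1 + (index_bits + delta_bits if kind == "delta" else word_bits)
    return total
```

In this sketch, a data set whose values cluster around a few widely separated magnitudes can be covered by a handful of shared bases, whereas a single base per block would leave values from the other clusters uncompressed.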

Finally, within the area of distributed training of neural networks, this thesis proposes a subgraph approximation scheme for graph data which mitigates accuracy loss in a distributed setting. The scheme allows neural network models that use graphs as inputs to converge at single-machine accuracy, while minimizing synchronization overhead between the machines.

Compression

Approximate Computing

Machine Learning

Floating-Point Precision

Microarchitecture

GPU

Zoom (password request: erik.sintorn@chalmers.se)

Opponent: Prof. Natalie Enright Jerger, University of Toronto, ON, Canada

Online defence

Author

Alexandra Angerd

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)


A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs

Transactions on Architecture and Code Optimization, Vol. 14 (2017)

Journal article

A GPU Register File using Static Data Compression

ACM International Conference Proceeding Series (2020)

Paper in proceeding

GBDI: Going Beyond Base-Delta-Immediate Compression with Global Bases

Proceedings - International Symposium on High-Performance Computer Architecture, Vol. 2022-April (2022), p. 1115-1127

Paper in proceeding

A. Angerd, K. Balasubramanian, M. Annavaram. Distributed Training of Graph Convolutional Networks using Subgraph Approximation

A precondition for computers to execute as fast as possible is that they are supplied with enough data. The data are stored in memory structures at different levels of the computer's memory hierarchy. However, accessing these memory structures is often slower than the computations themselves. This results in a bottleneck, where calculations are stalled because not enough data are available.

One important technique towards alleviating this challenge is data compression, in which the information is encoded in a more compact format. Compression can be either lossless or lossy. When using a lossless compression technique, it is possible to reconstruct the original data without any loss of information. In contrast, lossy techniques encode the data by leaving out less important information. This is achieved through approximation of the original data.

The thesis proposes a set of approximation and compression techniques for Graphics Processing Units (GPUs), which help them to access data faster. The thesis shows that these techniques can increase the performance of GPUs.

One new insight from this thesis is that controlled approximation can increase performance while still delivering high-quality results. Controlled approximation means that the quality of the output is guaranteed to stay above a certain pre-defined quality threshold. This indicates that approximations can be used to increase performance in a wide range of applications.
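Controlled approximation can be illustrated with a minimal sketch that searches for the fewest float32 mantissa bits that keep the result within a chosen error bound. This is not the framework described in the thesis. The function names are hypothetical, and the maximum relative error is an assumed stand-in for an application-specific quality metric.

```python
import struct
from typing import List


def truncate_mantissa(x: float, kept_bits: int) -> float:
    """Convert x to float32 and zero the lowest (23 - kept_bits) mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - kept_bits
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


def max_relative_error(reference: List[float], approx: List[float]) -> float:
    """Assumed quality metric: worst relative deviation over non-zero references."""
    return max(
        (abs(a - r) / abs(r) for r, a in zip(reference, approx) if r != 0.0),
        default=0.0,
    )


def fewest_bits_within(values: List[float], max_error: float) -> int:
    """Return the smallest mantissa width whose error stays within max_error."""
    reference = [truncate_mantissa(v, 23) for v in values]  # full float32 baseline
    for kept in range(0, 23):
        approx = [truncate_mantissa(v, kept) for v in values]
        if max_relative_error(reference, approx) <= max_error:
            return kept
    return 23  # full precision always matches the baseline exactly
```

The threshold is fixed before the search, so the precision is lowered only as far as the bound allows. A looser bound permits fewer bits per value, which leaves more room for values in the same storage.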

ACE: Approximate Algorithms and Computing Systems

Swedish Research Council (VR) (2014-6221), 2015-01-01 -- 2018-12-31.


Subject Categories (SSIF 2011)

Computer Engineering

Computer Systems

Areas of Advance

Information and Communication Technology

ISBN

978-91-7905-425-0

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 4892

Technical report D - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University: 192D

Publisher

Chalmers