Approximation and Compression Techniques to Enhance Performance of Graphics Processing Units
This thesis presents a set of approximation and compression techniques for GPUs that aim to utilize the computational fabric efficiently and thereby increase performance. The thesis shows that these techniques can substantially reduce the amount of information the system has to process, making them important tools for meeting challenges in memory utilization.
This thesis makes contributions within three areas: controlled floating-point precision reduction, lossless and lossy memory compression, and distributed training of neural networks. In the first area, the thesis shows that through automated and controlled floating-point approximation, the register file can be more efficiently utilized. This is achieved through a framework which establishes a cross-layer connection between the application and the microarchitecture layer, and a novel register file organization capable of leveraging low-precision floating-point values and narrow integers for increased capacity and performance.
Within the area of compression, this thesis aims to increase the effective bandwidth of GPUs by presenting a lossless and lossy memory compression algorithm that reduces the amount of transferred data. In contrast to state-of-the-art compression techniques such as Base-Delta-Immediate and Bitplane Compression, which use intra-block bases for compression, the proposed algorithm leverages multiple global base values to reach a higher compression ratio. The algorithm includes an optional approximation step for floating-point values, which offers a higher compression ratio at a given low error rate.
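The idea of encoding values as small deltas from a few shared global bases can be illustrated with a minimal Python sketch. This is not the algorithm proposed in the thesis: the quantile-based base selection, the one-bit flag, the 8-bit delta width, and all function names are assumptions made for illustration only, and the input is assumed to be a non-empty list of integers.

```python
def choose_global_bases(values, num_bases):
    """Pick evenly spaced samples of the sorted values as bases (a stand-in heuristic)."""
    ordered = sorted(values)
    step = max(1, len(ordered) // num_bases)
    return sorted(set(ordered[step // 2::step][:num_bases]))


def encode(values, bases, delta_bits):
    """Store each value as (base index, delta) if the delta fits, otherwise store it raw."""
    lo = -(1 << (delta_bits - 1))
    hi = (1 << (delta_bits - 1)) - 1
    encoded = []
    for v in values:
        # Nearest global base for this value.
        idx = min(range(len(bases)), key=lambda i: abs(v - bases[i]))
        delta = v - bases[idx]
        if lo <= delta <= hi:
            encoded.append(("delta", idx, delta))
        else:
            encoded.append(("raw", v))
    return encoded


def decode(encoded, bases):
    """Invert encode(): add the delta back to its base, or return the raw value."""
    return [bases[e[1]] + e[2] if e[0] == "delta" else e[1] for e in encoded]


def compressed_bits(encoded, bases, delta_bits, word_bits=32):
    """Payload size in bits: one flag bit per value plus either index+delta or a raw word."""
    index_bits = max(1, (len(bases) - 1).bit_length())
    total = 0
    for e in encoded:
        total += 1
        total += index_bits + delta_bits if e[0] == "delta" else word_bits
    return total
```

With the values `[1000, 1003, 998, 50000, 50007, 49995, 1001, 50002]`, two bases, and 8-bit deltas, every value fits a delta and the payload shrinks from 256 bits to 80 bits. The sketch ignores the cost of storing the base table itself, which a real design would have to amortize over many blocks.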
Finally, within the area of distributed training of neural networks, this thesis proposes a subgraph approximation scheme for graph data which mitigates accuracy loss in a distributed setting. The scheme allows neural network models that use graphs as inputs to converge at single-machine accuracy, while minimizing synchronization overhead between the machines.
Chalmers, Computer Science and Engineering, Computer Engineering, Computer Systems
A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs
Transactions on Architecture and Code Optimization, Vol. 14 (2017)
Journal article
A GPU Register File using Static Data Compression
ACM International Conference Proceeding Series (2020)
Paper in proceedings
A. Angerd, E. Sintorn, P. Stenström. GBDI: Going Beyond Base-Delta-Immediate Compression using Global Bases
A. Angerd, K. Balasubramanian, M. Annavaram. Distributed Training of Graph Convolutional Networks using Subgraph Approximation
One important technique towards alleviating this challenge is data compression, in which the information is encoded in a more compact format. Compression can be either lossless or lossy. When using a lossless compression technique, it is possible to reconstruct the original data without any loss of information. In contrast, lossy techniques encode the data by leaving out less important information. This is achieved through approximation of the original data.
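The difference between the two kinds of compression can be sketched in a few lines of Python. This is an illustration only and not taken from the thesis: it uses the standard-library `zlib` compressor as the lossless stage and, as an assumed lossy step, zeroes the low mantissa bits of 32-bit floating-point values before compressing. The function names are hypothetical.

```python
import struct
import zlib


def to_float32(x):
    """Round a Python float to the nearest 32-bit float value."""
    (y,) = struct.unpack("<f", struct.pack("<f", x))
    return y


def truncate_mantissa(x, keep_bits):
    """Lossy step: zero the low (23 - keep_bits) mantissa bits of a float32 value."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    mask = 0xFFFFFFFF & ~((1 << (23 - keep_bits)) - 1)
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y


def pack_floats(values):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_floats(blob):
    """Inverse of pack_floats()."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def compress_lossless(values):
    """Every float32 value can be recovered exactly after decompression."""
    return zlib.compress(pack_floats(values))


def compress_lossy(values, keep_bits):
    """Approximate first, then compress; the discarded mantissa bits cannot be recovered."""
    return zlib.compress(pack_floats([truncate_mantissa(v, keep_bits) for v in values]))


def decompress(blob):
    return unpack_floats(zlib.decompress(blob))
```

Decompressing the lossless output returns exactly the float32 inputs. Decompressing the lossy output returns values whose relative error is below `2 ** -keep_bits` for normal float32 numbers, typically in exchange for a smaller compressed size because the zeroed bits are highly redundant.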
The thesis proposes a set of approximation and compression techniques for Graphics Processing Units (GPUs), which help them to access data faster. The thesis shows that these techniques can increase the performance of GPUs.
One new insight from this thesis is that controlled approximation can increase performance while still delivering high-quality results. Controlled approximation means that the quality of the output is guaranteed to stay above a pre-defined quality threshold. This indicates that approximation can be used to increase performance in a wide range of applications.
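One way to picture controlled approximation is as a search for the lowest precision whose output still meets the quality threshold. The Python sketch below is a hypothetical illustration, not the framework from the thesis: the dot-product kernel, the relative-error quality metric, and all names are assumptions, and the check only covers the inputs it is run on.

```python
import struct


def quantize(x, mantissa_bits):
    """Round x to float32, then keep only the top mantissa_bits of its 23-bit mantissa."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    mask = 0xFFFFFFFF & ~((1 << (23 - mantissa_bits)) - 1)
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y


def dot(xs, ys, mantissa_bits=23):
    """Dot product in which every operand and every partial sum is reduced in precision."""
    acc = 0.0
    for x, y in zip(xs, ys):
        product = quantize(x, mantissa_bits) * quantize(y, mantissa_bits)
        acc = quantize(acc + product, mantissa_bits)
    return acc


def lowest_precision(kernel, reference, max_rel_error):
    """Return the smallest mantissa width whose result stays within the error bound."""
    for bits in range(1, 24):
        result = kernel(bits)
        if abs(result - reference) <= max_rel_error * abs(reference):
            return bits
    return 23  # fall back to full float32 precision


if __name__ == "__main__":
    xs = [0.1 * i for i in range(1, 33)]
    ys = [1.0 / i for i in range(1, 33)]
    reference = dot(xs, ys)
    bits = lowest_precision(lambda b: dot(xs, ys, b), reference, 0.01)
    print("mantissa bits needed for 1% error:", bits)
```

The search is the "control" part: precision is lowered only as far as the measured output quality allows, so the result never falls below the chosen threshold for the tested inputs.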
ACE: Approximate Algorithms and Computing Systems
Swedish Research Council (VR), 2015-01-01 -- 2018-12-31.
Information and Communication Technology
Doctoral theses at Chalmers University of Technology. New series: 4892
Technical report D - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University: 192D
Chalmers University of Technology
CSE EDIT 8103, Rännvägen 6.
Opponent: Prof. Natalie Enright Jerger, University of Toronto, ON, Canada