Lossy and Lossless Compression Techniques to Improve the Utilization of Memory Bandwidth and Capacity
Doctoral thesis, 2022
This thesis proposes a variety of memory compression techniques as a means to reduce the memory bottleneck. These techniques target two separate problems in the memory hierarchy: memory bandwidth and memory capacity. To reduce the volume of data transferred, lossy compression is applied, which reaches more aggressive compression ratios than lossless compression. A reduction of off-chip memory traffic leads to reduced memory latency, which in turn improves the performance and energy efficiency of the system. To improve memory capacity, a novel approach to memory compaction is presented.
The first part of this thesis introduces Approximate Value Reconstruction (AVR), which combines a low-complexity downsampling compressor with an LLC design able to co-locate compressed and uncompressed data. Two separate thresholds limit the error introduced by approximation. For applications that tolerate aggressive approximation in large fractions of their data, in a system with 1GB of 1600MHz DDR4 per core and 1MB of LLC space per core, AVR reduces memory traffic by up to 70%, execution time by up to 55%, and energy costs by up to 20%, while introducing at most 1.2% error in the application output.
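The idea of a downsampling compressor guarded by two error thresholds can be illustrated with a minimal sketch. It is not the AVR hardware design: the pairwise averaging, the reconstruction by repetition, and the choice of a per-value and a block-mean threshold (`max_value_err`, `max_mean_err`) are assumptions made for illustration, since the abstract does not specify what the two thresholds measure.

```python
from typing import List, Optional


def downsample(block: List[float], factor: int = 2) -> List[float]:
    """Average each group of `factor` neighbouring values into one value."""
    groups = [block[i:i + factor] for i in range(0, len(block), factor)]
    return [sum(g) / len(g) for g in groups]


def reconstruct(summary: List[float], length: int, factor: int = 2) -> List[float]:
    """Rebuild an approximate block by repeating each averaged value."""
    return [summary[i // factor] for i in range(length)]


def relative_error(original: float, approx: float) -> float:
    """Relative error, falling back to absolute error when original is 0."""
    if original == 0:
        return abs(approx)
    return abs(original - approx) / abs(original)


def try_compress(block: List[float], factor: int = 2,
                 max_value_err: float = 0.05,
                 max_mean_err: float = 0.02) -> Optional[List[float]]:
    """Return the downsampled block, or None if either threshold is exceeded.

    Both thresholds are hypothetical: one bounds the worst single value,
    the other bounds the mean error over the (non-empty) block.
    """
    summary = downsample(block, factor)
    approx = reconstruct(summary, len(block), factor)
    errors = [relative_error(o, a) for o, a in zip(block, approx)]
    if max(errors) > max_value_err or sum(errors) / len(errors) > max_mean_err:
        return None  # keep the block uncompressed
    return summary
```

In this sketch a smooth block such as `[1.00, 1.02, 1.04, 1.06]` is stored as two averaged values, while a block that alternates between 1.0 and 10.0 is rejected and would stay uncompressed.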
The second part of this thesis proposes Memory Squeeze (MemSZ), introducing a parallelized implementation of the more advanced Squeeze (SZ) compression method. Furthermore, MemSZ improves on the error-limiting capability of AVR by keeping track of lifetime accumulated error. An alternate memory compression architecture is also proposed, which utilizes 3D-stacked DRAM as a last-level cache. In a system with 1GB of 800MHz DDR4 per core and 1MB of LLC space per core, MemSZ improves execution time, energy and memory traffic over AVR by up to 15%, 9%, and 64%, respectively.
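Tracking lifetime accumulated error can be sketched as a per-block budget that is charged on every lossy write-back. The sketch below is an illustration under assumptions, not the MemSZ mechanism: uniform quantization stands in for the SZ compressor, and the `ErrorTracker` class, its `lifetime_budget` parameter, and the rule of adding up worst-case errors are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def quantize(values: List[float], step: float) -> List[float]:
    """Stand-in lossy step: snap every value to a multiple of `step`."""
    return [round(v / step) * step for v in values]


@dataclass
class ErrorTracker:
    """Hypothetical per-block bookkeeping of error accumulated over time."""
    lifetime_budget: float
    accumulated: Dict[int, float] = field(default_factory=dict)

    def write_back(self, block_id: int, values: List[float],
                   step: float) -> List[float]:
        """Return the data as it would be stored for this write-back."""
        approx = quantize(values, step)
        worst = max(abs(v - a) for v, a in zip(values, approx))
        total = self.accumulated.get(block_id, 0.0) + worst
        if total > self.lifetime_budget:
            # Budget exhausted: store the block exactly, charge nothing.
            return list(values)
        self.accumulated[block_id] = total
        return approx
```

With a budget of 0.3 and a step of 0.5, a first write-back of `[1.2, 2.7]` is approximated to `[1.0, 2.5]` and charges roughly 0.2 to the block. A second lossy write-back of the same block would bring the total to about 0.4, so the sketch stores it exactly instead.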
The third part of the thesis describes L2C, a hybrid lossy and lossless memory compression scheme. L2C applies lossy compression to approximable data, and falls back to lossless compression if an error threshold is exceeded. In a system with 4GB of 800MHz DDR4 per core and 1MB of LLC space per core, L2C improves performance over MemSZ by 9% and reduces energy consumption by 3%.
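The lossy-first, lossless-fallback policy can be sketched in a few lines. This is a software illustration only, not the L2C design: uniform quantization followed by `zlib` stands in for the lossy path, plain `zlib` over 64-bit floats stands in for the lossless path, and the function names, the `approximable` flag, and the absolute-error threshold `max_abs_err` are assumptions.

```python
import struct
import zlib
from typing import List, Tuple


def lossy_encode(values: List[float], step: float) -> bytes:
    """Quantize to integer codes, then compress the codes."""
    codes = [round(v / step) for v in values]
    return zlib.compress(struct.pack(f"<{len(codes)}q", *codes))


def lossy_decode(payload: bytes, step: float) -> List[float]:
    raw = zlib.decompress(payload)
    codes = struct.unpack(f"<{len(raw) // 8}q", raw)
    return [c * step for c in codes]


def lossless_encode(values: List[float]) -> bytes:
    return zlib.compress(struct.pack(f"<{len(values)}d", *values))


def lossless_decode(payload: bytes) -> List[float]:
    raw = zlib.decompress(payload)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


def hybrid_compress(values: List[float], step: float, max_abs_err: float,
                    approximable: bool = True) -> Tuple[str, bytes]:
    """Try the lossy path for approximable data; fall back to lossless
    when any reconstructed value misses the original by more than
    `max_abs_err` (a hypothetical threshold)."""
    if approximable:
        payload = lossy_encode(values, step)
        approx = lossy_decode(payload, step)
        if all(abs(v - a) <= max_abs_err for v, a in zip(values, approx)):
            return "lossy", payload
    return "lossless", lossless_encode(values)
```

The returned tag tells the reader of the data which decoder to use. Data that is not marked approximable never takes the lossy path in this sketch.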
The fourth and final contribution is FlatPack, a novel memory compaction scheme. FlatPack reduces the traffic overhead compared to other memory compaction systems, thus retaining the bandwidth benefits of compression. Furthermore, FlatPack adapts to changes in block compressibility both over time and between adjacent blocks. When available memory corresponds to 50% of the application footprint, in a system with 4GB of 800MHz DDR4 per core and 1MB of LLC space per core, FlatPack increases system performance compared to current state-of-the-art designs by 36%, while reducing system energy consumption by 12%.
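What memory compaction buys in terms of capacity can be shown with simple arithmetic over compressed block sizes. The sketch below is a generic illustration and does not reflect FlatPack's actual layout or metadata: the 64-byte block size, the back-to-back packing, and both function names are assumptions.

```python
from typing import List

BLOCK_SIZE = 64  # bytes per uncompressed block (assumed for illustration)


def packed_offsets(compressed_sizes: List[int]) -> List[int]:
    """Start offset of each block when blocks are stored back to back."""
    offsets, position = [], 0
    for size in compressed_sizes:
        offsets.append(position)
        position += size
    return offsets


def compaction_ratio(compressed_sizes: List[int]) -> float:
    """Uncompressed footprint divided by the packed footprint."""
    return BLOCK_SIZE * len(compressed_sizes) / sum(compressed_sizes)
```

Four blocks that compress to 16, 64, 32, and 16 bytes pack into 128 bytes instead of 256, a ratio of 2.0, which corresponds to the case where available memory is 50% of the application footprint. A naive layout like this one must shift every later block when one block's compressed size changes, which is the kind of overhead a compaction scheme has to keep low.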
Approximate Computing
Compression
Memory Systems
Author
Albin Eldstål-Ahrens
Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)
AVR: Reducing Memory Traffic with Approximate Value Reconstruction
ACM International Conference Proceeding Series, Vol. 5 August 2019 (2019)
Paper in proceeding
MemSZ: Squeezing Memory Traffic with Lossy Compression
Transactions on Architecture and Code Optimization, Vol. 17 (2020)
Journal article
L2C: Combining Lossy and Lossless Compression on Memory and I/O
Transactions on Embedded Computing Systems, Vol. 21 (2022)
Journal article
FlatPack: Flexible Compaction of Compressed Memory
Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT (2022), p. 96-108
Paper in proceeding
Some classes of computer applications such as multimedia, economic forecasting, scientific simulations, and statistical processing are able to tolerate small approximations during execution. This tolerance to gradual errors can be exploited to increase performance, a field of study known as Approximate Computing. One example of this is lossy compression, which aggressively reduces the size of data by allowing small inaccuracies.
This thesis proposes a number of lossy compression techniques designed to utilize the existing memory bandwidth more efficiently, as well as memory compaction to increase the effective capacity of main memory. By applying these techniques, system performance can be increased and energy consumption reduced, without adding more system memory. A set of error-limiting mechanisms is introduced, which ensures that the lossy compression does not have an unacceptable impact on the system’s output data.
ACE: Approximate Algorithms and Computing Systems
Swedish Research Council (VR) (2014-6221), 2015-01-01 -- 2018-12-31.
Areas of Advance
Information and Communication Technology
Infrastructure
C3SE (Chalmers Centre for Computational Science and Engineering)
Subject Categories (SSIF 2011)
Computer Systems
ISBN
978-91-7905-607-0
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5073
Publisher
Chalmers