Lossy and Lossless Compression Techniques to Improve the Utilization of Memory Bandwidth and Capacity
Doctoral thesis, 2022

Main memory is a critical resource in modern computer systems and is in increasing demand. A growing number of on-chip cores and specialized accelerators improves the potential processing throughput but also calls for higher data rates and greater memory capacity. In addition, emerging data-intensive applications further increase memory traffic and footprint. Memory bandwidth, on the other hand, is pin-limited and power-constrained and is therefore more difficult to scale, while memory capacity is limited by cost and energy considerations.

This thesis proposes a variety of memory compression techniques as a means to alleviate the memory bottleneck. These techniques target two separate problems in the memory hierarchy: memory bandwidth and memory capacity. To reduce transferred data volumes, lossy compression is applied, which reaches more aggressive compression ratios than lossless compression. The resulting reduction in off-chip memory traffic lowers memory latency, which in turn improves the performance and energy efficiency of the system. To improve memory capacity, a novel approach to memory compaction is presented.

The first part of this thesis introduces Approximate Value Reconstruction (AVR), which combines a low-complexity downsampling compressor with an LLC design able to co-locate compressed and uncompressed data. Two separate thresholds limit the error introduced by approximation. For applications that tolerate aggressive approximation in large fractions of their data, in a system with 1GB of 1600MHz DDR4 per core and 1MB of LLC space per core, AVR reduces memory traffic by up to 70%, execution time by up to 55%, and energy costs by up to 20%, while introducing at most 1.2% error in the application output.
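As a rough illustration of downsampling compression guarded by two error thresholds, the following Python sketch averages groups of values and accepts the result only if the reconstruction stays within bounds. It is not the AVR hardware design: the function names, the downsampling factor, and the choice of one per-value threshold and one block-average threshold are assumptions made for this example, since the text above does not specify what the two thresholds measure.

```python
from typing import List, Optional


def downsample(block: List[float], factor: int) -> List[float]:
    """Replace each group of `factor` values with its average."""
    return [sum(block[i:i + factor]) / factor
            for i in range(0, len(block), factor)]


def reconstruct(samples: List[float], factor: int) -> List[float]:
    """Approximate the original block by repeating each sample."""
    return [s for s in samples for _ in range(factor)]


def try_compress(block: List[float], factor: int = 4,
                 max_value_err: float = 0.05,
                 max_mean_err: float = 0.02) -> Optional[List[float]]:
    """Return the downsampled block, or None if it is too inaccurate.

    Assumes a non-empty block whose length is a multiple of `factor`.
    Both thresholds are hypothetical relative-error limits.
    """
    samples = downsample(block, factor)
    approx = reconstruct(samples, factor)
    errs = [abs(a - b) / max(abs(b), 1e-12) for a, b in zip(approx, block)]
    if max(errs) > max_value_err or sum(errs) / len(errs) > max_mean_err:
        return None  # caller keeps the block uncompressed
    return samples
```

A block that returns None would be stored uncompressed, which is why a cache that can hold compressed and uncompressed data side by side matters in such a scheme.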

The second part of this thesis proposes Memory Squeeze (MemSZ), which introduces a parallelized implementation of the more advanced Squeeze (SZ) compression method. Furthermore, MemSZ improves on the error-limiting capability of AVR by keeping track of lifetime accumulated error. An alternate memory compression architecture is also proposed, which utilizes 3D-stacked DRAM as a last-level cache. In a system with 1GB of 800MHz DDR4 per core and 1MB of LLC space per core, MemSZ improves execution time, energy and memory traffic over AVR by up to 15%, 9%, and 64%, respectively.
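One way to picture lifetime error tracking is a per-block budget that is charged each time the block is compressed lossily. The sketch below is a hypothetical software model, not the MemSZ mechanism: the class name `ErrorBudget`, the additive worst-case error model, and the per-address dictionary are all assumptions for illustration.

```python
class ErrorBudget:
    """Tracks the summed worst-case error per block address (assumed model)."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.accumulated = {}  # block address -> error charged so far

    def allow_lossy(self, addr: int, new_error: float) -> bool:
        """Charge `new_error` to the block if the budget permits it.

        Returns False, leaving the budget unchanged, when another lossy
        round would push the block past its lifetime limit.
        """
        total = self.accumulated.get(addr, 0.0) + new_error
        if total > self.limit:
            return False
        self.accumulated[addr] = total
        return True
```

With a limit of 0.01 and a cost of 0.004 per lossy round, a block would be allowed two lossy rounds and refused the third, at which point it would have to be stored exactly.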

The third part of the thesis describes L2C, a hybrid lossy and lossless memory compression scheme. L2C applies lossy compression to approximable data, and falls back to lossless compression if an error threshold is exceeded. In a system with 4GB of 800MHz DDR4 per core and 1MB of LLC space per core, L2C improves performance over MemSZ by 9% and energy consumption by 3%.
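The lossy-first, lossless-fallback policy can be sketched in a few lines of Python. The compressors here are stand-ins chosen for brevity and are not the ones used in L2C: half-precision packing plays the role of the lossy compressor and `zlib` the lossless one. The function name `compress_block` and the relative-error threshold are likewise assumptions.

```python
import struct
import zlib
from typing import List, Tuple


def compress_block(values: List[float],
                   max_rel_err: float = 1e-3) -> Tuple[str, bytes]:
    """Try a lossy encoding first; fall back to lossless if too inaccurate.

    Assumes a non-empty list of finite floats.
    """
    n = len(values)
    try:
        # Stand-in lossy step: store each value as a 16-bit float.
        lossy = struct.pack(f"<{n}e", *values)
        approx = struct.unpack(f"<{n}e", lossy)
        worst = max(abs(a - v) / max(abs(v), 1e-12)
                    for a, v in zip(approx, values))
    except (OverflowError, struct.error):
        worst = float("inf")  # value does not fit in half precision
    if worst <= max_rel_err:
        return ("lossy", lossy)
    # Stand-in lossless step: deflate the exact 64-bit representation.
    return ("lossless", zlib.compress(struct.pack(f"<{n}d", *values)))
```

The returned tag tells the reader of the block which decoder to apply, so a block that failed the error check is always recovered exactly.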

The fourth and final contribution is FlatPack, a novel memory compaction scheme. FlatPack reduces the traffic overhead compared to other memory compaction systems, thus retaining the bandwidth benefits of compression. Furthermore, FlatPack adapts to changes in block compressibility both over time and between adjacent blocks. When available memory corresponds to 50% of the application footprint, in a system with 4GB of 800MHz DDR4 per core and 1MB of LLC space per core, FlatPack increases system performance compared to current state-of-the-art designs by 36%, while reducing system energy consumption by 12%.

Approximate Computing

Compression

Memory Systems

EDIT 8103
Opponent: Professor Moinuddin K. Qureshi, Georgia Institute of Technology

Author

Albin Eldstål-Ahrens

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

AVR: Reducing Memory Traffic with Approximate Value Reconstruction

ACM International Conference Proceeding Series, Vol. 5 August 2019 (2019)

Paper in proceeding

MemSZ: Squeezing Memory Traffic with Lossy Compression

Transactions on Architecture and Code Optimization, Vol. 17 (2020)

Journal article

L2C: Combining Lossy and Lossless Compression on Memory and I/O

Transactions on Embedded Computing Systems, Vol. 21 (2022)

Journal article

FlatPack: Flexible Compaction of Compressed Memory

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT (2022), p. 96-108

Paper in proceeding

Computer systems play an increasingly important role in modern society. As new workloads emerge and data volumes grow, the performance and energy efficiency of these systems become more and more important. One crucial bottleneck is the memory subsystem, whose bandwidth and capacity are limited resources. Memory bandwidth limits the throughput between the processor and main memory. Limited main memory storage capacity leads to slow and energy-intensive paging of data to persistent storage such as hard disks and solid-state drives.

Some classes of computer applications such as multimedia, economic forecasting, scientific simulations, and statistical processing are able to tolerate small approximations during execution. This tolerance to gradual errors can be exploited to increase performance, a field of study known as Approximate Computing. One example of this is lossy compression, which aggressively reduces the size of data by allowing small inaccuracies.

This thesis proposes a number of lossy compression techniques designed to utilize the existing memory bandwidth more efficiently, as well as memory compaction to increase the effective capacity of main memory. By applying these techniques, system performance can be increased and energy consumption reduced, without adding more system memory. A set of error-limiting mechanisms is introduced, which ensures that the lossy compression does not cause an unacceptable impact on the system’s output data.

ACE: Approximate Algorithms and Computing Systems

Swedish Research Council (VR) (2014-6221), 2015-01-01 -- 2018-12-31.

Areas of Advance

Information and Communication Technology

Infrastructure

C3SE (Chalmers Centre for Computational Science and Engineering)

Subject Categories

Computer Systems

ISBN

978-91-7905-607-0

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5073

Publisher

Chalmers

Online

More information

Latest update

11/8/2023