Reducing Memory Traffic with Approximate Compression
Memory bandwidth is a critical resource in modern systems and has an increasing demand. The large number of on-chip cores and specialized accelerators improves the potential processing throughput but also calls for higher data rates. In addition, new emerging data-intensive applications further increase memory traffic. On the other hand, memory bandwidth is pin limited and power constrained and is therefore more difficult to scale. This thesis proposes lossy memory compression as a means to reduce data volumes by exploiting the ability of applications to tolerate approximations in parts of their datasets. A reduction of off-chip memory traffic leads to reduced memory latency, which in turn improves the performance and energy efficiency of the system. The first part of this thesis introduces Approximate Value Reconstruction (AVR), which combines a low-complexity downsampling compressor with an LLC design able to co-locate compressed and uncompressed data. Two separate thresholds are employed to limit the error introduced by approximation. For applications that tolerate aggressive approximation in large fractions of their data, AVR reduces memory traffic by up to 70%, execution time by up to 55%, and energy costs by up to 20% introducing up to 1.2% error to the application output. The second part of this thesis proposes Memory Squeeze (MemSZ), introducing a parallelized implementation of the more advanced Squeeze (SZ) compression method. Furthermore, MemSZ improves on the error limiting capability of AVR by keeping track of life-time accumulated error. An alternate memory compression architecture is also proposed, which utilizes 3D-stacked DRAM as a last-level cache. MemSZ improves execution time, energy and memory traffic over AVR by up to 15%, 9%, and 64%, respectively.