Design Considerations of Value-aware Caches
On-chip cache memories are instrumental in tackling several performance and energy issues facing contemporary and future microprocessor chip architectures. First, they are key to bridging the growing speed gap between memory and processors. Second, because bandwidth into the chip is not keeping pace with the growth in processing performance, on-chip caches are essential for keeping bandwidth demands within limits. Finally, since off-chip memory accesses consume a substantial amount of energy, larger on-chip caches can potentially reduce the energy spent on off-chip accesses. Hence, techniques to improve on-chip cache utilization are important.
This thesis shows that value replication (the same value stored in multiple memory locations) is an important opportunity for improving the utilization of cache and memory capacity. The thesis establishes through experimentation that many applications exhibit high value locality and that, when it is exploited by storing each unique memory value exactly once, compression factors beyond 16X can be achieved. The proposed cache compression techniques build on this opportunity by encoding replicated values. While past cache compression techniques manage to code frequent values densely, they trade off a high compression ratio for low decompression latency, thus missing opportunities to utilize on-chip cache capacity more effectively.
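As a rough illustration of the "store each unique value exactly once" idea, the following minimal sketch computes an idealized compression factor for a list of memory words. The function name, the word granularity, and the sample snapshot are assumptions made for illustration; they are not taken from the thesis, and the estimate ignores the index or pointer overhead that a real design would pay.

```python
def unique_value_compression_factor(words):
    """Idealized factor: total words divided by distinct words.

    Assumption: each distinct value is stored once and references to it
    are free, so this is an upper bound, not a measured result.
    """
    if not words:
        raise ValueError("need at least one memory word")
    return len(words) / len(set(words))


# Hypothetical 64-word memory snapshot dominated by a few values.
snapshot = [0] * 48 + [1] * 8 + [0xFFFFFFFF] * 4 + [7, 42, 1234, 99999]
factor = unique_value_compression_factor(snapshot)
print(f"{len(snapshot)} words, {len(set(snapshot))} unique, factor {factor:.2f}")
```

The more heavily a few values dominate the snapshot, the higher the factor, which is the value locality the abstract refers to.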
The thesis further analyses the design considerations involved in realising a practical value-aware cache that accommodates statistical-based compression and presents, for the first time, a detailed design-space exploration of statistical-based cache compression. It shows that more aggressive statistical-based compression approaches such as Huffman coding, which have been excluded in the past because of the processing overhead of compression and decompression, are prime candidates for cache and memory compression.
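To make the idea of statistical-based compression concrete, here is a minimal sketch of textbook Huffman coding applied to word values. It is not the thesis's hardware design: the 32-bit word size, the function names, and the sample data are assumptions, and the ratio ignores the storage needed for the code table and any cache metadata.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(freqs):
    """Return {value: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # A single symbol still needs one bit per occurrence.
        return {next(iter(freqs)): 1}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tie), {v: 0}) for v, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf one level deeper.
        merged = {v: depth + 1 for v, depth in d1.items()}
        merged.update({v: depth + 1 for v, depth in d2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def huffman_compression_ratio(words, word_bits=32):
    """Uncompressed bits divided by Huffman-coded bits (code table excluded)."""
    freqs = Counter(words)
    lengths = huffman_code_lengths(freqs)
    compressed_bits = sum(freqs[v] * lengths[v] for v in freqs)
    return (len(words) * word_bits) / compressed_bits


# Hypothetical block of 16 words with skewed value frequencies.
words = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
lengths = huffman_code_lengths(Counter(words))
ratio = huffman_compression_ratio(words)
print(lengths, round(ratio, 2))
```

Frequent values receive shorter codewords, so the achievable ratio depends directly on how skewed the value distribution is. Decoding variable-length codewords is inherently more sequential than decoding fixed-width schemes, which is the processing overhead mentioned above.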
In this thesis, I find that, even though more processing-intensive decompression affects the cache-hit time of last-level caches, modern out-of-order cores can typically hide the decompression latency. Moreover, the impact of acquiring statistics to generate new codewords is also low: because value locality varies little over time, new encodings rarely need to be generated, which makes it possible to off-load their generation to software routines. Interestingly, the high compression ratio obtained by statistical-based cache compression is shown to improve effective cache capacity by close to three times. For cache-intensive workloads, this results in significant performance gains (20% on average) and substantial energy savings (the saved energy may be even 10 times larger than the total energy overheads) by reducing off-chip accesses.