Statistical Compression Cache Designs
On-chip caches are essential because they bridge the growing speed gap between processors and off-chip memory. To this end, chip real estate that could hold processing cores is given over to more cache space, possibly affecting cache access time and power dissipation. Cache compression is an alternative that increases the effective cache capacity without enlarging the cache. However, compression and decompression add complexity and latency; decompression in particular lies on the critical memory access path. Prior work focuses on methods that target low decompression latency at the expense of important gains in compressibility. This thesis, in contrast, focuses on cache designs that exploit more advanced compression methods, namely statistical compression.
The thesis first contributes an abstract value-aware cache model, which shows that applications often exhibit value locality and establishes that, ideally, storing each distinct value exactly once opens up important compression opportunities. Motivated by this, the thesis proposes SC^2, a Huffman-based statistical compression cache design. It tackles the problem of statistics acquisition by building a sampling mechanism in hardware. It finds that value locality is rather stable over long time periods, so code generation can be offloaded to software. It then builds hardware support for compression and decompression, deals with practical issues such as cache space management, and finally makes a detailed exploration of statistical compression in the last-level cache.
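To make the idea of Huffman-based value compression concrete, the following is a minimal software sketch, not the SC^2 hardware design. It assumes that values are sampled as whole words and that a frequency table is already available; the names `build_huffman_codes`, `encode`, and `decode` and the sample data are hypothetical and chosen only for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(freqs):
    """Return {value: bitstring} for a {value: frequency} table."""
    if len(freqs) == 1:
        # A single symbol still needs a one-bit code.
        return {v: "0" for v in freqs}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {v: ""}) for v, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in c0.items()}
        merged.update({v: "1" + c for v, c in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2] if heap else {}


def encode(words, codes):
    """Concatenate the codeword of every word into one bitstring."""
    return "".join(codes[w] for w in words)


def decode(bits, codes):
    """Walk the bitstring and emit a value whenever a codeword matches."""
    inverse = {c: v for v, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical sample: eight 32-bit words with a skewed value distribution.
sampled_words = [0, 0, 0, 0, 1, 1, 0xDEADBEEF, 7]
codes = build_huffman_codes(Counter(sampled_words))
compressed = encode(sampled_words, codes)
```

In this sketch the most frequent value receives the shortest codeword, so data with strong value locality shrinks the most. The bit-by-bit walk in `decode` also illustrates why decompression latency is a concern for variable-length codes.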
Unfortunately, this approach cannot be straightforwardly applied to data types that contain semantically well-defined data fields. Among such types, the thesis focuses on the common double-precision floating-point data and explores a different avenue to extract value locality by considering the different fields (sign, exponent, and mantissa) in isolation. Contrary to prior observations, it shows that the mantissa exhibits significant value locality if it is further partitioned. It then proposes FP-H, a novel statistical compression method tailored for cache compression.
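The field separation can be illustrated in software with the standard IEEE 754 double-precision layout (1 sign bit, 11 exponent bits, 52 mantissa bits). This is only a sketch: the abstract does not state how FP-H partitions the mantissa, so the 20/32-bit split below is an assumption, and the function names are hypothetical.

```python
import struct

MANTISSA_BITS = 52


def split_double(x):
    """Return (sign, exponent, mantissa) of a double as unsigned integers."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> MANTISSA_BITS) & 0x7FF
    mantissa = bits & ((1 << MANTISSA_BITS) - 1)
    return sign, exponent, mantissa


def partition_mantissa(mantissa, high_bits=20):
    """Split the mantissa into a high and a low part.

    The 20-bit default is an assumed split for illustration,
    not the partitioning used by FP-H.
    """
    low_bits = MANTISSA_BITS - high_bits
    return mantissa >> low_bits, mantissa & ((1 << low_bits) - 1)


def join_double(sign, exponent, mantissa):
    """Inverse of split_double: rebuild the double from its fields."""
    bits = (sign << 63) | (exponent << MANTISSA_BITS) | mantissa
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


sign, exponent, mantissa = split_double(1.5)
high, low = partition_mantissa(mantissa)
```

Each field, or each mantissa part, could then be given its own frequency table, so that values that differ as whole words can still share codewords per field.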
Finally, the thesis observes that no compressed cache design to date, including the state of the art, is always better than the others. Hence the thesis establishes HyComp, a practical cache design that adopts hybrid compression for the first time: one of several data-type-specific compression methods is selected through heuristics. HyComp offers robust compressibility across applications that manipulate diverse data types, without affecting decompression and with only a slight impact on compression latency.