Adaptive Resource Management Techniques for High Performance Multi-Core Architectures
Cache partitioning in multi-core architectures is challenging because cache allocations must be determined with low computational overhead and the partitions must be placed in a locality-aware manner. Low computational overhead is essential for scaling to large core counts. Previous work on multi-resource management has proposed coordinated management of a subset of the techniques: cache partitioning, bandwidth partitioning and prefetch throttling. However, coordinated management of all three techniques opens up new trade-offs and interactions which can be leveraged to improve performance.
This thesis contributes two resource management techniques: a resource manager for scalable cache partitioning and a multi-resource management technique for coordinated management of cache partitioning, bandwidth partitioning and prefetching. The scalable cache partitioning technique uses a distributed and asynchronous partitioning algorithm that works together with a flexible NUCA enforcement mechanism to provide locality-aware placement of data and support fine-grained partitions. The algorithm adapts quickly to application phase changes. Its distributed nature, together with its low computational complexity, enables the solution to be implemented in hardware and to scale to large core counts. The multi-resource management technique for coordinated management of cache partitioning, bandwidth partitioning and prefetching is designed using the results of our in-depth characterisation of the entire SPEC CPU2006 suite. The solution consists of three local resource management techniques that, together with a coordination mechanism, provide allocations that take inter-resource interactions and trade-offs into account.
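The thesis text itself does not include code. As a rough illustration of the general idea behind utility-driven cache partitioning (a classic greedy, marginal-utility scheme, not the distributed algorithm contributed by the thesis), one partitioning step could be sketched as follows; the miss-rate curves are made-up inputs:

```python
# Hypothetical sketch: greedy utility-based cache way partitioning.
# This is NOT the thesis's distributed algorithm; it only illustrates
# the notion of allocating cache capacity by marginal benefit.

def partition_ways(miss_curves, total_ways):
    """Assign cache ways to cores one at a time, giving each way to the
    core with the largest marginal reduction in misses.

    miss_curves[c][w] = misses for core c when it owns w ways
    (w ranges from 0 to total_ways inclusive).
    Returns a list with the number of ways allocated to each core.
    """
    n_cores = len(miss_curves)
    alloc = [0] * n_cores
    for _ in range(total_ways):
        # Marginal utility of giving core c one more way.
        gains = [miss_curves[c][alloc[c]] - miss_curves[c][alloc[c] + 1]
                 for c in range(n_cores)]
        winner = max(range(n_cores), key=lambda c: gains[c])
        alloc[winner] += 1
    return alloc

# Example: core 0 keeps benefiting from extra ways, core 1 saturates early.
curves = [
    [100, 60, 35, 20, 12, 8, 6, 5, 5],       # core 0
    [100, 50, 45, 43, 42, 42, 42, 42, 42],   # core 1
]
print(partition_ways(curves, 8))  # → [6, 2]
```

A centralized scheme like this scans all cores on every allocation step, which is precisely the scalability bottleneck that motivates the distributed, asynchronous approach taken in the thesis.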
Our evaluation shows that the distributed cache partitioning solution performs within 1% of the best known centralized solution, which cannot scale to large core counts. The solution improves performance by 9% and 16%, on average, on 16- and 64-core multi-core architectures, respectively, compared to a shared last-level cache. The multi-resource management technique yields a performance increase of 11%, on average, over the state-of-the-art and improves performance by 50% compared to a baseline 16-core multi-core without cache partitioning, bandwidth partitioning and prefetch throttling.
Chalmers, Computer Science and Engineering, Computer Engineering
DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors
Proceedings of the 2020 IEEE 34th International Parallel and Distributed Processing Symposium (IPDPS 2020), pp. 578-589
Paper in proceedings
Nadja Ramhöj Holtryd, Madhavan Manivannan, Per Stenström, Miquel Pericàs. CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling
Low-energy toolset for heterogeneous computing (LEGaTO)
European Commission (EU) (EC/H2020/780681), 2018-02-01 -- 2021-01-31.
Meeting Challenges in Computer Architecture (MECCA)
European Commission (EU) (EC/FP7/340328), 2014-02-01 -- 2019-01-31.
ACE: Approximate algorithms and computer systems
Swedish Research Council (VR) (2014-6221), 2015-01-01 -- 2018-12-31.
CSE EDIT 8103
Opponent: Professor Ramon Canal, UPC