Adaptive Resource Management Techniques for High Performance Multi-Core Architectures
Licentiate thesis, 2021

Reducing the average memory access time is crucial for improving the performance of applications executing on multi-core architectures. With workload consolidation this becomes increasingly challenging because co-running applications contend for shared resources. Previous work has proposed techniques for partitioning shared resources (e.g. cache and bandwidth) and for prefetch throttling, with the goal of mitigating contention and reducing or hiding the average memory access time.
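As a reminder of what these techniques act on, the standard textbook decomposition of average memory access time (AMAT) for a single cache level is given below. The symbols are the usual textbook ones and are not taken from the thesis itself.

```latex
\mathrm{AMAT} = t_{\mathrm{hit}} + m \cdot t_{\mathrm{miss}}
```

Here $t_{\mathrm{hit}}$ is the cache hit time, $m$ the miss rate and $t_{\mathrm{miss}}$ the miss penalty. Under this reading, cache partitioning mainly targets $m$, bandwidth partitioning mainly targets $t_{\mathrm{miss}}$, and prefetching tries to hide $t_{\mathrm{miss}}$ by issuing the access ahead of demand.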

Cache partitioning in multi-core architectures is challenging because cache allocations must be determined with low computational overhead and the partitions must be placed in a locality-aware manner. Low computational overhead is required for the technique to scale to large core counts. Previous work within multi-resource management has proposed coordinated management of a subset of the techniques: cache partitioning, bandwidth partitioning and prefetch throttling. However, coordinated management of all three techniques opens up new trade-offs and interactions which can be leveraged to gain better performance.
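To make the allocation problem concrete, the following is a minimal sketch of a generic, centralized greedy way-partitioning heuristic driven by per-application miss curves. It is not the algorithm proposed in this thesis. The function name `greedy_way_partition` and the miss-curve numbers are hypothetical and chosen only for illustration.

```python
from typing import List


def greedy_way_partition(miss_curves: List[List[int]], total_ways: int) -> List[int]:
    """Assign cache ways to applications by greedy marginal utility.

    miss_curves[a][w] is the (assumed, profiled) miss count of application a
    when it owns w ways, for w = 0..total_ways.
    """
    n_apps = len(miss_curves)
    if total_ways < n_apps:
        raise ValueError("need at least one way per application")

    # Every application starts with one way.
    alloc = [1] * n_apps

    # Hand out the remaining ways one at a time to the application whose
    # miss count drops the most from receiving one more way.
    for _ in range(total_ways - n_apps):
        best = max(
            range(n_apps),
            key=lambda a: miss_curves[a][alloc[a]] - miss_curves[a][alloc[a] + 1],
        )
        alloc[best] += 1
    return alloc


# Hypothetical miss curves: app 0 is cache-friendly, app 1 is streaming.
curves = [
    [100, 60, 30, 20, 15],
    [100, 95, 93, 92, 92],
]
print(greedy_way_partition(curves, total_ways=4))  # prints [3, 1]
```

Each of the remaining ways requires a scan over all applications, so the cost of this sketch grows with the product of way count and application count, and it needs all miss curves in one place. This is the kind of centralized overhead that motivates looking for distributed, low-complexity alternatives at large core counts.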

This thesis contributes two resource management techniques: a resource manager for scalable cache partitioning, and a multi-resource management technique for coordinated management of cache partitioning, bandwidth partitioning and prefetching. The scalable cache partitioning technique uses a distributed and asynchronous cache partitioning algorithm that works together with a flexible NUCA enforcement mechanism to place data in a locality-aware manner and to support fine-grained partitions. The algorithm adapts quickly to application phase changes. The distributed nature of the algorithm, together with its low computational complexity, enables the solution to be implemented in hardware and to scale to large core counts. The multi-resource management technique is designed using the results of our in-depth characterisation of the entire SPEC CPU2006 suite. The solution consists of three local resource management techniques that, together with a coordination mechanism, provide allocations that take the inter-resource interactions and trade-offs into account.

Our evaluation shows that the distributed cache partitioning solution performs within 1% of the best known centralized solution, which cannot scale to large core counts. Compared to a shared last-level cache, the solution improves performance by 9% and 16%, on average, on a 16-core and a 64-core architecture, respectively. The multi-resource management technique improves performance by 11%, on average, over the state of the art, and by 50% compared to a baseline 16-core multi-core without cache partitioning, bandwidth partitioning and prefetch throttling.

Cache Partitioning

Resource Management

Bandwidth Partitioning

Performance Isolation

Multi-Core Architectures

Prefetch Throttling

CSE EDIT 8103
Opponent: Professor Ramon Canal, UPC

Author

Nadja Holtryd

Chalmers, Computer Science and Engineering, Computer Engineering, Computer Systems

DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors

Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020 (2020), p. 578-589

Paper in proceedings

Nadja Ramhöj Holtryd, Madhavan Manivannan, Per Stenström, Miquel Pericàs. CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling

Low-energy toolset for heterogeneous computing (LEGaTO)

European Commission (EU), 2018-02-01 -- 2021-01-31.

ACE: Approximate Algorithms and Computing Systems

Swedish Research Council (VR), 2015-01-01 -- 2018-12-31.

Meeting Challenges in Computer Architecture (MECCA)

European Commission (EU), 2014-02-01 -- 2019-01-31.

Subject categories

Computer Engineering

Computer Science

Computer Systems

Publisher

Chalmers tekniska högskola

Online

More information

Last updated

2021-03-04