Global dead-block management for task-parallel programs
Journal article, 2018
Task-parallel programs inefficiently utilize the cache hierarchy due to the presence of dead blocks in caches. Dead blocks may occupy cache space in multiple cache levels for a long time without providing any utility until they are finally evicted. Existing dead-block prediction schemes take decisions locally for each cache level and do not efficiently manage the entire cache hierarchy. This article introduces runtime-orchestrated global dead-block management, in which static and dynamic information about tasks available to the runtime system is used to effectively detect and manage dead blocks across the cache hierarchy. In the proposed global management schemes, static information (e.g., when tasks start/finish, and what data regions tasks produce/consume) is combined with dynamic information to detect when/where blocks become dead. When memory regions are deemed dead at some cache level(s), all the associated cache blocks are evicted from the corresponding level(s). We extend the cache controllers at both private and shared cache levels to use the aforementioned information to evict dead blocks. The article does an extensive evaluation of both inclusive and non-inclusive cache hierarchies and shows that the proposed global schemes outperform existing local dead-block management schemes.
Task parallelism
Cache hierarchy
Multi-core architecture
Dead blocks
Runtime system