A Cache-centric Execution Model and Runtime for Deep Parallel Multicore Topologies
Paper in proceedings, 2016

Computational task DAGs are executed on parallel computers by a task scheduling algorithm. Intelligent scheduling is critical for achieving high parallelism, low overheads and reduced communication. A key technique for load balancing task DAGs is work stealing (WS), which Blumofe et al. popularized for fork-join computations [2]. When parallel slackness is high, the distributed nature of WS allows it to scale to a large number of cores with low overhead [4]. However, the memory space used by a WS computation grows proportionally to the number of cores. Aiming for a tighter space bound, Blelloch et al. proposed the parallel-depth-first (PDF) scheduler [1]. PDF schedules tasks by following the depth-first (serial) order of the computation and has space requirements closer to those of the serial execution. PDF has also been shown to provide constructive cache sharing on modern multicore architectures [3]. However, implementing PDF requires a centralized scheduler, which limits scalability. Targeting NUMA architectures, Olivier et al. proposed load balancing multiple PDF schedulers via WS [8]. While this enables scaling to larger systems, the approach still suffers from centralized scheduling of fine-grained parallelism [9]. Furthermore, for applications whose amount of parallelism varies greatly, a fixed hierarchy of PDF queues is not sufficient.
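To make the PDF idea concrete, here is a minimal Python sketch of a greedy PDF scheduler. It assumes unit-time tasks and a fork-join DAG given as a hypothetical `children` adjacency dict (successors listed in program order); the names `serial_order` and `pdf_schedule` are illustrative and are not the runtime described in this paper. Each task is ranked by its position in the serial depth-first execution, and at every step the (at most) p ready tasks with the lowest ranks run. The single shared priority queue is the kind of centralized structure that limits PDF's scalability.

```python
import heapq
from typing import Dict, List

# Hypothetical DAG representation: task name -> successor tasks, listed in
# program (left-to-right) order. Every task is assumed to take unit time.
Dag = Dict[str, List[str]]


def _indegrees(children: Dag) -> Dict[str, int]:
    """Count the predecessors of every task in the DAG."""
    indeg: Dict[str, int] = {}
    for node, kids in children.items():
        indeg.setdefault(node, 0)
        for kid in kids:
            indeg[kid] = indeg.get(kid, 0) + 1
    return indeg


def serial_order(children: Dag, root: str) -> Dict[str, int]:
    """Rank each task by its position in the serial depth-first (1DF) execution."""
    indeg = _indegrees(children)
    rank: Dict[str, int] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        rank[node] = len(rank)
        # Push enabled successors in reverse so the leftmost one runs next,
        # as it would in a sequential execution of the fork-join program.
        for kid in reversed(children.get(node, [])):
            indeg[kid] -= 1
            if indeg[kid] == 0:
                stack.append(kid)
    return rank


def pdf_schedule(children: Dag, root: str, p: int) -> List[List[str]]:
    """Greedy PDF: each step runs the (at most) p ready tasks with the lowest 1DF rank."""
    rank = serial_order(children, root)
    indeg = _indegrees(children)
    ready = [(rank[root], root)]  # one global (centralized) priority queue
    steps: List[List[str]] = []
    while ready:
        batch = [heapq.heappop(ready)[1] for _ in range(min(p, len(ready)))]
        steps.append(batch)
        for node in batch:
            for kid in children.get(node, []):
                indeg[kid] -= 1
                if indeg[kid] == 0:
                    heapq.heappush(ready, (rank[kid], kid))
    return steps


# Fork-join example: a spawns b and c; b spawns d and e, which join at f;
# f and c join at g.
EXAMPLE: Dag = {
    "a": ["b", "c"],
    "b": ["d", "e"],
    "c": ["g"],
    "d": ["f"],
    "e": ["f"],
    "f": ["g"],
}

print(pdf_schedule(EXAMPLE, "a", 2))
# [['a'], ['b', 'c'], ['d', 'e'], ['f'], ['g']]
```

With p = 1 the same function reproduces the serial order a, b, d, e, f, c, g; for larger p, PDF departs from that order only as far as needed to keep p processors busy, which is the intuition behind its space bound. A work-stealing scheduler, by contrast, keeps no global order: idle workers steal the oldest task from another worker's deque.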

Keywords: task scheduling; constructive cache sharing; resource management

Miquel Pericas

Chalmers, Computer Science and Engineering, Computer Engineering

Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

ISSN: 1089-795X

Vol. 2016, pp. 429-431

25th International Conference on Parallel Architectures and Compilation Techniques, PACT 2016
Haifa, Israel
