Scalable and Locality-aware Resource Management with Task Assembly Objects
Paper in conference proceedings, 2015
Efficiently scheduling application concurrency onto system-level resources is one of the main challenges in parallel computing. Current approaches that map single-threaded tasks to individual cores via worksharing or random work stealing suffer from bottlenecks such as idleness, work-time inflation, and scheduling overheads.
This paper proposes an execution model called Task Assembly Objects (TAO) that targets scalability and communication avoidance on future shared-memory architectures. The main idea behind TAO is to map coarse work units (i.e., task DAG partitions) to coarse hardware (i.e., system topology partitions) via a new construct called a task assembly: a nested parallel computation that aggregates fine-grained tasks and cores and is managed by a private scheduler. By leveraging task assemblies via two-level global-private scheduling, TAO simplifies resource management and exploits multiple levels of locality. To test the TAO model, we present a software prototype called go:TAO and evaluate it with two benchmarks designed to stress load balancing and data locality. Our initial experiments give encouraging results for achieving scalability and communication avoidance in future multi-core environments.