Scalable and Locality-aware Resource Management with Task Assembly Objects
Conference paper, 2015
Efficiently scheduling application concurrency onto system-level resources is one of the main challenges in parallel computing. Current approaches, which map single-threaded tasks to individual cores via worksharing or random work stealing, suffer from bottlenecks such as idleness, work-time inflation, and scheduling overheads.
This paper proposes an execution model called Task Assembly Objects (TAO) that targets scalability and communication avoidance on future shared-memory architectures. The main idea behind TAO is to map coarse work units (i.e., task DAG partitions) to coarse hardware (i.e., system topology partitions) via a new construct called a task assembly: a nested parallel computation that aggregates fine-grained tasks and cores and is managed by a private scheduler. By leveraging task assemblies through two-level global-private scheduling, TAO simplifies resource management and exploits multiple levels of locality. To test the TAO model, we present a software prototype called go:TAO and evaluate it with two benchmarks designed to stress load balancing and data locality. Our initial experiments show encouraging results for achieving scalability and communication avoidance in future multi-core environments.