Scalable and Locality-aware Resource Management with Task Assembly Objects
Paper in proceedings, 2015

Efficiently scheduling application concurrency onto system-level resources is one of the main challenges in parallel computing. Current approaches, which map single-threaded tasks to individual cores via worksharing or random work stealing, suffer from bottlenecks such as idleness, work-time inflation, and scheduling overheads. This paper proposes an execution model called Task Assembly Objects (TAO) that targets scalability and communication avoidance on future shared-memory architectures. The main idea behind TAO is to map coarse work units (i.e., task DAG partitions) to coarse hardware (i.e., system topology partitions) via a new construct called a task assembly: a nested parallel computation that aggregates fine-grained tasks and cores and is managed by a private scheduler. By leveraging task assemblies via two-level global-private scheduling, TAO simplifies resource management and exploits multiple levels of locality. To test the TAO model, we present a software prototype called go:TAO and evaluate it with two benchmarks designed to stress load balancing and data locality. Our initial experiments give encouraging results for achieving scalability and communication avoidance in future multi-core environments.
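The two-level global-private scheduling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the go:TAO implementation or its API: the names `TaskAssembly`, `global_schedule`, and `run` are hypothetical, the round-robin placement policy is a placeholder assumption, and core partitions are plain labels (no thread is actually pinned to a core).

```python
import queue
import threading


class TaskAssembly:
    """Hypothetical assembly: fine-grained tasks plus a private scheduler."""

    def __init__(self, name, tasks, width):
        self.name = name
        self.width = width              # number of cores the assembly asks for
        self.results = []
        self._lock = threading.Lock()
        self._private = queue.Queue()   # private queue, seen only by this assembly's workers
        for task in tasks:
            self._private.put(task)

    def _worker(self):
        # Private level: workers drain the assembly's own queue until it is empty.
        while True:
            try:
                task = self._private.get_nowait()
            except queue.Empty:
                return
            value = task()
            with self._lock:
                self.results.append(value)

    def execute(self):
        # One worker thread per requested core (threads stand in for cores here).
        workers = [threading.Thread(target=self._worker) for _ in range(self.width)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()


def global_schedule(assemblies, partitions):
    """Global level: place each assembly on a core partition (assumed round-robin)."""
    placement = {}
    for i, assembly in enumerate(assemblies):
        partition = partitions[i % len(partitions)]
        if assembly.width > len(partition):
            raise ValueError(f"{assembly.name} needs more cores than the partition has")
        placement[assembly.name] = partition
    return placement


def run(assemblies, partitions):
    """Place assemblies globally, then let each one schedule itself privately."""
    placement = global_schedule(assemblies, partitions)
    leaders = [threading.Thread(target=a.execute) for a in assemblies]
    for t in leaders:
        t.start()
    for t in leaders:
        t.join()
    return placement
```

In this sketch the global scheduler only decides which partition an assembly lands on; everything inside the assembly is handled by its private queue, which is the separation of concerns the abstract attributes to TAO.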

Author

Miquel Pericas

Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)

Workshop on Runtime Systems for Extreme Scale Programming Models and Architectures (RESPA'15)

Subject Categories (SSIF 2011)

Computer Engineering

Areas of Advance

Information and Communication Technology

More information

Created

10/8/2017