Efficient Forwarding of Producer-Consumer Data in Task-based Programs
Task-based programming models are increasingly being adopted because of their ability to express parallelism. They also raise programmer productivity by delegating demanding parallelism-management tasks, such as scheduling and staging the communication between tasks, to the runtime system and the architecture.
This paper focuses on techniques to optimize producer-consumer sharing in task-based programs. Because the set of producer and consumer tasks can often be determined statically, coherence prediction techniques would be expected to optimize producer-consumer sharing successfully. We show that they are ineffective because the mapping of tasks to cores changes with runtime conditions. The paper contributes a technique that forwards produced, spatially close blocks to the consumer in a single transaction when the consumer requests a first block. We also find that stride prefetching is competitive with our forwarding technique for sufficiently coarse tasks. However, its effectiveness deteriorates as task granularity is reduced, because there are fewer opportunities to train on the access pattern and to issue prefetches sufficiently far ahead of time. This makes our forwarding scheme a robust alternative for reducing communication overhead in task-based programs.
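
As a rough illustration of the forwarding idea described above, the following sketch counts how many transactions a consumer needs to read a run of produced blocks, with and without bundling spatially close blocks on the first request. It is a toy model under stated assumptions, not the mechanism evaluated in the paper: the function names, the `window` parameter, and the rule that only contiguous higher-addressed blocks are bundled are all hypothetical.

```python
def forward_on_first_miss(produced, requested, window=4):
    """Return the blocks sent in one transaction for a consumer miss.

    produced: set of block addresses written by the producer task
              (assumption: this set is tracked somewhere in the system).
    window:   hypothetical cap on how many contiguous blocks are bundled.
    """
    if requested not in produced:
        return [requested]  # ordinary miss, nothing extra to forward
    bundle = [requested]
    nxt = requested + 1
    # Bundle contiguous produced blocks that follow the requested one.
    while len(bundle) < window and nxt in produced:
        bundle.append(nxt)
        nxt += 1
    return bundle


def count_transactions(produced, accesses, window):
    """Count the transactions a consumer needs to read `accesses` in order."""
    cached = set()
    transactions = 0
    for block in accesses:
        if block in cached:
            continue  # already delivered by an earlier forwarded bundle
        cached.update(forward_on_first_miss(produced, block, window))
        transactions += 1
    return transactions


produced = set(range(100, 108))   # producer wrote 8 contiguous blocks
accesses = list(range(100, 108))  # consumer reads them in order
baseline = count_transactions(produced, accesses, window=1)   # one block per miss
forwarded = count_transactions(produced, accesses, window=4)  # bundled forwarding
print(baseline, forwarded)  # prints "8 2"
```

With a window of one block, every consumer access is a separate transaction; with a window of four, the eight blocks arrive in two transactions. The model ignores latency, bandwidth, and the cost of forwarding blocks the consumer never reads.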