On Dynamic Load Balancing on Graphics Processors
Paper in proceedings, 2008
To get maximum performance on many-core graphics processors,
it is important to have an even balance of the workload so that
all processing units contribute equally to the task at hand.
This can be hard to achieve when the cost of a task is not
known beforehand and when new sub-tasks are created dynamically
during execution. With the recent advent of scatter operations
and atomic hardware primitives it is now possible to bring some
of the more elaborate dynamic load balancing schemes from the
conventional SMP systems domain to the graphics processor
domain.
We have compared four different dynamic load balancing methods
to see which one is most suited to the highly parallel world of
graphics processors. Three of these methods were lock-free and
one was lock-based. We evaluated them on the task of creating
an octree partitioning of a set of particles. The experiments
showed that synchronization can be very expensive and that new
methods that take more advantage of the graphics processors'
features and capabilities might be required. They also showed
that lock-free methods achieve better performance than
blocking methods and that they can be made to scale with increased
numbers of processing units.
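
To make the setting concrete, the following is a minimal CPU-side sketch in C++ of distributing dynamically created tasks through atomic operations instead of locks. It is not one of the four methods evaluated in the paper. All names (Task, TaskPool, count_leaves) and the fixed-capacity, no-reuse slot array are assumptions made for illustration, and the sketch avoids mutexes without claiming a formal lock-freedom guarantee. Each task stands for an octree node holding some number of particles; a node above the leaf limit spawns eight sub-tasks, which mirrors how new work appears during execution.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical task: an octree node that holds `particles` particles.
struct Task { int particles; };

// Fixed-capacity task pool driven by atomic counters (illustrative only).
// Slots are never reused, so capacity must cover every task ever created.
class TaskPool {
public:
    explicit TaskPool(std::size_t capacity)
        : slots_(capacity), ready_(capacity) {
        for (auto& r : ready_) r.store(false, std::memory_order_relaxed);
    }

    // Reserve a slot with one atomic add, fill it, then publish it.
    void push(Task t) {
        std::size_t i = tail_.fetch_add(1, std::memory_order_relaxed);
        assert(i < slots_.size());  // sketch assumes capacity is sufficient
        slots_[i] = t;
        ready_[i].store(true, std::memory_order_release);
    }

    // Try to claim the next published slot with compare-and-swap.
    // Returns false if no published task is available right now.
    bool pop(Task& out) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        while (h < slots_.size() && ready_[h].load(std::memory_order_acquire)) {
            if (head_.compare_exchange_weak(h, h + 1, std::memory_order_acq_rel)) {
                out = slots_[h];
                return true;
            }
            // On failure h was reloaded; the loop re-checks the new slot.
        }
        return false;
    }

private:
    std::vector<Task> slots_;
    std::vector<std::atomic<bool>> ready_;
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Split nodes until each holds at most leaf_limit particles and
// return the number of leaf nodes produced.
int count_leaves(int total_particles, int leaf_limit, unsigned num_threads,
                 std::size_t capacity = 1024) {
    TaskPool pool(capacity);
    std::atomic<int> pending{1};  // tasks created but not yet finished
    std::atomic<int> leaves{0};
    pool.push({total_particles});

    auto worker = [&] {
        Task t;
        while (pending.load(std::memory_order_acquire) > 0) {
            if (!pool.pop(t)) { std::this_thread::yield(); continue; }
            if (t.particles > leaf_limit) {
                // Count the children before publishing them so that
                // pending cannot drop to zero while work remains.
                pending.fetch_add(8, std::memory_order_relaxed);
                for (int c = 0; c < 8; ++c) pool.push({t.particles / 8});
            } else {
                leaves.fetch_add(1, std::memory_order_relaxed);
            }
            pending.fetch_sub(1, std::memory_order_release);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_threads; ++i) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    return leaves.load();
}
```

Calling count_leaves(4096, 8, 4) splits the root three times and uses 585 task slots in total, well within the default capacity. The pending counter is what lets workers terminate without a lock: a worker exits only once every created task has been finished, whichever thread processed it.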
load balancing
graphics processors
gpgpu
dynamic data structures
gpu
lock-free