Non-blocking programming on multi-core graphics processors
Journal article, 2009

This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures like the CUDA graphics processors. We first design three memory access models to capture the fundamental features of the new memory access mechanisms. Subsequently, we prove the exact synchronization power of these models in terms of their consensus numbers. These tight results show that the coalesced memory access mechanisms can facilitate strong synchronization between the threads of multicore processors, without the need of synchronization primitives other than reads and writes. Moreover, based on the intrinsic features of recent GPU architectures, we construct strong synchronization objects like wait-free and t-resilient read-modify-write objects for a general model of recent GPU architectures without strong hardware synchronization primitives like test-and-set and compare-and-swap. Accesses to the wait-free objects have time complexity O(N), where N is the number of processes. Our result demonstrates that it is possible to construct waitfree synchronization mechanisms for GPUs without the need of strong synchronization primitives in hardware and that wait-free programming is possible for GPUs.

wait-free programming

GPU architectures

Author

Phuong Ha

Philippas Tsigas

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

Otto Anshus

SIGARCH Computer Architecture News

0163-5964 (ISSN)

Vol. 36 5 19-28

Subject Categories (SSIF 2011)

Computer Engineering

Software Engineering

Computer Science

More information

Created

10/7/2017