Shared-memory parallel algorithms are extremely tricky: small details can introduce hard-to-discover bugs that surface only under particular interleavings of events. Programmers naturally wish to rely on, and implement, known synchronization methods from the literature. However, modern multiprocessor systems do not provide the memory consistency model commonly assumed by those constructions (typically sequential consistency), but rather more relaxed ones. Roughly speaking, relaxed models allow operations to be reordered, which enables aggressive optimizations that hide memory latency but makes reasoning about the order of events considerably harder.
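To make the reordering concrete, the classic store-buffering litmus test can be sketched with C++11 `std::atomic`. This is an illustrative sketch, not material from the project; the names `store_buffering` and `SBResult` are hypothetical. Each thread writes its own variable and then reads the other thread's variable. Under sequential consistency at least one thread must see the other's write, whereas a relaxed model permits both to read 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Result of one run of the store-buffering (SB) litmus test.
struct SBResult {
    int r1;  // value of y observed by thread 1
    int r2;  // value of x observed by thread 2
};

// Each thread stores 1 to its own variable, then loads the other one.
// Intended to be called with std::memory_order_seq_cst or
// std::memory_order_relaxed:
// - seq_cst: sequential consistency forbids r1 == 0 && r2 == 0.
// - relaxed: the C++ memory model allows r1 == 0 && r2 == 0, and it may
//   be observed on hardware with store buffers (e.g. x86).
SBResult store_buffering(std::memory_order order) {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, order);
        r1 = y.load(order);
    });
    std::thread t2([&] {
        y.store(1, order);
        r2 = x.load(order);
    });
    t1.join();  // join() makes r1 and r2 safely readable below
    t2.join();
    return {r1, r2};
}
```

Calling `store_buffering(std::memory_order_seq_cst)` repeatedly never yields `{0, 0}`. With `std::memory_order_relaxed` that outcome is permitted, though whether and how often it appears depends on the hardware and on thread scheduling.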
Programmers thus face confusion and dilemmas: should one serialize all accesses to shared data with barriers (thereby "killing" the performance benefits of efficient, fine-grained synchronization algorithms designed to increase parallelism), or struggle, for each program, to come up with a solution whose difficulty matches that of a publishable result?
We aim to balance the trade-off between the cost of synchronization and ease of programming. To that end, we will provide programmers with practical ways to avoid unnecessary global synchronization constructs (barriers) where possible, and with criteria and methods for using alternative, efficient primitives and constructs where synchronization is necessary, when implementing algorithms from the literature.
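As one illustration of a lighter-weight alternative to a full barrier, the message-passing pattern needs only a release store paired with an acquire load. The sketch below assumes C++11 atomics and is not the project's own method; `message_passing` is a hypothetical name. The producer writes ordinary data and then publishes it through a flag; once the consumer observes the flag, the release/acquire pair guarantees it also sees the data.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Message passing (MP): publish plain data through an atomic flag.
// A release store paired with an acquire load is sufficient here; no
// sequentially consistent fence is required for this pattern.
int message_passing(int value) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // publication flag
    int observed = -1;

    std::thread producer([&] {
        payload = value;                               // (1) write data
        ready.store(true, std::memory_order_release);  // (2) publish it
    });
    std::thread consumer([&] {
        // Spin until the flag is published; yield to avoid burning a core.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;  // happens-after (1) via release/acquire
    });
    producer.join();
    consumer.join();  // join() makes `observed` safely readable below
    return observed;
}
```

On x86, release stores and acquire loads typically compile to ordinary moves, whereas a sequentially consistent fence requires a full barrier instruction such as `mfence`; on weaker architectures the gap between the two is often larger still.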
Docent at Computer Science and Engineering, Networks and Systems (Chalmers)
Funding years 2011–2013