Modeling the performance of atomic primitives on modern architectures
Paper in proceedings, 2019

Using a processor's atomic primitives to access a memory location atomically is key to the correctness and feasibility of parallel software systems, and the performance of these primitives plays a significant role in the scalability and overall performance of such systems. In this work, we study the performance of atomic primitives, in terms of latency, throughput, fairness, and energy consumption, in two common software execution settings that result in high-contention and low-contention access to shared memory. We present an exhaustive study of the performance of atomics in these two settings on two state-of-the-art architectures, Intel Xeon E5 and Xeon Phi (KNL), and propose a performance model that captures their behavior. The model is centered on the bouncing of cache lines between threads that execute atomic primitives on these shared cache lines. It is simple enough to be used in practice, captures the behavior of atomics accurately under these execution scenarios, and facilitates algorithmic design decisions in multi-threaded programming.
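To make the two execution settings concrete, the following is a minimal C++ sketch, not the benchmark or the model from the paper. It assumes a 64-byte cache line and uses hypothetical names (PaddedCounter, Result, run_fetch_add). In the shared case every thread performs fetch_add on one counter, so the cache line holding it is expected to bounce between cores. In the private case each thread updates a counter on its own cache line, which here stands in for low contention.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// One counter per cache line. The 64-byte line size is an assumption;
// honoring the alignment inside std::vector requires C++17 or later.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Hypothetical result type for this sketch.
struct Result {
    std::uint64_t total_ops;  // sum of all counters after the run
    double seconds;           // wall-clock time, including thread start-up
};

// shared == true : all threads increment the same counter (high contention).
// shared == false: each thread increments its own padded counter
//                  (a stand-in for low contention).
Result run_fetch_add(unsigned threads, std::uint64_t iters, bool shared) {
    assert(threads > 0);
    std::vector<PaddedCounter> counters(shared ? 1 : threads);
    std::vector<std::thread> workers;
    workers.reserve(threads);

    auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&counters, iters, shared, t] {
            auto& counter = counters[shared ? 0 : t].value;
            for (std::uint64_t i = 0; i < iters; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    auto stop = std::chrono::steady_clock::now();

    std::uint64_t total = 0;
    for (auto& c : counters) {
        total += c.value.load();
    }
    return {total, std::chrono::duration<double>(stop - start).count()};
}
```

Calling run_fetch_add(4, 1000000, true) and run_fetch_add(4, 1000000, false) and dividing total_ops by seconds gives a rough throughput figure for each setting. The absolute numbers depend on the machine, the compiler flags, and thread placement, so they should not be read as results from the paper.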

Modeling

Performance

Synchronization

Concurrency

Atomic Primitives

Parallel Computing

Authors

Fazeleh Sadat Hoseini

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

Aras Atalar

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

Philippas Tsigas

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

ACM International Conference Proceeding Series

a28
978-1-4503-6295-5 (ISBN)

48th International Conference on Parallel Processing, ICPP 2019
Kyoto, Japan

Subject Categories (SSIF 2011)

Computer Engineering

Computer Science

Computer Systems

DOI

10.1145/3337821.3337901

More information

Latest update

1/3/2024