Consistent lock-free parallel stochastic gradient descent for fast and stable convergence

Karl Bäckström; Ivan Walulya; Marina Papatriantafilou; Philippas Tsigas

doi:10.1109/IPDPS49936.2021.00051

Paper in proceedings, 2021

Stochastic Gradient Descent (SGD) is an essential element in Machine Learning (ML) algorithms. Asynchronous shared-memory parallel SGD (AsyncSGD), including synchronization-free algorithms such as HOGWILD!, has received interest in certain contexts due to its reduced overhead compared to synchronous parallelization. Although these algorithms induce staleness and inconsistency, they have shown speedup for problems with smooth, strongly convex targets and sparse gradients. Recent works take important steps towards understanding the potential of parallel SGD for problems not conforming to these strong assumptions, in particular for deep learning (DL). There is, however, a gap in the current literature in understanding when AsyncSGD algorithms are useful in practice, and in particular how mechanisms for synchronization and consistency play a role. We address this gap by studying a spectrum of parallel algorithmic implementations of AsyncSGD, aiming to understand how shared-data synchronization influences the convergence properties in fundamental DL applications. We focus on the impact of consistency-preserving non-blocking synchronization on SGD convergence and on sensitivity to hyper-parameter tuning. We propose Leashed-SGD, an extensible algorithmic framework of consistency-preserving implementations of AsyncSGD, employing lock-free synchronization, effectively balancing throughput and latency. Leashed-SGD features a natural contention-regulating mechanism, as well as dynamic memory management, allocating space only when needed. We argue analytically about the dynamics of the algorithms, memory consumption, the threads' progress over time, and the expected contention. We provide a comprehensive empirical evaluation, validating the analytical claims, benchmarking the proposed Leashed-SGD framework, and comparing to baselines for two prominent DL applications: multilayer perceptrons (MLP) and convolutional neural networks (CNN).
We observe the crucial impact of contention, staleness and consistency and show how, thanks to the aforementioned properties, Leashed-SGD provides significant improvements in stability as well as wall-clock time to convergence (from 20-80% up to 4× improvements) compared to the standard lock-based AsyncSGD algorithm and HOGWILD!, while reducing the overall memory footprint.
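
To make the idea of consistency-preserving, non-blocking parameter updates more concrete, the following is a minimal sketch and not the Leashed-SGD algorithm from the paper. It assumes the shared parameter vector is published as an immutable tuple behind a single reference, so every reader sees a complete version, and that writers install a new version with a compare-and-swap (CAS) retry loop. The names `AtomicRef`, `gradient`, `worker` and `train`, the toy least-squares data, and the learning rate are all illustrative assumptions. CPython offers no CAS primitive, so `AtomicRef` emulates one with a small internal guard; in a language with hardware atomics that cell would be a genuine lock-free CAS.

```python
import threading


class AtomicRef:
    """Stand-in for a compare-and-swap (CAS) memory cell.

    Assumption: CPython exposes no CAS primitive, so a small internal guard
    emulates the atomicity that a hardware CAS instruction would provide.
    """

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        # Install `new` only if the cell still holds the `expected` object.
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


def gradient(w, x, y):
    # Gradient of the squared error 0.5 * (w . x - y)**2 with respect to w.
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def worker(shared, data, lr, steps):
    for i in range(steps):
        x, y = data[i % len(data)]
        snapshot = shared.load()      # immutable tuple: a consistent view
        g = gradient(snapshot, x, y)  # may be stale by the time it is applied
        while True:
            current = shared.load()
            updated = tuple(wi - lr * gi for wi, gi in zip(current, g))
            if shared.compare_and_swap(current, updated):
                break                 # published; on failure, retry on the newest version


def train(num_threads=4, steps=2000, lr=0.05):
    # Toy noise-free data generated from the target weights (2, -3).
    data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -3.0),
            ((1.0, 1.0), -1.0), ((1.0, -1.0), 5.0)]
    shared = AtomicRef((0.0, 0.0))
    threads = [threading.Thread(target=worker, args=(shared, data, lr, steps))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.load()
```

Calling `train()` returns the final weight tuple, which on this toy problem should end up close to the target (2, -3). Because each update replaces the whole tuple, no thread ever reads a half-written parameter vector (the inconsistency that HOGWILD!-style component-wise writes permit), while gradients can still be stale when several threads race to publish.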

Parallel algorithms

Lock-free synchronization

Artificial neural networks

Stochastic gradient descent

Authors

Karl Bäckström

Networks and Systems

Ivan Walulya

Chalmers, Computer Science and Engineering, Networks and Systems

Marina Papatriantafilou

Networks and Systems

Philippas Tsigas

Networks and Systems

Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021

pp. 423-432, art. no. 9460457
9781665440660 (ISBN)

35th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2021
Virtual, Online

WASP SAS

Wallenberg AI, Autonomous Systems and Software Program, 2018-01-01 -- 2023-01-01.

Areas of Advance

Information and Communication Technology

Energy

Subject Categories (SSIF 2011)

Computer Science

Computer Systems

DOI

10.1109/IPDPS49936.2021.00051

More information

Latest update

2022-01-17
