Adaptiveness, Asynchrony, and Resource Efficiency in Parallel Stochastic Gradient Descent
Doctoral thesis, 2023
Stochastic Gradient Descent (SGD) serves as the backbone of many of the most popular ML methods, including, in particular, Deep Learning. However, SGD has inherently sequential semantics and is not trivially parallelizable without imposing strict synchronization, with its associated bottlenecks. Asynchronous SGD (AsyncSGD), which relaxes the original semantics, has gained significant interest in recent years due to promising results showing speedup in certain contexts. However, the relaxed semantics that asynchrony entails give rise to fundamental questions about AsyncSGD, particularly regarding its stability and convergence rate in practical applications.
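To make the relaxed semantics concrete, the following minimal sketch (illustrative only, not the implementation accompanying this thesis; the least-squares problem, names, and constants are assumptions) shows several worker threads sharing one parameter vector: each reads a snapshot, computes a mini-batch gradient, and writes back without coordination, so gradients may be stale and concurrent updates may interleave or overwrite each other.

import threading
import numpy as np

# Synthetic least-squares data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X @ rng.normal(size=10)

theta = np.zeros(10)                     # shared model parameters
lr, n_workers, steps = 0.01, 4, 500

def worker(seed):
    global theta
    wrng = np.random.default_rng(seed)   # per-thread RNG to avoid shared RNG state
    for _ in range(steps):
        snapshot = theta.copy()          # read: possibly stale w.r.t. other workers
        idx = wrng.integers(0, len(X), size=32)
        Xb, yb = X[idx], y[idx]
        g = Xb.T @ (Xb @ snapshot - yb) / len(idx)   # mini-batch gradient
        theta = theta - lr * g           # write: unsynchronized, updates may race

threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final training loss:", np.mean((X @ theta - y) ** 2))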
This thesis explores vital knowledge gaps in AsyncSGD and contributes in particular to: Theoretical frameworks – formalization of several key notions related to the impact of asynchrony on convergence, guiding future development of AsyncSGD implementations; Analytical results – asymptotic convergence bounds under realistic assumptions. Moreover, several technical solutions are proposed, targeting in particular: Stability – reducing the number of non-converging executions and the associated wasted energy; Speedup – improving convergence time and reliability with instance-based adaptiveness; Elasticity – resource efficiency by avoiding over-parallelism, thereby improving stability and saving computing resources. The proposed methods are evaluated on several standard DL benchmarking applications and compared to relevant baselines from the previous literature. Key results include: (i) persistent speedup compared to baselines, (ii) increased stability and reduced risk of non-converging executions, and (iii) reduction in the overall memory footprint (up to 17%) as well as in the consumed computing resources (up to 67%).
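As a hedged illustration of what instance-based adaptiveness to staleness can look like (the names and the damping function below are assumptions, not the exact rule developed in the thesis), each applied gradient can carry the logical time at which the parameters were read, and its step size can be scaled down with the measured staleness:

# Illustrative only: names and the damping function are assumptions.
def staleness_adaptive_lr(base_lr, read_time, apply_time):
    """Scale the step size of a single update by its measured staleness."""
    tau = apply_time - read_time       # updates applied since this gradient was read
    return base_lr / (1.0 + tau)       # stale gradients take proportionally smaller steps

# Example: a gradient read at logical time 120 and applied at time 123 has staleness 3,
# so with base_lr = 0.1 it is applied with step size 0.1 / 4 = 0.025.
print(staleness_adaptive_lr(0.1, 120, 123))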
In addition, along with this thesis, an open-source implementation is published that connects high-level ML operations with asynchronous implementations based on fine-grained memory operations, enabling future research on efficient adaptation of AsyncSGD for practical applications.
Author
Karl Bäckström
Networks and Systems
MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent
Proceedings of the 2019 IEEE International Conference on Big Data (Big Data 2019), 2019, pp. 16-25
Paper in proceedings
Consistent lock-free parallel stochastic gradient descent for fast and stable convergence
Proceedings of the 2021 IEEE 35th International Parallel and Distributed Processing Symposium (IPDPS 2021), 2021, pp. 423-432
Paper in proceedings
ASAP.SGD: Instance-based Adaptiveness to Staleness in Asynchronous SGD
Proceedings of Machine Learning Research, Vol. 162 (PMLR), 2022, pp. 1261-1271
Paper in proceedings
Bäckström, K., Papatriantafilou, M., Tsigas, P. Less is more: Elastic Parallelism Control for Asynchronous SGD
As data grows in volume, variety, and speed, the AI models that make sense of it all become more complex. To handle this, parallelism is nowadays standard in AI, using multiple computing cores to perform tasks more efficiently. Stochastic Gradient Descent (SGD) is at the heart of many AI applications, including deep learning. However, it has limitations, as it is not easily adaptable to parallelism without causing bottlenecks. To overcome this, researchers have been exploring Asynchronous SGD (AsyncSGD), which offers a more flexible approach.
This thesis investigates the challenges and potential of AsyncSGD, focusing on its use in real-world applications. It proposes several technical solutions to enhance the stability, speed, and efficiency of AsyncSGD. The results show that the proposed solutions not only improve the speed of AsyncSGD but also increase stability and reduce the risk of failed executions. Moreover, the solutions reduce the consumed computing resources (up to 67%), making AsyncSGD a more sustainable option for handling the ever-growing data generated by our increasingly connected world.
WASP SAS
Wallenberg AI, Autonomous Systems and Software Program, 2018-01-01 -- 2023-01-01.
Driving forces
Sustainable development
Innovation and entrepreneurship
Subject categories
Computer and Information Science
Foundations
Basic sciences
ISBN
978-91-7905-855-5
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5321
Publisher
Chalmers