Dependable Distributed Training of Compressed Machine Learning Models

Francesco Malandrino; Giuseppe Di Giacomo; Marco Levorato; Carla Fabiana Chiasserini

doi:10.1109/WoWMoM60985.2024.00036

Paper in proceedings, 2024

Existing work on the distributed training of machine learning (ML) models has consistently overlooked the distribution of the achieved learning quality, focusing instead on its average value. This leads to poor dependability of the resulting ML models, whose performance may be much worse than expected. We fill this gap by proposing DepL, a framework for dependable learning orchestration that makes high-quality, efficient decisions on (i) the data to leverage for learning, (ii) the models to use and when to switch among them, and (iii) the clusters of nodes, and their resources, to exploit. For concreteness, we consider a full deep neural network (DNN) and its compressed versions as the available models. Unlike previous studies, DepL guarantees that a target learning quality is reached with a target probability, while keeping the training cost at a minimum. We prove that DepL has a constant competitive ratio and polynomial complexity, and show that it outperforms the state of the art by over 27% and closely matches the optimum.
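The abstract contrasts two ways of judging a training configuration: by the average learning quality it achieves, or by the probability that it reaches a target quality. The toy Python sketch below is not DepL, which the abstract describes as a polynomial-time algorithm with a constant competitive ratio. It is a brute-force search over two hypothetical configurations, assuming each one's learning quality is summarised by a few invented samples from repeated runs. The names `Config`, `prob_reaching`, `cheapest_by_mean` and `cheapest_dependable`, and all costs and quality values, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Config:
    # Hypothetical training configuration: a label, a training cost, and
    # made-up learning-quality samples (e.g. test accuracy) from repeated runs.
    name: str
    cost: float
    quality_samples: Sequence[float]


def mean_quality(cfg: Config) -> float:
    # Average learning quality over the sampled runs.
    return sum(cfg.quality_samples) / len(cfg.quality_samples)


def prob_reaching(cfg: Config, q_target: float) -> float:
    # Empirical probability that a run reaches at least q_target.
    hits = sum(1 for q in cfg.quality_samples if q >= q_target)
    return hits / len(cfg.quality_samples)


def cheapest_by_mean(configs: List[Config], q_target: float) -> Optional[Config]:
    # Average-based rule: cheapest config whose MEAN quality meets the target.
    feasible = [c for c in configs if mean_quality(c) >= q_target]
    return min(feasible, key=lambda c: c.cost, default=None)


def cheapest_dependable(
    configs: List[Config], q_target: float, p_target: float
) -> Optional[Config]:
    # Dependability rule: cheapest config that reaches q_target with
    # probability at least p_target (exhaustive search, not DepL).
    feasible = [c for c in configs if prob_reaching(c, q_target) >= p_target]
    return min(feasible, key=lambda c: c.cost, default=None)


# Invented example data: the compressed model is cheaper and has a higher
# mean, but one of its five runs falls well short of the target.
configs = [
    Config("compressed", cost=3.0, quality_samples=[0.97, 0.96, 0.97, 0.96, 0.70]),
    Config("full", cost=5.0, quality_samples=[0.91, 0.90, 0.92, 0.90, 0.91]),
]

print(cheapest_by_mean(configs, 0.9).name)          # compressed
print(cheapest_dependable(configs, 0.9, 0.9).name)  # full
```

With these made-up numbers, the mean-based rule picks the cheaper compressed model (mean quality about 0.91), even though it reaches the 0.9 target in only 4 of 5 runs. Requiring the target to be met with probability at least 0.9 selects the full model instead.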

Keywords: dependable learning; learning guarantees; network support to machine learning; distributed learning

Authors

Francesco Malandrino

Consiglio Nazionale delle Ricerche

Consorzio Nazionale Interuniversitario per le Telecomunicazioni (CNIT)

Giuseppe Di Giacomo

Politecnico di Torino

Marco Levorato

University of California

Carla Fabiana Chiasserini

Politecnico di Torino

Consorzio Nazionale Interuniversitario per le Telecomunicazioni (CNIT)

Consiglio Nazionale delle Ricerche

Networks and Systems

Research; Other publications

Proceedings - 2024 IEEE 25th International Symposium on a World of Wireless, Mobile and Multimedia Networks, WoWMoM 2024

Pages 147-156
ISBN 9798350394665

25th IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks, WoWMoM 2024
Perth, Australia

Subject categories (SSIF 2011)

Computer Science

Computer Systems

Last updated: 2025-06-23

Dependable Distributed Training of Compressed Machine Learning Models Paper i proceeding, 2024