Dělen: Enabling Flexible and Adaptive Model-serving for Multi-tenant Edge AI

Qianlin Liang; Walid A. Hanafy; Noman Bashir; Ahmed Ali-Eldin Hassan; David Irwin; Prashant Shenoy

doi:10.1145/3576842.3582375

Paper in proceedings, 2023

Model-serving systems expose machine learning (ML) models to applications programmatically via a high-level API. Cloud platforms use these systems to mask the complexities of optimally managing resources and servicing inference requests across multiple applications. Model serving at the edge is now also becoming increasingly important to support inference workloads with tight latency requirements. However, edge model serving differs substantially from cloud model serving in its latency, energy, and accuracy constraints: these systems must support multiple applications with widely different latency and accuracy requirements on embedded edge accelerators with limited computational and energy resources. To address the problem, this paper presents Dělen, a flexible and adaptive model-serving system for multi-tenant edge AI. Dělen exposes a high-level API that enables individual edge applications to specify a bound at runtime on the latency, accuracy, or energy of their inference requests. We efficiently implement Dělen using conditional execution in multi-exit deep neural networks (DNNs), which enables granular control over inference requests, and evaluate it on a resource-constrained Jetson Nano edge accelerator. We evaluate Dělen's flexibility by implementing state-of-the-art adaptation policies using Dělen's API, and evaluate its adaptability under different workload dynamics and goals when running single and multiple applications.
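To make the idea of per-request bounds on a multi-exit DNN concrete, the following is a minimal sketch, not Dělen's actual API. It assumes each exit point has been profiled offline for latency, energy, and accuracy. The names `ExitProfile`, `select_exit`, and `PROFILES` are hypothetical, and the numbers are illustrative rather than measured. The static selection rule shown here is a simplification of conditional execution, which in a real system may also decide at runtime whether to exit early.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ExitProfile:
    """Hypothetical offline profile of one exit point of a multi-exit DNN."""
    index: int          # position of the exit, 0 = earliest
    latency_ms: float   # latency when inference stops at this exit
    energy_mj: float    # energy when inference stops at this exit
    accuracy: float     # validation accuracy of this exit


def select_exit(
    profiles: Sequence[ExitProfile],
    max_latency_ms: Optional[float] = None,
    max_energy_mj: Optional[float] = None,
    min_accuracy: Optional[float] = None,
) -> Optional[ExitProfile]:
    """Pick an exit that satisfies every bound the caller supplied.

    Returns None when no exit can meet the bounds.
    """
    feasible = [
        p for p in profiles
        if (max_latency_ms is None or p.latency_ms <= max_latency_ms)
        and (max_energy_mj is None or p.energy_mj <= max_energy_mj)
        and (min_accuracy is None or p.accuracy >= min_accuracy)
    ]
    if not feasible:
        return None
    if max_latency_ms is None and max_energy_mj is None:
        # Only an accuracy floor (or no bound): take the cheapest exit that meets it.
        return min(feasible, key=lambda p: p.latency_ms)
    # A latency or energy budget was given: use it for the best accuracy.
    return max(feasible, key=lambda p: p.accuracy)


# Illustrative numbers only; not taken from the paper.
PROFILES = [
    ExitProfile(index=0, latency_ms=5.0, energy_mj=20.0, accuracy=0.70),
    ExitProfile(index=1, latency_ms=9.0, energy_mj=38.0, accuracy=0.82),
    ExitProfile(index=2, latency_ms=15.0, energy_mj=60.0, accuracy=0.90),
]

if __name__ == "__main__":
    print(select_exit(PROFILES, max_latency_ms=10.0))
    print(select_exit(PROFILES, min_accuracy=0.80))
```

In this sketch a latency or energy bound is treated as a budget to spend on accuracy, while an accuracy bound alone is met at the lowest cost; other policies could be built on the same profile table.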

Authors

Qianlin Liang

University of Massachusetts

Walid A. Hanafy

University of Massachusetts

Noman Bashir

University of Massachusetts

Ahmed Ali-Eldin Hassan

Networks and Systems


David Irwin

University of Massachusetts

Prashant Shenoy

University of Massachusetts

ACM International Conference Proceeding Series

209-221
9798400700378 (ISBN)

8th ACM/IEEE Conference on Internet of Things Design and Implementation, IoTDI 2023
San Antonio, USA

Subject categories (SSIF 2011)

Embedded Systems

Computer Science

Computer Systems

DOI

10.1145/3576842.3582375


More information

Last updated

2023-06-19
