Distributing Inference Tasks Over Interconnected Systems Through Dynamic DNNs

Chetna Singhal; Yashuo Wu; Francesco Malandrino; Marco Levorato; Carla Fabiana Chiasserini

doi:10.1109/TON.2025.3543848

Journal article, 2025

An increasing number of mobile applications leverage deep neural networks (DNNs) as an essential component to adapt to the operational context at hand and provide users with an enhanced experience. It is thus of paramount importance that network systems support the execution of DNN inference tasks in an efficient and sustainable way. However, matching the diverse resources available at the mobile-edge-cloud network tiers with the applications' requirements and the complexity of their DNNs, while minimizing energy consumption, is challenging. A possible approach to the problem consists in exploiting the emerging concept of dynamic DNNs, characterized by multi-branched architectures with early exits that enable sample-based adaptation of the model depth. We leverage this concept and address the problem of deploying portions of DNNs with early exits across the mobile-edge-cloud system and allocating therein the necessary network, computing, and memory resources. We do so by developing a 3-stage graph-modeling method that allows us to represent the characteristics of the system and the applications as well as the possible options for splitting the DNN over the multi-tier network nodes. Our solution, called Feasible Inference Graph (FIN), can determine the DNN split, deployment, and resource allocation that minimize the inference energy consumption while satisfying the nodes' constraints and the requirements of multiple, co-existing applications. FIN closely matches the optimum and leads to over 89% energy savings with respect to state-of-the-art alternatives.

Resource management

Memory management

Computational modeling

Network support to machine learning

Complexity theory

Artificial neural networks

dynamic neural networks

Soft sensors

Servers

Energy consumption

Edge computing

Mobile nodes

inference in the mobile-edge-cloud continuum

energy efficiency

Authors

Chetna Singhal

Université de Rennes

Yashuo Wu

University of California

Francesco Malandrino

CNR - IEIIT

Marco Levorato

University of California

Carla Fabiana Chiasserini

Chalmers, Computer Science and Engineering, Computer and Network Systems


IEEE TRANSACTIONS ON NETWORKING

2998-4157 (eISSN)

Vol. In Press

Subject categories (SSIF 2025)

Software Engineering

Communication Systems

Computer Systems

DOI

10.1109/TON.2025.3543848


More information

Last updated

2025-09-17
