Activation sparsity and dynamic pruning for split computing in edge AI
Paper in proceeding, 2022

Deep neural networks are getting larger and, therefore, harder to deploy on constrained IoT devices. Split computing provides a solution by splitting a network and placing the first few layers on the IoT device. The output of these layers is transmitted to the cloud, where inference continues. Earlier works indicate a high degree of sparsity in intermediate activation outputs. Building on this, this paper analyzes and exploits activation sparsity to reduce the network communication overhead when transmitting intermediate data to the cloud. Specifically, we analyze the intermediate activations of two early layers in ResNet-50 on CIFAR-10 and ImageNet, focusing on sparsity to guide the choice of a splitting point. We employ dynamic pruning of activations and feature maps and find that sparsity depends strongly on the size of a layer, and that weights do not correlate with activation sparsity in convolutional layers. Additionally, we show that sparse intermediate outputs can be compressed by a factor of 3.3× with an accuracy loss of 1.1% without any fine-tuning. With fine-tuning, the compression factor increases to up to 14× at a total accuracy loss of 1%.
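The following is a minimal sketch (not the authors' code) of the split-computing idea the abstract describes: run the first layers of ResNet-50 on the device, measure the sparsity of the intermediate activation, prune small values before transmission, and finish inference on the cloud side. The split point (after the first residual stage) and the pruning threshold are illustrative assumptions, not values from the paper.

import torch
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1").eval()

# Device-side "head": everything up to and including the first residual stage.
head = torch.nn.Sequential(
    model.conv1, model.bn1, model.relu, model.maxpool, model.layer1
)
# Cloud-side "tail": the remaining stages and the classifier.
tail = torch.nn.Sequential(
    model.layer2, model.layer3, model.layer4, model.avgpool,
    torch.nn.Flatten(1), model.fc
)

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed input image
with torch.no_grad():
    act = head(x)  # intermediate activation that would be transmitted

    # ReLU already produces zeros; measure the natural activation sparsity.
    sparsity = (act == 0).float().mean().item()
    print(f"activation sparsity at the split point: {sparsity:.2%}")

    # Dynamic pruning (illustrative threshold): zero out small activations to
    # raise sparsity further before transmission.
    threshold = 0.1
    pruned = torch.where(act.abs() < threshold, torch.zeros_like(act), act)

    # Transmit only non-zero values and their indices (simple sparse encoding).
    payload = pruned.to_sparse()
    print(f"values to send: {payload.values().numel()} of {act.numel()}")

    # Cloud side: reconstruct the dense tensor and continue inference.
    logits = tail(payload.to_dense())
    print("top-1 class index:", logits.argmax(dim=1).item())

In practice, the choice of split point trades device-side compute against the size and sparsity of the transmitted activation, which is the trade-off the paper's sparsity analysis is meant to guide.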

Edge Computing

Offloading

Feature Map Pruning

Activation Sparsity

Deep Learning

Author

Janek Haberer

University of Kiel

Olaf Landsiedel

University of Kiel

Network and Systems

DistributedML 2022 - Proceedings of the 3rd International Workshop on Distributed Machine Learning, Part of CoNEXT 2022

30-36 (pages)
9781450399227 (ISBN)

3rd International Workshop on Distributed Machine Learning, DistributedML 2022, co-located with the 18th International Conference on emerging Networking EXperiments and Technologies, CoNEXT 2022
Rome, Italy

Subject Categories

Computer Engineering

Telecommunications

Communication Systems

Areas of Advance

Information and Communication Technology

DOI

10.1145/3565010.3569066

More information

Latest update

10/26/2023