Adaptive scheduling of inference pipelines on multicore architectures
Doctoral thesis, 2025
Heterogeneous edge devices pose unique challenges due to resource constraints and architectural diversity, while chiplet-based architectures offer potential gains in inference performance. Leveraging techniques such as online tuning algorithms, malleable and moldable inference pipelines, and adaptive scheduling strategies, this thesis proposes a comprehensive framework for optimizing DNN inference. The framework aims to improve system performance, reduce latency, and mitigate interference effects, thereby contributing to more efficient and scalable AI systems capable of meeting the evolving demands of real-time inference across diverse computational platforms.
The thesis addresses four key problems: enabling runtime scheduling of inference pipelines on edge devices; fully online scheduling of inference pipelines on heterogeneous platforms; mitigating interference effects on inference pipelines in inference-serving systems; and optimizing resource allocation for adaptive SLO-aware inference serving.
The contributions of this thesis are encapsulated in four papers, each addressing a distinct aspect of CNN inference optimization: a comprehensive framework for online scheduling of CNN pipelines; leveraging platform knowledge for expedited seed generation; dynamic scheduling techniques to alleviate interference effects; and SLO-aware scheduling techniques for optimizing resource allocation in inference-serving systems. Through these contributions, this thesis advances the state of the art in CNN inference optimization and inference-serving systems.
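To give a flavor of the malleable-pipeline idea underlying these contributions, the sketch below shows one simplified adaptation step: cores are shifted from the least-loaded pipeline stage to the bottleneck stage. This is a toy illustration only; the function name, inputs, and greedy policy are assumptions for exposition, not the algorithms defined in the papers.

```python
# Toy sketch of one adaptation step for a malleable pipeline:
# rebalance cores toward the bottleneck stage. Illustrative only;
# the thesis papers define the actual scheduling algorithms.

def adapt_allocation(stage_times, allocation, total_cores):
    """Move one core from the fastest stage to the slowest stage,
    provided the donor stage keeps at least one core.

    stage_times: per-stage execution time under the current allocation
    allocation:  cores currently assigned to each stage
    """
    assert sum(allocation) <= total_cores
    # Bottleneck stage: the one with the longest execution time.
    slowest = max(range(len(stage_times)), key=lambda i: stage_times[i])
    # Candidate donors: stages that can give up a core.
    donors = [i for i in range(len(stage_times))
              if allocation[i] > 1 and i != slowest]
    if not donors:
        return allocation  # nothing to rebalance
    fastest = min(donors, key=lambda i: stage_times[i])
    new_alloc = list(allocation)
    new_alloc[fastest] -= 1
    new_alloc[slowest] += 1
    return new_alloc
```

Repeating such steps online, as measured stage times change under interference, approximates the adaptive, SLO-aware resizing the thesis explores.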
Online tuning
CNN parallel pipelines
Design space exploration
Interference mitigation
Heterogeneous computing units
Processing on chiplets
Inference-serving systems
Author
Pirah Noor Soomro
Chalmers University of Technology, Department of Computer Science and Engineering, Computer Engineering
Accordion: A malleable pipeline scheduling approach for adaptive SLO-aware inference serving
Proceedings of the 22nd ACM International Conference on Computing Frontiers (2025), pp. 159-167
Paper in proceeding
ODIN: Overcoming Dynamic Interference in iNference Pipelines
Lecture Notes in Computer Science, Vol. 14100 LNCS (2023), pp. 169-183
Paper in proceeding
Shisha: Online Scheduling of CNN Pipelines on Heterogeneous Architectures
Lecture Notes in Computer Science, Vol. 13826 LNCS (2023), pp. 249-262
Paper in proceeding
An online guided tuning approach to run CNN pipelines on edge devices
Proceedings of the 18th ACM International Conference on Computing Frontiers, CF 2021 (2021), pp. 45-53
Paper in proceeding
Subject Categories (SSIF 2025)
Computer Sciences
Computer Systems
ISBN
978-91-8103-261-1
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5719
Publisher
Chalmers