Accordion: A malleable pipeline scheduling approach for adaptive SLO-aware inference serving
Paper in proceedings, 2025
To address these challenges, we propose an adaptive solution that leverages SLO-aware scheduling techniques to optimize resource allocation, aiming to minimize the additional resources needed per inference service. By introducing malleable inference pipelines, we add flexibility to resource allocation during peak loads: the resource assignment of each processing pipeline is readjusted dynamically to accommodate as many queries as possible.
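To illustrate the idea of malleable pipelines at a high level, the following is a minimal sketch (not the paper's implementation) of a scheduler that shrinks the cores assigned to each inference pipeline under peak load, as long as the estimated latency still meets the SLO, and grows them back when the queue drains. All names and the scaling model (Pipeline, estimate_latency, TOTAL_CORES, SLO_MS) are hypothetical assumptions for illustration only.

```python
# Illustrative sketch of malleable, SLO-aware resource reassignment.
# All identifiers and the latency model are assumptions, not the paper's code.
from dataclasses import dataclass

TOTAL_CORES = 16            # assumed machine size
SLO_MS = 100.0              # assumed per-query latency target
MIN_CORES, MAX_CORES = 1, 8

@dataclass
class Pipeline:
    name: str
    cores: int                # current (malleable) resource assignment
    base_latency_ms: float    # measured single-core latency

    def estimate_latency(self, cores: int) -> float:
        # Crude strong-scaling model, used only for illustration.
        return self.base_latency_ms / cores

def rebalance(pipelines: list[Pipeline], queued_queries: int) -> None:
    """Shrink pipelines under high load so more queries fit in the system,
    subject to the SLO; grow them back when load is light."""
    for p in pipelines:
        if queued_queries > len(pipelines):      # peak load: try to shrink
            while p.cores > MIN_CORES and p.estimate_latency(p.cores - 1) <= SLO_MS:
                p.cores -= 1
        else:                                    # light load: expand again
            free = TOTAL_CORES - sum(q.cores for q in pipelines)
            p.cores = min(MAX_CORES, p.cores + max(free, 0) // max(len(pipelines), 1))

if __name__ == "__main__":
    pipes = [Pipeline("resnet", 4, 240.0), Pipeline("bert", 4, 320.0)]
    rebalance(pipes, queued_queries=10)
    print([(p.name, p.cores) for p in pipes])   # e.g. [('resnet', 3), ('bert', 4)]
```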
Our findings indicate that the proposed scheduler effectively utilizes system resources throughout execution while meeting most SLOs, incurring 4.2× fewer SLO violations. We observe an average 1.6× reduction in end-to-end query processing latency compared to baseline methods. We also demonstrate the impact of dynamically reducing the resources per inference query to accommodate more inference queries in the system: our solution accommodates 1.4× more queries than the baselines and achieves 1.6× higher system throughput in queries per second, both on average.
Malleable Task Scheduling
SLO-Aware Scheduling Techniques
Parallel Pipelines
Inference Serving System
Authors
Pirah Noor Soomro
Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)
Nikela Papadopoulou
Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)
Miquel Pericas
Chalmers, Computer Science and Engineering (Chalmers), Computer Engineering (Chalmers)
Proceedings of the 22nd ACM International Conference on Computing Frontiers
2687‑9247 (ISSN)
Vol. 1 159-167
979-8-4007-1528-0 (ISBN)
Italy
Subject Categories (SSIF 2025)
Computer Sciences
Computer Systems
DOI
10.1145/3719276.3725190
ISBN
9798400715280