Accordion: A malleable pipeline scheduling approach for adaptive SLO-aware inference serving

Pirah Noor Soomro; Nikela Papadopoulou; Miquel Pericas

doi:10.1145/3719276.3725190

Paper in proceedings, 2025

With the rising demand for machine learning-based applications, efficient and cost-effective inference serving systems have become imperative. These systems are tasked with meeting customer requirements outlined by Service Level Objectives (SLOs), encompassing model accuracy, response time, and cost considerations. Despite the adoption of proactive scheduling techniques by modern inference serving systems, dynamic factors such as fluctuating query patterns still pose challenges such as delayed response time.

To address these challenges, we propose an adaptive solution that leverages SLO-aware scheduling techniques to optimize resource allocation. Our approach aims to minimize the need for additional resources per inference service. By introducing malleable inference pipelines, we enhance flexibility in resource allocation during peak loads: the resources assigned to processing pipelines are readjusted dynamically to accommodate as many queries as possible.
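The idea of a malleable pipeline can be illustrated with a minimal sketch. This is not the Accordion scheduler described in the paper; it is a toy model under stated assumptions: each pipeline has a known amount of work in core-seconds, latency scales linearly with the number of cores, and remaining work of running pipelines is ignored. The names `Pipeline`, `min_cores`, and `admit` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Pipeline:
    """One inference pipeline serving one query (hypothetical toy model)."""
    name: str
    work: float   # total work in core-seconds (assumed known from profiling)
    slo: float    # latency budget in seconds
    cores: int = 0  # cores currently assigned

    def latency(self) -> float:
        # Assumption: perfect linear speedup, so latency = work / cores.
        return self.work / self.cores

    def min_cores(self) -> int:
        # Smallest core count that still meets the SLO under the model above.
        return max(1, math.ceil(self.work / self.slo))


def admit(running: list, new: Pipeline, total_cores: int) -> bool:
    """Try to admit `new`, shrinking running pipelines toward their SLO minimum.

    Returns True and updates core assignments if the query fits;
    returns False and leaves all assignments unchanged otherwise.
    """
    need = new.min_cores()
    free = total_cores - sum(p.cores for p in running)
    slack = sum(p.cores - p.min_cores() for p in running)
    if free + slack < need:
        return False  # cannot fit without violating someone's SLO

    deficit = max(0, need - free)
    # Reclaim cores from the pipelines with the most slack first.
    for p in sorted(running, key=lambda p: p.cores - p.min_cores(), reverse=True):
        if deficit == 0:
            break
        take = min(deficit, p.cores - p.min_cores())
        p.cores -= take
        deficit -= take

    new.cores = need
    running.append(new)
    return True


# Usage: a fully occupied 16-core system still admits a third query.
a = Pipeline("A", work=8.0, slo=2.0, cores=8)
b = Pipeline("B", work=6.0, slo=3.0, cores=8)
c = Pipeline("C", work=12.0, slo=2.0)
running = [a, b]
print(admit(running, c, total_cores=16))
print([(p.name, p.cores) for p in running])
```

In this sketch the system is full before the third query arrives, yet the query is admitted because one running pipeline holds more cores than its SLO requires. A real scheduler would also have to account for work already completed, non-linear scaling, and the cost of reconfiguring a pipeline.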

Our findings indicate that the proposed scheduler effectively utilizes system resources throughout execution while meeting most SLOs (4.2× fewer SLO violations). We observe an average reduction of 1.6× in the end-to-end latency of query processing, compared to baseline methods. We also demonstrate the impact of dynamically reducing the resources per inference query to accommodate more inference queries in the system. Our solution accommodates 1.4× more queries on average compared to the baselines and achieves 1.6× higher system throughput in terms of queries per second on average.

Malleable Task Scheduling

SLO-Aware scheduling techniques

Parallel Pipelines

Inference Serving System

Authors

Pirah Noor Soomro

Chalmers, Computer Science and Engineering, Computer Engineering

Nikela Papadopoulou

Chalmers, Computer Science and Engineering, Computer Engineering

Miquel Pericas

Chalmers, Computer Science and Engineering, Computer Engineering

Proceedings of the 22nd ACM International Conference on Computing Frontiers

2687‑9247 (ISSN)

Vol. 1, pp. 159-167
979-8-4007-1528-0 (ISBN)

22nd ACM International Conference on Computing Frontiers, Italy

Subject categories (SSIF 2025)

Computer Sciences

Computer Systems

DOI

10.1145/3719276.3725190

ISBN

9798400715280

More information

Last updated

2025-09-16
