TailClipper: Reducing Tail Response Time of Distributed Services Through System-Wide Scheduling
Paper in proceeding, 2024

Reducing tail latency has become a crucial issue for optimizing the performance of online cloud services and distributed applications. In distributed applications, there are many causes of high end-to-end tail latency, including operating system delays, request reordering due to fan-out/fan-in, and network congestion. Although recent research has focused on reducing tail latency for individual application components, for example through request replication and scheduling, we argue in this paper for a holistic approach to reducing the end-to-end tail latency across application components. We propose TailClipper, a distributed scheduler that tags each arriving request with an arrival timestamp and propagates it across the microservices' call chain. TailClipper then uses arrival timestamps to implement an oldest-request-first scheduler that combines global first-come, first-served with a limited form of processor sharing to reduce end-to-end tail latency. In doing so, TailClipper can counter the performance degradation caused by request reordering in multi-tiered and microservices-based applications. We implement TailClipper as a userspace Linux scheduler and evaluate it using cloud workload traces and a real-world microservices application. Compared to state-of-the-art schedulers, our experiments reveal that TailClipper improves the 99th percentile response time by up to 81%, while also improving the mean response time and the system throughput by up to 54% and 29%, respectively, under high loads.
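The oldest-request-first idea described in the abstract can be illustrated with a minimal sketch. This is not TailClipper's implementation: the names `Request`, `OldestRequestFirstQueue`, and `drain` are hypothetical, and the sketch covers only the global first-come, first-served ordering by propagated arrival timestamp, leaving out the limited processor sharing and the userspace Linux scheduler integration. It assumes each request carries the timestamp it was given at the application's entry point, so a downstream tier can restore the original order even when requests reach it reordered.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    """Hypothetical request record; ordering uses (arrival_ts, seq) only."""
    arrival_ts: float              # stamped once, at the application's entry point
    seq: int                       # tie-breaker for requests with equal timestamps
    name: str = field(compare=False)


class OldestRequestFirstQueue:
    """Per-tier queue that always serves the globally oldest request next."""

    def __init__(self) -> None:
        self._heap: list[Request] = []
        self._seq = itertools.count()

    def push(self, name: str, arrival_ts: float) -> None:
        # The timestamp is propagated unchanged from the upstream tier,
        # not re-stamped with the local arrival time.
        heapq.heappush(self._heap, Request(arrival_ts, next(self._seq), name))

    def pop(self) -> Request:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


def drain(queue: OldestRequestFirstQueue) -> list[str]:
    """Serve every queued request and return the names in service order."""
    order = []
    while len(queue) > 0:
        order.append(queue.pop().name)
    return order


if __name__ == "__main__":
    tier2 = OldestRequestFirstQueue()
    # Requests reach the second tier out of order (B overtook A upstream).
    tier2.push("B", arrival_ts=2.0)
    tier2.push("A", arrival_ts=1.0)
    tier2.push("C", arrival_ts=3.0)
    print(drain(tier2))  # ['A', 'B', 'C']
```

A plain per-tier FIFO queue would have served B before A here; ordering by the propagated timestamp is what undoes the upstream reordering in this sketch.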

scheduling

Cloud computing

tail latency reduction

Authors

Nathan Ng

University of Massachusetts

Abel Souza

University of California

Ahmed Ali-Eldin Hassan

Nätverk och System

David Irwin

University of Massachusetts

Don Towsley

University of Massachusetts

Prashant Shenoy

University of Massachusetts

SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing

398-414
9798400712869 (ISBN)

15th Annual ACM Symposium on Cloud Computing, SoCC 2024
Redmond, USA

Subject Categories (SSIF 2025)

Computer Engineering

Telecommunications

Computer Systems

DOI

10.1145/3698038.3698554

More information

Last updated

2025-01-28