Shared Resources in Distributed Systems: Analytical Tools for Evaluation and Self-stabilizing Provisioning
Doctoral thesis, 2018

Distributed computing is an established paradigm of modern computing systems. The nodes of a distributed system interact either by sharing resources or via a communication network. In both cases, provisioning shared resources is a challenge, for example when resource demand and supply vary or when the system is prone to failures. Analytical tools for evaluating system performance and for provisioning shared resources enhance system design and implementation.

In this thesis, we develop analytical tools for the evaluation and self-stabilizing provisioning of shared resources in distributed systems. We first focus on systems where resource demand and supply vary, and study cases of reusable and non-reusable resources. We study shared-object systems, where system nodes continuously demand mutually exclusive access to a number of objects. We develop analytical tools for computing the expected delay and throughput of such systems in a wide range of system utilization scenarios, including saturation points. Moreover, we study systems where nodes share energy resources, and focus on optimizing the available resources at the system level. We develop online algorithms that use the flexibility in resource demand to optimize the utilization of the available supply, and prove their competitive ratios.

Recovery from failures is necessary for provisioning shared resources. Dynamic and complex systems are often designed around a failure model, but it is important that they also recover after unexpected failures that fall outside that model. Such failures can include topological changes in the network, stale information in the nodes' memory, and communication failures, and they are further amplified by the system's asynchrony. In these settings, we first focus on provisioning network resources, in terms of network control and ordering of distributed events. We study Software-Defined Networks (SDNs), and specifically their control planes. We provide a self-stabilizing distributed algorithm for a fault-tolerant SDN control plane that deals with communication failures and topological changes, as well as with transient faults that can bring the system into an arbitrary state. Moreover, we focus on ordering distributed events in asynchronous message-passing systems in the absence of execution fairness. In these extremely asynchronous settings, we provide a practically-self-stabilizing distributed algorithm that uses bounded memory and yet tolerates transient faults as well as concurrent counter overflows when counting distributed events.
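
As background for the event-ordering part of the abstract, the following is a minimal sketch of a textbook vector clock, not the algorithm developed in the thesis. It assumes unbounded integer counters, reliable message delivery, and no transient faults, so it does not face the counter-overflow and recovery problems that the practically-self-stabilizing algorithm addresses. The names `VectorClock`, `happened_before`, and `concurrent` are illustrative choices, not taken from the thesis.

```python
from typing import List


class VectorClock:
    """Textbook vector clock for n nodes (assumption: unbounded counters)."""

    def __init__(self, node_id: int, n: int) -> None:
        self.node_id = node_id
        self.clock: List[int] = [0] * n

    def tick(self) -> List[int]:
        # A local event increments this node's own entry.
        self.clock[self.node_id] += 1
        return list(self.clock)

    def send(self) -> List[int]:
        # Sending counts as an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, timestamp: List[int]) -> List[int]:
        # Merge entry-wise with the received timestamp, then count the receive event.
        self.clock = [max(a, b) for a, b in zip(self.clock, timestamp)]
        return self.tick()


def happened_before(a: List[int], b: List[int]) -> bool:
    # a -> b iff a <= b in every entry and a < b in at least one entry.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def concurrent(a: List[int], b: List[int]) -> bool:
    # Two distinct timestamps are concurrent if neither happened before the other.
    return a != b and not happened_before(a, b) and not happened_before(b, a)


p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
m = p0.send()        # event at node 0
e = p1.tick()        # independent event at node 1
r = p1.receive(m)    # node 1 receives node 0's message
```

In this run, `m` and `e` are concurrent, while `m` happened before `r`. Each entry grows without bound here; bounding the memory while still tolerating overflows and transient faults is the problem the abstract refers to.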

self-stabilization

smart grid

online algorithms

shared object systems

resource sharing

software-defined networks

distributed algorithms

SB-H5, Sven Hultins Gata 6, Chalmers
Opponent: Prof. Dr. Christian Scheideler, Paderborn University, Paderborn, Germany

Author

Iosif Salem

Chalmers, Computer Science and Engineering (Chalmers), Networks and Systems (Chalmers)

Shared-object system equilibria: Delay and throughput analysis

17th International Conference on Distributed Computing and Networking, ICDCN 2016, Singapore, 4-7 January 2016 (2016), Art. no. a30

Paper in proceeding

Tailor your curves after your costume: Supply-following demand in smart grids through the Adwords problem

Proceedings of the ACM Symposium on Applied Computing, Vol. 04-08-April-2016 (2016), p. 2127-2134

Paper in proceeding

A Self-Organizing Distributed and In-Band SDN Control Plane

37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, Atlanta, United States, 5-8 June 2017 (2017), p. 2656-2657

Paper in proceeding

Iosif Salem, Elad M. Schiller. Practically-Self-Stabilizing Vector Clocks in the Absence of Execution Fairness

Subject Categories

Computer Engineering

Computer Science

Computer Systems

Areas of Advance

Information and Communication Technology

Energy

ISBN

978-91-7597-682-2

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 4363

Publisher

Chalmers

More information

Created

1/2/2018 1