Towards Optimal Algorithms For Online Decision Making Under Practical Constraints
Doctoral thesis, 2019

Artificial Intelligence is increasingly used in real-life applications such as driving with autonomous cars, deliveries with autonomous drones, customer support with chatbots, and personal assistance with smart speakers. An artificially intelligent agent (AI) can be trained to become an expert at a task through a system of rewards and punishments, known as Reinforcement Learning (RL). However, since the AI will deal with human beings, it also has to follow some moral rules while accomplishing any task. For example, the AI should be fair to other agents and should not destroy the environment. Moreover, the AI should not leak private information from the user data it processes. These rules pose significant challenges in designing AI, which we tackle in this thesis through mathematically rigorous solutions.

More precisely, we start by considering the basic RL problem modeled as a discrete Markov Decision Process. We propose three simple algorithms (UCRL-V, BUCRL and TSUCRL) using two different paradigms: frequentist (UCRL-V) and Bayesian (BUCRL and TSUCRL). Through a unified theoretical analysis, we show that all three algorithms are near-optimal. Experiments confirm that our methods outperform existing techniques. Afterwards, we address the issue of fairness in the stateless version of reinforcement learning, also known as the multi-armed bandit. To concentrate our effort on the key challenges, we focus on the two-agent multi-armed bandit. We propose a novel objective that has been shown to be connected to fairness and justice. We derive an algorithm, UCRG, to solve this novel objective and show theoretically that it is near-optimal. Next, we tackle the issue of privacy by using the recently introduced notion of Differential Privacy. We design multi-armed bandit algorithms that preserve differential privacy. Theoretical analyses show that, for the same level of privacy, our newly developed algorithms achieve better performance than existing techniques.

Reinforcement Learning

Multi-Agent Learning

Differential Privacy

Multi-Armed Bandit

Markov Decision Process

Fairness

EA Hörsalsvägen 11, Chalmers
Opponent: Michal Valko

Author

Aristide Tossou

Chalmers, Computer Science and Engineering, Data Science

Achieving Privacy in the Adversarial Multi-Armed Bandit

31st AAAI Conference on Artificial Intelligence, AAAI 2017, San Francisco, United States, 4-10 February 2017; (2017) p. 2653-2659

Paper in proceedings

Algorithms for Differentially Private Multi-Armed Bandits

30th AAAI Conference on Artificial Intelligence, AAAI 2016, Phoenix Convention Center, Phoenix, United States, 12-17 February 2016; (2016) p. 2087-2093

Paper in proceedings

A. Tossou, D. Basu, and C. Dimitrakakis. Near-optimal Regret Bounds for Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

A. Tossou, C. Dimitrakakis, and D. Basu. Near-optimal Bayesian Solution For Unknown Discrete Markov Decision Process

A. Tossou et al. A Novel Individually Rational Objective In Multi-Agent Multi-Armed Bandit: Algorithms and Regret Bounds

SwissSenseSynergia (SNSF)

Eidgenössische Materialprüfungs- und Forschungsanstalt (Empa), 2015-01-01 -- 2017-12-31.

Subject categories

Other Computer and Information Science

Computer Science

Computer Systems

Areas of Advance

Information and Communication Technology

ISBN

978-91-7905-207-2

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 4674

Publisher

Chalmers tekniska högskola

More information

Last updated

2019-11-01