Towards Optimal Algorithms For Online Decision Making Under Practical Constraints
Doctoral thesis, 2019
More precisely, we start by considering the basic reinforcement learning (RL) problem modeled as a discrete Markov Decision Process. We propose three simple algorithms (UCRL-V, BUCRL and TSUCRL) following two different paradigms: frequentist (UCRL-V) and Bayesian (BUCRL and TSUCRL). Through a unified theoretical analysis, we show that all three algorithms are near-optimal. Our experiments confirm that these methods outperform existing techniques. Afterwards, we address the issue of fairness in the stateless version of reinforcement learning, also known as the multi-armed bandit problem. To concentrate on the key challenges, we focus on the two-agent multi-armed bandit. We propose a novel objective that has been shown to be connected to fairness and justice. We derive an algorithm, UCRG, to solve this novel objective and show theoretically that it is near-optimal. Next, we tackle the issue of privacy using the recently introduced notion of differential privacy. We design multi-armed bandit algorithms that preserve differential privacy. Theoretical analyses show that, for the same level of privacy, our newly developed algorithms achieve better performance than existing techniques.
Reinforcement Learning
Multi-Agent Learning
Differential Privacy
Multi-Armed Bandit
Markov Decision Process
Fairness
Author
Aristide Tossou
Chalmers, Computer Science and Engineering (Chalmers), Data Science
Achieving Privacy in the Adversarial Multi-Armed Bandit
31st AAAI Conference on Artificial Intelligence, AAAI 2017, San Francisco, United States, 4-10 February 2017 (2017), p. 2653-2659
Paper in proceeding
Algorithms for Differentially Private Multi-Armed Bandits
30th AAAI Conference on Artificial Intelligence, AAAI 2016, Phoenix Convention Center, Phoenix, United States, 12-17 February 2016 (2016), p. 2087-2093
Paper in proceeding
A. Tossou, D. Basu, and C. Dimitrakakis. Near-optimal Regret Bounds for Optimistic Reinforcement Learning using Empirical Bernstein Inequalities
A. Tossou, C. Dimitrakakis, and D. Basu. Near-optimal Bayesian Solution For Unknown Discrete Markov Decision Process
A. Tossou et al. A Novel Individually Rational Objective In Multi-Agent Multi-Armed Bandit: Algorithms and Regret Bounds
SwissSenseSynergia (SNSF)
Swiss Federal Laboratories for Materials Science and Technology (Empa) (CRSII2_154458/1153517-295), 2015-01-01 -- 2017-12-31.
Subject Categories
Other Computer and Information Science
Computer Science
Computer Systems
Areas of Advance
Information and Communication Technology
ISBN
978-91-7905-207-2
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 4674
Publisher
Chalmers
EA Hörsalsvägen 11, Chalmers
Opponent: Michal Valko