Towards Optimal Algorithms For Online Decision Making Under Practical Constraints
Doctoral thesis, 2019
More precisely, we start by considering the basic RL problem modeled as a discrete Markov Decision Process. We propose three simple algorithms (UCRL-V, BUCRL and TSUCRL) using two different paradigms: Frequentist (UCRL-V) and Bayesian (BUCRL and TSUCRL). Through a unified theoretical analysis, we show that all three algorithms are near-optimal. Experiments confirm the superiority of our methods over existing techniques. Afterwards, we address the issue of fairness in the stateless version of reinforcement learning, also known as the multi-armed bandit. To concentrate our efforts on the key challenges, we focus on the two-agent multi-armed bandit. We propose a novel objective that has been shown to be connected to fairness and justice. We derive an algorithm, UCRG, to solve this novel objective and theoretically show its near-optimality. Next, we tackle the issue of privacy by using the recently introduced notion of Differential Privacy. We design multi-armed bandit algorithms that preserve differential privacy. Theoretical analyses show that, for the same level of privacy, our newly developed algorithms achieve better performance than existing techniques.
Reinforcement Learning
Multi-Agent Learning
Differential Privacy
Multi-Armed Bandit
Markov Decision Process
Fairness
Author
Aristide Tossou
Chalmers, Computer Science and Engineering, Data Science
Achieving Privacy in the Adversarial Multi-Armed Bandit
31st AAAI Conference on Artificial Intelligence, AAAI 2017, San Francisco, United States, 4-10 February 2017, pp. 2653-2659
Paper in proceedings
Algorithms for Differentially Private Multi-Armed Bandits
30th AAAI Conference on Artificial Intelligence, AAAI 2016, Phoenix Convention Center, Phoenix, United States, 12-17 February 2016, pp. 2087-2093
Paper in proceedings
A. Tossou, D. Basu, and C. Dimitrakakis. Near-optimal Regret Bounds for Optimistic Reinforcement Learning using Empirical Bernstein Inequalities
A. Tossou, C. Dimitrakakis, and D. Basu. Near-optimal Bayesian Solution For Unknown Discrete Markov Decision Process
A. Tossou et al. A Novel Individually Rational Objective In Multi-Agent Multi-Armed Bandit: Algorithms and Regret Bounds
SwissSenseSynergia (SNSF)
Eidgenössische Materialprüfungs- und Forschungsanstalt (Empa) (CRSII2_154458/1153517-295), 2015-01-01 to 2017-12-31.
Subject categories (SSIF 2011)
Other computer and information science
Computer science
Computer systems
Areas of Advance
Information and Communication Technology
ISBN
978-91-7905-207-2
Doctoral theses at Chalmers University of Technology. New series: 4674
Publisher
Chalmers
EA Hörsalsvägen 11, Chalmers
Opponent: Michal Valko