Sample Efficient Bayesian Reinforcement Learning
Licentiate thesis, 2020
Reinforcement Learning (RL) brings together two tasks, learning and planning, by posing a seemingly benign question: “How to act optimally in an unknown environment?” Answering it requires the agent to learn about its environment as well as to plan actions given its current knowledge of it. In RL, the environment can be represented by a mathematical model, and we associate an intrinsic value with the actions that the agent may choose.
In this thesis, we present a novel Bayesian algorithm for the RL problem. Bayesian RL is a widely explored area of research, but it is constrained by scalability and performance issues. We take first steps towards a rigorous analysis of these types of algorithms. Bayesian algorithms are characterized by the belief they maintain over their unknowns, which is updated based on the collected evidence. This differs from the traditional approach in RL in terms of problem formulation and formal guarantees. Because of its inherently Bayesian formulation, our algorithm combines aspects of planning and learning. It does so in a more scalable fashion, with formal PAC guarantees. We also give insights into the application of the Bayesian framework to the estimation of model and value, in joint work on Bayesian backward induction for RL.
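As an illustration of what "a belief over the unknowns, updated with evidence" can look like, here is a minimal sketch. It is not the algorithm of the thesis. It assumes a small discrete MDP and a Dirichlet prior over the transition probabilities; the class name `DirichletBelief` and its methods are hypothetical and chosen only for this example.

```python
import numpy as np


class DirichletBelief:
    """Hypothetical belief over the transition probabilities of a discrete MDP."""

    def __init__(self, n_states, n_actions, prior=1.0):
        # counts[s, a, s'] holds Dirichlet pseudo-counts (assumed uniform prior).
        self.counts = np.full((n_states, n_actions, n_states), prior, dtype=float)

    def update(self, s, a, s_next):
        # Conjugate update: one observed transition adds one count.
        self.counts[s, a, s_next] += 1.0

    def mean(self):
        # Posterior mean transition model, one distribution per (s, a).
        return self.counts / self.counts.sum(axis=2, keepdims=True)

    def sample(self, rng):
        # Draw one plausible transition model from the current belief.
        n_s, n_a, _ = self.counts.shape
        model = np.empty_like(self.counts)
        for s in range(n_s):
            for a in range(n_a):
                model[s, a] = rng.dirichlet(self.counts[s, a])
        return model


belief = DirichletBelief(n_states=2, n_actions=1)
for _ in range(3):
    belief.update(0, 0, 1)  # observe the transition 0 -> 1 three times
posterior_mean = belief.mean()
sampled_model = belief.sample(np.random.default_rng(0))
```

A planner could then act on the posterior mean or on sampled models; which of these a given Bayesian RL method uses is a design choice that this sketch leaves open.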
Decision Making under Uncertainty
Bayesian Reinforcement Learning
Model-based Reinforcement Learning
Author
Divya Grover
Chalmers, Computer Science and Engineering (Chalmers), Data Science
Grover, D., Basu, D., & Dimitrakakis, C. (2019). Bayesian Reinforcement Learning via Deep, Sparse Sampling. arXiv preprint arXiv:1902.02661.
Dimitrakakis, C., Eriksson, H., Jorge, E., Grover, D., & Basu, D. (2020). Inferential Induction: Joint Bayesian Estimation of MDPs and Value Functions. arXiv preprint arXiv:2002.03098.
Areas of Advance
Information and Communication Technology
Subject Categories (SSIF 2011)
Robotics
Probability Theory and Statistics
Computer Science
Publisher
Chalmers
EDIT 8103, Rännvägen 6B
Opponent: Frans A. Oliehoek, Delft University of Technology, Netherlands