Sample Efficient Bayesian Reinforcement Learning

Divya Grover

Licentiate thesis, 2020

Artificial Intelligence (AI) has been an active field of research for over half a century. The field may be grouped by the tasks expected of an intelligent agent, two major ones being learning and inference, and planning. Learning is the act of storing new knowledge, while inference is the act of extracting conclusions from the agent’s limited knowledge base. The two are tightly coupled through the design of that knowledge base. Planning is the process of deciding on long-term actions, or plans, given the agent’s current knowledge.

Reinforcement Learning (RL) brings these two tasks together by posing a seemingly benign question: “How should an agent act optimally in an unknown environment?” Answering it requires the agent to learn about its environment and to plan actions given its current knowledge of it. In RL, the environment is represented by a mathematical model, and an intrinsic value is associated with each action the agent may choose.

In this thesis, we present a novel Bayesian algorithm for the RL problem. Bayesian RL is a widely explored area of research, but it is constrained by scalability and performance issues. We provide first steps towards a rigorous analysis of these types of algorithms. Bayesian algorithms are characterized by the belief they maintain over their unknowns, which is updated based on the collected evidence. This differs from the traditional approach to RL in both problem formulation and formal guarantees. Owing to its inherently Bayesian formulation, our algorithm combines aspects of planning and learning. It does so in a more scalable fashion, with formal PAC guarantees. We also give insights into the application of the Bayesian framework to the estimation of model and value, in joint work on Bayesian backward induction for RL.
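The idea of a belief over unknowns that is updated from evidence can be illustrated with a minimal sketch. It is not the algorithm of the thesis: it assumes a small finite MDP with unknown transition probabilities and a uniform Dirichlet prior, and the class name `DirichletBelief` and its methods are hypothetical names chosen for this illustration.

```python
import numpy as np


class DirichletBelief:
    """Belief over the transition probabilities of a finite MDP.

    Illustrative assumption: one independent Dirichlet distribution
    per (state, action) pair, stored as pseudo-counts.
    """

    def __init__(self, n_states, n_actions, prior=1.0):
        # alpha[s, a, s'] holds the Dirichlet pseudo-counts.
        # The uniform prior value is an assumption of this sketch.
        self.alpha = np.full((n_states, n_actions, n_states), prior, dtype=float)

    def update(self, s, a, s_next):
        # Conjugate Bayesian update: one observed transition adds one count.
        self.alpha[s, a, s_next] += 1.0

    def mean(self):
        # Posterior mean transition model, P[s, a, s'].
        return self.alpha / self.alpha.sum(axis=-1, keepdims=True)

    def sample(self, rng):
        # Draw one plausible transition model from the belief.
        # Normalised Gamma(alpha, 1) draws are Dirichlet(alpha) distributed.
        g = rng.gamma(self.alpha)
        return g / g.sum(axis=-1, keepdims=True)


# Usage: observe four transitions from state 0 under action 0.
belief = DirichletBelief(n_states=2, n_actions=1)
for s_next in [1, 1, 1, 0]:
    belief.update(0, 0, s_next)

# Pseudo-counts are now [2, 4], so the posterior mean is [1/3, 2/3].
posterior_mean = belief.mean()[0, 0]
sampled_model = belief.sample(np.random.default_rng(0))
```

A planner can then act on the posterior mean or on sampled models; how the belief is used for planning is where Bayesian RL algorithms differ from one another.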

Decision Making under Uncertainty

Bayesian Reinforcement Learning

Model based Reinforcement Learning

EDIT 8103, Rännvägen 6B

Opponent: Frans A. Oliehoek, Delft University of Technology, Netherlands

Online disputation

Author

Divya Grover

Chalmers, Computer Science and Engineering, Data Science

Other publications

Grover, D., Basu, D., & Dimitrakakis, C. (2019). Bayesian Reinforcement Learning via Deep, Sparse Sampling. arXiv preprint arXiv:1902.02661.

Dimitrakakis, C., Eriksson, H., Jorge, E., Grover, D., & Basu, D. (2020). Inferential Induction: Joint Bayesian Estimation of MDPs and Value Functions. arXiv preprint arXiv:2002.03098.

Areas of Advance

Information and Communication Technology

Subject Categories (SSIF 2011)

Robotics and Automation

Probability Theory and Statistics

Computer Science

Publisher

Chalmers