Efficient Bayesian Planning
Doctoral thesis, 2022
In this thesis, we present novel Bayesian planning algorithms. First, we propose DSS (Deeper, Sparser Sampling) for the case of unknown environment dynamics. It is a meta-algorithm derived from a simple insight about Bayes' rule, and it outperforms the state of the art across the board, from discrete to continuous state settings. A theoretical analysis provides a high-probability bound on its performance. Our analysis differs from previous approaches in the literature in both problem formulation and formal guarantees. The result also contrasts with those of comparable Bayesian reinforcement learning (BRL) algorithms, which typically provide only asymptotic convergence guarantees. Suitable Bayesian models and their corresponding planners are proposed to implement the discrete and continuous versions of DSS. We then address partial observability with our second algorithm, FMP (Finite Memory Planner), which uses a depth-dependent partitioning of the infinite planning tree. Experimental results demonstrate performance comparable to the current state of the art in both discrete and continuous settings. Finally, we propose algorithms for finding the best policy for the worst-case belief in the minimax-Bayes setting.
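To make the deeper-sparser-sampling idea concrete, below is a minimal sketch of such a planner. It is illustrative only: the posterior.sample_mdp(), mdp.actions() and mdp.step() interfaces are assumptions for this example, not the thesis' actual API. The sketch uses root sampling on the premise behind DSS, namely that a single Bayes update barely moves the posterior, so the belief need not be re-updated at every tree node.

```python
def dss_action(posterior, state, depth, width, gamma=0.95):
    """Hedged sketch of a Deeper, Sparser Sampling-style planner.

    Assumed (hypothetical) interfaces:
      posterior.sample_mdp() -> an MDP sampled from the current belief
      mdp.actions(s)         -> iterable of actions available in s
      mdp.step(s, a)         -> (next_state, reward) sampled transition
    """
    def lookahead(mdp, s, d):
        # Deep but narrow recursion inside one fixed sampled model.
        if d == 0:
            return 0.0
        best = float('-inf')
        for a in mdp.actions(s):
            total = 0.0
            for _ in range(width):       # sparse: few sampled successors
                s2, r = mdp.step(s, a)
                total += r + gamma * lookahead(mdp, s2, d - 1)
            best = max(best, total / width)
        return best

    # Root sampling: draw a handful of models from the belief once,
    # then average each root action's value across the sampled models.
    models = [posterior.sample_mdp() for _ in range(width)]
    actions = list(models[0].actions(state))  # assumes a shared action set
    scores = {a: 0.0 for a in actions}
    for m in models:
        for a in actions:
            q = 0.0
            for _ in range(width):
                s2, r = m.step(state, a)
                q += r + gamma * lookahead(m, s2, depth - 1)
            scores[a] += q / width
    return max(scores, key=scores.get)
```

In this sketch, width controls how sparsely successors and models are sampled while depth controls the lookahead horizon; trading width for depth is the core of the approach.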
POMDP
Partially Observable MDP
Planning
Bayesian Reinforcement Learning
Author
Divya Grover
Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI
Bayesian Reinforcement Learning via Deep, Sparse Sampling
Proceedings of Machine Learning Research, Vol. 108 (2020), pp. 3036-3045
Paper in proceeding
Divya Grover, Debabrota Basu, Christos Dimitrakakis. “Bayesian Reinforcement Learning via Approximate Planning in Bayes Adaptive MDP” submitted to the Journal of Artificial Intelligence Research, 2022.
Divya Grover, Christos Dimitrakakis. "Adaptive Belief Discretization for POMDP Planning" in 15th European Workshop on Reinforcement Learning. Milan, Italy, September 19-21, 2022.
Minimax-Bayes Reinforcement Learning
Proceedings of Machine Learning Research, Vol. 206 (2023), pp. 7511-7527
Paper in proceeding
Our solution aims in particular at maximizing performance given finite computational resources. We focus on identifying key properties in both settings and developing algorithms around them. One such property is the rate of change of the Bayesian belief: over possible dynamics models in the RL setting, and over the state space in the POMDP setting. In our algorithms, we propose tunable parameters that are strongly correlated with these key properties, thereby granting direct control over the computation-performance trade-off. The proposed algorithms outperform the current best methods and scale gracefully to larger problems. A key requirement of our algorithms is the existence of a framework for Bayesian inference, which is not always available due to modelling issues. This work can be extended to such problem settings as well.
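As an illustration of the belief-change property (not of the thesis' algorithms themselves), the following sketch measures how far a single observation moves a Dirichlet-multinomial belief. The function name and interface are invented for this example; the point is that the per-observation change shrinks as evidence accumulates, which is the quantity the tunable parameters track.

```python
import numpy as np

def dirichlet_change(alpha, obs):
    """L1 distance between prior and posterior means of a Dirichlet
    belief after observing one outcome `obs` (illustrative only)."""
    alpha = np.asarray(alpha, dtype=float)
    prior_mean = alpha / alpha.sum()
    post = alpha.copy()
    post[obs] += 1.0            # conjugate update: add one pseudo-count
    post_mean = post / post.sum()
    return np.abs(prior_mean - post_mean).sum()

# The belief moves less per observation as pseudo-counts grow:
print(dirichlet_change([1, 1, 1], 0))     # early belief: ~0.33
print(dirichlet_change([50, 30, 20], 0))  # mature belief: ~0.01
```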
Learning, privacy and the limits of computation
Swedish Research Council (VR) (2015-05410), 2016-01-01 -- 2019-12-31.
Subject Categories
Computer and Information Science
Probability Theory and Statistics
ISBN
978-91-7905-758-9
Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5224
Publisher
Chalmers
Hall HC2, Hörsalsvägen 14
Opponent: Alessandro Lazaric, Research Scientist, Facebook Artificial Intelligence Research