Minimax-Bayes Reinforcement Learning
Conference poster, 2023
Abstract
While the Bayesian decision-theoretic framework offers an elegant
solution to the problem of decision making under uncertainty, one
question is how to appropriately select the prior distribution. One
idea is to employ a worst-case prior. However, this is not as easy to
specify in sequential decision making as in simple statistical
estimation problems. This paper studies (sometimes approximate)
minimax-Bayes solutions for various reinforcement learning problems
to gain insights into the properties of the corresponding priors and
policies. We find that while the worst-case prior depends on the
setting, the corresponding minimax policies are more robust than
those that assume a standard (i.e. uniform) prior.
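The worst-case prior idea from the abstract can be illustrated in the simplest one-shot setting: nature picks the prior over states that minimizes the value of the Bayes-optimal action. The utility matrix and the grid search below are illustrative assumptions, not the paper's method, which concerns sequential reinforcement learning problems.

```python
import numpy as np

# Hypothetical one-shot decision problem: 2 states, 2 actions.
# U[a, theta] = utility of action a when the true state is theta.
U = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def bayes_value(p):
    """Value of the Bayes-optimal action under prior p over states."""
    return max(p @ U[a] for a in range(U.shape[0]))

# Grid search over priors p = (q, 1 - q) for the worst case
# (the minimax prior, against which the Bayes policy is most robust).
qs = np.linspace(0.0, 1.0, 1001)
values = [bayes_value(np.array([q, 1.0 - q])) for q in qs]
worst = qs[int(np.argmin(values))]
print(worst)  # 0.5: in this symmetric game the uniform prior is worst-case
```

In this symmetric example the uniform prior happens to be the worst case; as the abstract notes, in sequential settings the worst-case prior generally differs from the uniform one.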
Keywords
Minimax
reinforcement learning
Markov decision processes
Authors
Thomas Kleine Buening
University of Oslo
Christos Dimitrakakis
University of Neuchâtel
University of Oslo
Chalmers University of Technology, Computer Science and Engineering, Data Science and AI
Hannes Eriksson
University of Gothenburg
Zenseact AB
Divya Grover
Chalmers University of Technology, Computer Science and Engineering, Data Science and AI
Emilio Jorge
Chalmers University of Technology, Computer Science and Engineering, Data Science and AI
Valencia, Spain
Subject Categories
Computational Mathematics
Computer Science
Computer Vision and Robotics (Autonomous Systems)