Linear Bayesian reinforcement learning
Paper in proceedings, 2013
This paper proposes a simple linear Bayesian approach
to reinforcement learning. We show that
with an appropriate basis, a Bayesian linear Gaussian
model is sufficient for accurately estimating
the system dynamics, particularly when we allow
for correlated noise. Policies are estimated
by first sampling a transition model from the current
posterior, and then performing approximate
dynamic programming on the sampled model. This
form of approximate Thompson sampling results in
good exploration in unknown environments. The
approach can also be seen as a Bayesian generalisation
of least-squares policy iteration, where the
empirical transition matrix is replaced with a sample
from the posterior.
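
A minimal sketch of the approach described above, under assumed conventions: a
matrix-normal inverse-Wishart (MNIW) conjugate prior for the linear-Gaussian
dynamics with correlated noise, and an LSTD-Q-style policy-evaluation step in
which next states are generated by a model sampled from the posterior rather
than taken from the empirical transitions. All names (BayesianDynamics,
sampled_lstdq, dyn_feat, q_feat) and hyperparameters are illustrative and not
the paper's actual implementation.

import numpy as np
from scipy.stats import invwishart

class BayesianDynamics:
    """Posterior over s' = A phi(s, a) + eps, with eps ~ N(0, Sigma)."""

    def __init__(self, feat_dim, state_dim, prior_scale=1.0):
        self.M = np.zeros((state_dim, feat_dim))   # prior mean of A
        self.K = np.eye(feat_dim) / prior_scale    # column (feature) precision
        self.Psi = np.eye(state_dim)               # inverse-Wishart scale
        self.nu = state_dim + 2                    # inverse-Wishart degrees of freedom

    def update(self, Phi, Y):
        """Conjugate MNIW update from features Phi (n x d) and next states Y (n x k)."""
        K_n = self.K + Phi.T @ Phi
        M_n = np.linalg.solve(K_n, self.K @ self.M.T + Phi.T @ Y).T
        self.Psi = (self.Psi + Y.T @ Y
                    + self.M @ self.K @ self.M.T - M_n @ K_n @ M_n.T)
        self.M, self.K = M_n, K_n
        self.nu += Phi.shape[0]

    def sample(self):
        """Draw one dynamics model (A, Sigma) from the current posterior."""
        Sigma = np.atleast_2d(invwishart.rvs(df=self.nu, scale=self.Psi))
        L = np.linalg.cholesky(Sigma)                   # row-covariance factor
        R = np.linalg.cholesky(np.linalg.inv(self.K))   # column-covariance factor
        A = self.M + L @ np.random.randn(*self.M.shape) @ R.T
        return A, Sigma

def sampled_lstdq(model, policy, data, dyn_feat, q_feat, gamma=0.95, reg=1e-3):
    """LSTD-Q-style evaluation of `policy`, with next states drawn from a
    dynamics model sampled from the posterior instead of the empirical
    transitions (the Thompson-sampling step described in the abstract)."""
    A_dyn, Sigma = model.sample()
    k = len(q_feat(data[0][0], data[0][1]))
    A_mat, b = reg * np.eye(k), np.zeros(k)
    for s, a, r in data:
        phi_sa = q_feat(s, a)
        noise = np.random.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma)
        s_next = A_dyn @ dyn_feat(s, a) + noise         # model-generated next state
        phi_next = q_feat(s_next, policy(s_next))
        A_mat += np.outer(phi_sa, phi_sa - gamma * phi_next)
        b += r * phi_sa
    return np.linalg.solve(A_mat, b)                    # weights of Q(s, a) = q_feat(s, a) . w

# Toy usage on a 1-d system s' = 0.9 s + a + noise, reward -s^2 (purely illustrative).
dyn_feat = lambda s, a: np.concatenate([s, a])
q_feat = lambda s, a: np.concatenate([[1.0], s, a, s * a, a * a])
policy = lambda s: np.clip(-0.5 * s, -1.0, 1.0)
rng = np.random.default_rng(0)
model = BayesianDynamics(feat_dim=2, state_dim=1)
data, s = [], np.zeros(1)
for _ in range(200):
    a = rng.uniform(-1.0, 1.0, size=1)
    s_next = 0.9 * s + a + 0.1 * rng.standard_normal(1)
    model.update(dyn_feat(s, a)[None, :], s_next[None, :])
    data.append((s, a, float(-s @ s)))
    s = s_next
w = sampled_lstdq(model, policy, data, dyn_feat, q_feat)

Acting greedily with respect to the Q-function given by w, and periodically
redrawing the model from the updated posterior, corresponds to the approximate
Thompson-sampling exploration the abstract describes.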