Probabilistic inverse reinforcement learning in unknown environments
Paper in proceedings, 2013

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics.
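To illustrate the estimation principle mentioned in the abstract (not the paper's exact models), the following sketch assumes a demonstrator who picks actions with softmax probabilities over linear scores θ·φ(s, a), and recovers θ by maximum a posteriori estimation. Under a flat prior the MAP objective reduces to the negative log-likelihood of the demonstrations, which is convex in θ, so plain gradient ascent suffices. All names (`phi`, `theta`, the synthetic data) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_features = 200, 4, 3

# Illustrative random state-action features phi(s, a) and a ground-truth theta.
phi = rng.normal(size=(n_states, n_actions, n_features))
theta_true = np.array([1.5, -2.0, 1.0])

def action_probs(theta):
    """Softmax demonstrator policy: pi(a|s) proportional to exp(theta . phi(s, a))."""
    scores = phi @ theta                        # shape (n_states, n_actions)
    scores -= scores.max(axis=1, keepdims=True) # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

# Sample one demonstrated action per state from the true policy.
p_true = action_probs(theta_true)
actions = np.array([rng.choice(n_actions, p=p) for p in p_true])

# MAP under a flat prior = maximise the (concave) log-likelihood.
theta = np.zeros(n_features)
for _ in range(500):
    p = action_probs(theta)
    # Gradient of the mean log-likelihood: observed minus expected features.
    observed = phi[np.arange(n_states), actions]
    expected = (p[..., None] * phi).sum(axis=1)
    theta += 0.5 * (observed - expected).mean(axis=0)

print(theta)  # recovered weights, close to theta_true up to sampling noise
```

Because the objective is convex (log-sum-exp of linear functions), any gradient method reaches the global optimum; the paper's appeal of the flat-prior MAP formulation is exactly this tractability compared with full Bayesian inference.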


Aristide Tossou

Christos Dimitrakakis

Chalmers, Computer Science and Engineering, Computer Science

Conference on Uncertainty in Artificial Intelligence, UAI 2013


Information and Communication Technology


Human-Computer Interaction (Interaction Design)

Probability Theory and Statistics