Probabilistic inverse reinforcement learning in unknown environments
Paper in proceedings, 2013

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics.


Aristide Tossou

Christos Dimitrakakis

Chalmers, Computer Science and Engineering (Chalmers), Computing Science (Chalmers)

Conference on Uncertainty in Artificial Intelligence, UAI 2013

Areas of Advance

Information and Communication Technology

Subject Categories

Human Computer Interaction

Probability Theory and Statistics
