MoVEMo - A structured approach for engineering reward functions
Paper in proceedings, 2018
As reinforcement learning (RL) agents become more general and autonomous, designing reward functions that elicit the desired behaviour in the agent becomes both more important and more cumbersome. In this paper, we present a technique to formally express reward functions in a structured way; this encourages sound reward function design and also enables its formal verification. We start by defining the reward function using state machines. In this way, we can statically check that the reward function satisfies certain properties, e.g., high-level requirements of the function to learn. We then automatically generate a runtime monitor that runs in parallel with the learning agent and provides the rewards according to the definition of the state machine and the behaviour of the agent.
We use the Uppaal model checker to design the reward model and verify the TCTL properties that model high-level requirements of the reward function, and Larva to monitor and enforce the reward model on the RL agent at runtime.
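The idea of a reward function defined as a state machine, checked statically and then driven by a runtime monitor, can be illustrated with a minimal Python sketch. This is not the paper's implementation: it uses neither Uppaal nor Larva, it has no notion of time, and a plain reachability check stands in for TCTL model checking. The states, events, and reward values are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical reward model: (state, event) -> (next_state, reward).
TRANSITIONS = {
    ("searching", "object_seen"): ("approaching", 1.0),
    ("searching", "collision"): ("searching", -5.0),
    ("approaching", "object_reached"): ("done", 10.0),
    ("approaching", "object_lost"): ("searching", -1.0),
    ("approaching", "collision"): ("searching", -5.0),
}
INITIAL, GOAL = "searching", "done"


def reachable_from(start):
    """Return every state reachable from `start` (breadth-first search)."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for (src, _event), (dst, _reward) in TRANSITIONS.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def goal_always_reachable():
    """Static check: the goal stays reachable from every reachable state."""
    return all(GOAL in reachable_from(s) for s in reachable_from(INITIAL))


class RewardMonitor:
    """Tracks the state machine at runtime and emits a reward per observed event."""

    def __init__(self):
        self.state = INITIAL

    def observe(self, event, default_reward=0.0):
        # An event with no transition in the current state leaves it unchanged.
        next_state, reward = TRANSITIONS.get(
            (self.state, event), (self.state, default_reward)
        )
        self.state = next_state
        return reward


# Static check first, then feed the monitor a short trace of agent events.
model_ok = goal_always_reachable()
monitor = RewardMonitor()
rewards = [monitor.observe(e) for e in ["collision", "object_seen", "object_reached"]]
```

In this sketch the static check runs once before learning starts, while `RewardMonitor.observe` would be called on each event extracted from the agent's behaviour, returning the reward to hand to the learner.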
reward function
robotics
runtime monitoring
reinforcement learning
Authors
Piergiuseppe Mallozzi
Chalmers, Computer Science and Engineering, Software Engineering
Raul Pardo Jimenez
Chalmers, Computer Science and Engineering, Formal Methods
Vincent Duplessis
ENSICAEN Ecole Nationale Superieure d'Ingenieurs de Caen
Patrizio Pelliccione
Chalmers, Computer Science and Engineering, Software Engineering
Gerardo Schneider
Chalmers, Computer Science and Engineering, Formal Methods
Proceedings - 2nd IEEE International Conference on Robotic Computing, IRC 2018
Vol. 2018-January 250-257
978-1-5386-4651-9 (ISBN)
Laguna Hills, CA, USA
Areas of Advance
Information and Communication Technology
Subject Categories (SSIF 2011)
Robotics and Automation
Computer Sciences
Computer Systems
DOI
10.1109/IRC.2018.00053