MoVEMo - A structured approach for engineering reward functions

Piergiuseppe Mallozzi; Raul Pardo Jimenez; Vincent Duplessis; Patrizio Pelliccione; Gerardo Schneider

doi:10.1109/IRC.2018.00053

Paper in proceedings, 2018

Reinforcement learning (RL) is a machine learning technique that has been increasingly used in robotic systems. In reinforcement learning, instead of manually pre-programming which action to take at each step, we convey the goal of a software agent in terms of reward functions. The agent tries different actions in order to maximize a numerical value, i.e., the reward. A misspecified reward function can cause problems such as reward hacking, where the agent finds ways to maximize the reward without achieving the intended goal.

As RL agents become more general and autonomous, the design of reward functions that elicit the desired behaviour in the agent becomes more important and cumbersome. In this paper, we present a technique to formally express reward functions in a structured way; this encourages proper reward function design and also enables its formal verification. We start by defining the reward function using state machines. In this way, we can statically check that the reward function satisfies certain properties, e.g., high-level requirements of the function to learn. Later, we automatically generate a runtime monitor, which runs in parallel with the learning agent and provides the rewards according to the definition of the state machine and based on the behaviour of the agent.
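The idea of a state-machine reward model with a monitor that hands out rewards can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's generated monitor: the states, events, reward values, and the names `TRANSITIONS`, `RewardMonitor`, and `can_reach` are all hypothetical and describe a toy navigation task. The `can_reach` helper stands in for a static check (here, plain graph reachability), which is far weaker than real model checking.

```python
from typing import Dict, Tuple

# Hypothetical reward model for a toy navigation task (not from the paper).
# (current state, observed event) -> (next state, reward)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, float]] = {
    ("searching", "target_seen"): ("approaching", 1.0),
    ("searching", "collision"): ("failed", -10.0),
    ("approaching", "target_lost"): ("searching", -1.0),
    ("approaching", "target_reached"): ("done", 10.0),
    ("approaching", "collision"): ("failed", -10.0),
}
TERMINAL = {"done", "failed"}


class RewardMonitor:
    """Tracks the reward state machine alongside the agent and emits rewards."""

    def __init__(self, initial: str = "searching", step_penalty: float = -0.1):
        self.state = initial
        self.step_penalty = step_penalty

    def observe(self, event: str) -> float:
        """Consume one event from the environment and return the reward."""
        if self.state in TERMINAL:
            return 0.0  # episode is over; no further reward
        key = (self.state, event)
        if key not in TRANSITIONS:
            return self.step_penalty  # event is irrelevant in this state
        self.state, reward = TRANSITIONS[key]
        return reward


def can_reach(start: str, goal: str) -> bool:
    """Static check: is `goal` reachable from `start` in the reward model?"""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        if state == goal:
            return True
        if state in seen:
            continue
        seen.add(state)
        stack.extend(dst for (src, _), (dst, _) in TRANSITIONS.items() if src == state)
    return False


if __name__ == "__main__":
    # Static check before training: the goal state must be reachable.
    print(can_reach("searching", "done"))
    # Runtime: feed observed events to the monitor and collect rewards.
    monitor = RewardMonitor()
    for event in ["noop", "target_seen", "target_reached"]:
        print(event, monitor.observe(event), monitor.state)
```

Keeping the reward logic in an explicit transition table is what makes the static check possible: the same table that drives the monitor at runtime can be inspected before any learning starts.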

We use the Uppaal model checker to design the reward model and verify the TCTL properties that model high-level requirements of the reward function, and Larva to monitor and enforce the reward model on the RL agent at runtime.
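To give a flavour of the kind of TCTL properties one might check on a reward model, here are three illustrative formulas. They are assumptions for the sake of example and are not taken from the paper: the locations `done` and `failed`, the reward variable $r$, and the bounds $r_{\min}$, $r_{\max}$ are hypothetical. The symbols $\Box$ and $\Diamond$ assume the `amssymb` package.

```latex
\begin{align*}
\varphi_1 &:\; \mathrm{E}\Diamond\ \mathit{done}
  && \text{(the goal location is reachable)} \\
\varphi_2 &:\; \mathrm{A}\Box\ (\mathit{failed} \Rightarrow r < 0)
  && \text{(failure is never rewarded)} \\
\varphi_3 &:\; \mathrm{A}\Box\ (r_{\min} \le r \le r_{\max})
  && \text{(rewards stay within fixed bounds)}
\end{align*}
```

Here $\mathrm{E}\Diamond$ reads "on some path, eventually" and $\mathrm{A}\Box$ reads "on all paths, always".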

reward function

robotics

runtime monitoring

reinforcement learning

Authors

Piergiuseppe Mallozzi

Chalmers, Computer Science and Engineering, Software Engineering

Raul Pardo Jimenez

Chalmers, Computer Science and Engineering, Formal Methods

Vincent Duplessis

ENSICAEN Ecole Nationale Superieure d'Ingenieurs de Caen

Patrizio Pelliccione

Chalmers, Computer Science and Engineering, Software Engineering

Gerardo Schneider

Chalmers, Computer Science and Engineering, Formal Methods

Proceedings - 2nd IEEE International Conference on Robotic Computing, IRC 2018

Vol. 2018-January, pp. 250-257
978-1-5386-4651-9 (ISBN)

2018 Second IEEE International Conference on Robotic Computing (IRC)
Laguna Hills, CA, USA

Areas of Advance

Information and Communication Technology

Subject Categories (SSIF 2011)

Robotics and Automation

Computer Science

Computer Systems

DOI

10.1109/IRC.2018.00053

More information

Last updated

2024-07-22
