Hierarchical Reinforcement Learning Based on Planning Operators
Paper in proceedings, 2024
Learning long-horizon manipulation tasks, such as stacking, presents a longstanding challenge in the field of robotic manipulation, particularly when using Reinforcement Learning (RL) methods. RL algorithms focus on learning a policy for executing the entire task rather than learning the correct sequence of actions required to achieve complex goals. While RL aims to find a sequence of actions that maximises the total reward of the task, the main challenge arises when there are virtually infinite ways of chaining these actions (e.g. reach, grasp) to achieve the same task (stacking). In these cases, RL methods may struggle to find the optimal policy. This paper introduces a novel framework that integrates operator concepts from the symbolic planning domain with hierarchical RL methods. We propose to change the way complex tasks are trained by learning independent policies for the actions defined by high-level operators instead of learning a single policy for the complete complex task. Our contribution integrates planning operators (with their preconditions and effects) into a hierarchical RL algorithm based on the Scheduled Auxiliary Control (SAC-X) method. We developed a dual-purpose high-level operator, which can be used both for holistic planning and as an independent, reusable policy. Our approach offers a flexible solution for long-horizon tasks, e.g., stacking and inserting a cube. The experimental results show that our proposed method achieved an average success rate of 97.2% for learning and executing the whole stacking task. Furthermore, we obtain high success rates when learning independent policies, such as reach (98.9%), lift (99.7%), and move (97.4%). Training time is also reduced by 68% when using our proposed approach.
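To illustrate the core idea of dual-purpose operators, below is a minimal, hypothetical sketch (not the paper's implementation): each high-level operator carries symbolic preconditions and effects that a planner can chain, while also holding its own independently learned low-level policy. The names `Operator`, `plan`, and the toy `reach`/`grasp`/`lift` operators are placeholders for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

State = FrozenSet[str]  # symbolic facts, e.g. {"gripper_empty", "at_cube"}

@dataclass
class Operator:
    name: str
    preconditions: FrozenSet[str]    # facts that must hold before execution
    effects_add: FrozenSet[str]      # facts added after successful execution
    effects_del: FrozenSet[str]      # facts removed after successful execution
    policy: Callable[[State], None]  # independently trained low-level policy

    def applicable(self, state: State) -> bool:
        return self.preconditions <= state

    def apply(self, state: State) -> State:
        return (state - self.effects_del) | self.effects_add

def plan(start: State, goal: FrozenSet[str], ops: List[Operator]) -> List[Operator]:
    """Greedy forward chaining over operator preconditions/effects."""
    state, sequence = start, []
    while not goal <= state:
        op = next((o for o in ops if o.applicable(state) and o not in sequence), None)
        if op is None:
            raise RuntimeError("no applicable operator reaches the goal")
        sequence.append(op)
        state = op.apply(state)
    return sequence

# Toy usage: stack-like subgoals; each lambda stands in for a trained RL policy.
reach = Operator("reach", frozenset({"gripper_empty"}), frozenset({"at_cube"}),
                 frozenset(), lambda s: None)
grasp = Operator("grasp", frozenset({"at_cube", "gripper_empty"}),
                 frozenset({"holding_cube"}), frozenset({"gripper_empty"}),
                 lambda s: None)
lift = Operator("lift", frozenset({"holding_cube"}), frozenset({"cube_lifted"}),
                frozenset(), lambda s: None)

for op in plan(frozenset({"gripper_empty"}), frozenset({"cube_lifted"}),
               [reach, grasp, lift]):
    op.policy(frozenset())  # execute the operator's independently learned policy
    print("executed", op.name)
```

Because each operator's policy is trained and stored separately, the same `reach` or `lift` policy can be reused across different long-horizon tasks, while the symbolic preconditions and effects let a planner (or a SAC-X-style scheduler) decide the ordering.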