Leveraging Symbolic Models in Reinforcement Learning for Multi-skill Chaining
Paper in proceedings, 2025
We envision robots learning new skills as efficiently as possible. A key challenge in this pursuit is that the efficiency of learning systems such as Reinforcement Learning (RL) deteriorates with task complexity. For instance, building a tower of cubes requires multiple subtasks to be executed in the correct sequence to succeed at this long-horizon task. As the sequence grows, determining the correct order of actions becomes increasingly difficult, particularly for RL methods that rely on trial and error. To tackle this, we propose a new method that integrates symbolic models into RL to boost learning efficiency in long-horizon tasks. Symbolic models offer task abstraction and can enhance the sample efficiency of RL agents through high-level operators. Our approach decomposes tasks in alignment with the structure of Automated Planning (AP) operators, enabling RL agents to synthesize individual skills for specific subtasks and thereby require fewer learning samples. The decomposition is designed with two goals: 1) reducing errors when chaining subsequent skills together and 2) enhancing skill reusability for downstream tasks with a similar structure. In simulated robot manipulation tasks, such as stacking two cubes, experiments demonstrate superior sample efficiency for our approach (a 2x reduction in training cost) compared to most RL baselines. Furthermore, our method generalises robustly to unseen rearrangement tasks with minimal interaction steps (fewer than 100), achieving an average success rate approximately 50% higher than baselines, which often struggle to make progress.
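To make the idea of operator-aligned skill chaining concrete, the following is a minimal, hypothetical sketch (all names are illustrative, not the paper's implementation): each Automated Planning operator has a symbolic precondition and effect, one skill is learned per operator, and a plan succeeds only if every skill's precondition holds when it is invoked. For brevity, executing a skill here simply applies the operator's symbolic effect, standing in for rolling out a learned RL policy.

```python
from typing import Dict, List, Tuple

# Symbolic state: predicate name -> truth value (hypothetical representation).
State = Dict[str, bool]

class Skill:
    """A per-operator skill with a symbolic precondition and effect."""
    def __init__(self, name: str, pre: Dict[str, bool], eff: Dict[str, bool]):
        self.name, self.pre, self.eff = name, pre, eff

    def applicable(self, s: State) -> bool:
        # Precondition check: every required predicate must match.
        return all(s.get(k, False) == v for k, v in self.pre.items())

    def execute(self, s: State) -> State:
        # Placeholder for a learned RL policy rollout: apply the
        # operator's symbolic effect to produce the next state.
        out = dict(s)
        out.update(self.eff)
        return out

def chain_skills(s: State, plan: List[Skill]) -> Tuple[State, bool]:
    """Run skills in plan order; abort if a precondition is unmet."""
    for skill in plan:
        if not skill.applicable(s):
            return s, False  # chaining error: the link between skills broke
        s = skill.execute(s)
    return s, True

# Toy stacking subtask sequence: pick up cube A, then place it on cube B.
pick = Skill("pick(A)",
             pre={"clear(A)": True, "hand_empty": True},
             eff={"holding(A)": True, "hand_empty": False})
place = Skill("place(A,B)",
              pre={"holding(A)": True, "clear(B)": True},
              eff={"on(A,B)": True, "holding(A)": False, "hand_empty": True})

start = {"clear(A)": True, "clear(B)": True, "hand_empty": True}
final, ok = chain_skills(start, [pick, place])  # ok is True; on(A,B) holds
```

Chaining fails exactly when a skill's precondition is violated at invocation time, which is why the decomposition emphasises reducing errors at the links between subsequent skills.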