Enhancing Hierarchical Reinforcement Learning with Symbolic Planning for Long-Horizon Tasks
Journal article, 2025

Long-horizon tasks, such as box stacking, pose a longstanding challenge in robotic manipulation, especially for Reinforcement Learning (RL). RL typically focuses on learning an optimal policy for completing an entire task, rather than determining the specific sequence of actions required to achieve complex goals. While RL finds a sequence of actions that maximizes the total reward of the task, the main challenge arises when there are infinitely many ways to chain actions (e.g., reach, push) that achieve the same task (e.g., door-opening). In such cases, RL struggles to find the optimal policy. In contrast, symbolic planning focuses on determining a sequence of actions to achieve the desired task. This paper introduces a novel framework that integrates symbolic planning operators with hierarchical RL. We propose to change how complex tasks are trained by learning independent policies for the actions defined by high-level operators, instead of learning a single policy for the complete long-horizon task. Our approach easily adapts to various tasks by adjusting the learned operator set on demand. We develop dual-purpose high-level operators, which can be used both in holistic planning and as independent, reusable policies. Our approach offers a flexible solution for long-horizon tasks, e.g., stacking a cube, inserting a cube, and door-opening. Experimental results indicate that our method achieves high success rates of around 95% in policy chaining for comprehensive plan execution and excels in learning independent policies. Furthermore, it remains robust and scalable even when evaluated with a small sample set, attaining an 84% success rate for planning and executing a new task (door-opening) and 85% when dealing with a new operator. Experiments in dynamic environments further demonstrate the robustness and adaptability of our approach, which sustains a high success rate of around 90% and outperforms all baselines.
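The core mechanism described in the abstract (a symbolic plan decomposed into high-level operators, each executed by its own independently learned policy and chained in sequence) can be illustrated with a small sketch. The Python example below is purely illustrative and not the authors' implementation: the operator names, the OperatorPolicy interface, and the toy one-dimensional environment are assumptions made only for this sketch.

# Minimal sketch, assuming a symbolic planner has already produced a
# sequence of high-level operators; each operator is executed by its own
# independently learned policy (policy chaining). All names are hypothetical.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class OperatorPolicy:
    """Independently trained policy for one high-level operator."""
    name: str
    act: Callable[[float], float]              # observation -> primitive action
    subgoal_reached: Callable[[float], bool]   # operator-specific termination test


def execute_plan(plan: List[str],
                 policies: Dict[str, OperatorPolicy],
                 env_step: Callable[[float, float], float],
                 obs: float,
                 max_steps_per_op: int = 100) -> bool:
    """Run each operator's policy until its subgoal holds, then hand control
    to the next operator in the plan; fail if any operator times out."""
    for op_name in plan:
        policy = policies[op_name]
        steps = 0
        while not policy.subgoal_reached(obs):
            if steps >= max_steps_per_op:
                return False
            obs = env_step(obs, policy.act(obs))
            steps += 1
    return True


if __name__ == "__main__":
    # Toy 1-D "door" state: 0.0 = closed, 1.0 = open; each operator advances it.
    plan = ["reach_handle", "pull_door"]       # in practice, output of a symbolic planner
    policies = {
        "reach_handle": OperatorPolicy("reach_handle",
                                       act=lambda s: 0.1,
                                       subgoal_reached=lambda s: s >= 0.5),
        "pull_door": OperatorPolicy("pull_door",
                                    act=lambda s: 0.1,
                                    subgoal_reached=lambda s: s >= 1.0),
    }
    env_step = lambda s, a: s + a              # trivial deterministic dynamics
    print("plan succeeded:", execute_plan(plan, policies, env_step, obs=0.0))

Because each operator keeps its own policy and termination condition, a new task only requires a new plan over the existing (or extended) operator set, which is the reuse property the abstract emphasizes.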

Symbolic Planning

Hierarchical Reinforcement Learning

Long-horizon task

Authors

Jing Zhang

Chalmers, Electrical Engineering, Systems and control

Emmanuel Dean

Chalmers, Electrical Engineering, Systems and control

Karinne Ramirez-Amaro

Chalmers, Electrical Engineering, Systems and control

IEEE Transactions on Automation Science and Engineering

1545-5955 (ISSN), 1558-3783 (eISSN)

Vol. In Press

Subject Categories (SSIF 2025)

Robotics and automation

Computer Sciences

DOI

10.1109/TASE.2025.3641255
