Deep Q-learning: a robust control approach

Balázs Varga; Balázs Adam Kulcsár; Morteza Haghir Chehreghani

doi:10.1002/rnc.6457

Journal article, 2023

This work aims at constructing a bridge between robust control theory and reinforcement learning. Although reinforcement learning has shown admirable results in complex control tasks, the agent's learning behaviour is opaque. Meanwhile, system theory offers several tools for analyzing and controlling dynamical systems. This paper places deep Q-learning into a control-oriented perspective in order to study its learning dynamics with well-established techniques from robust control. An uncertain linear time-invariant model of the learning process is formulated by means of the neural tangent kernel. This novel approach makes it possible to give conditions for the stability (convergence) of learning and enables the analysis of the agent's behaviour in the frequency domain. The control-oriented approach also makes it possible to formulate robust controllers that inject dynamical rewards as a control input into the loss function to achieve better convergence properties. Three output-feedback controllers are synthesized: a gain-scheduling H2 controller, a dynamical H∞ controller, and a fixed-structure H∞ controller. Compared to traditional deep Q-learning techniques, which involve several heuristics, setting up the learning agent with a control-oriented tuning methodology is more transparent and rests on a well-established literature. The proposed approach uses neither a target network nor a randomized replay memory. The role of the target network is taken over by the control input, which also exploits the temporal dependency of samples (as opposed to a randomized memory buffer). Numerical simulations in different OpenAI Gym environments suggest that H∞-controlled learning can converge faster and receive higher scores (depending on the environment) than the benchmark, Double deep Q-learning.
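The abstract describes learning as a linear time-invariant system obtained through the neural tangent kernel (NTK). As a rough illustration only, and not the authors' model, the sketch below assumes the lazy-training regime in which the NTK matrix K over a finite set of state-action pairs stays constant, a fixed policy with state-action transition matrix P, and a semi-gradient TD step with step size eta and no target network. Under these assumptions the Q-values evolve as Q_{k+1} = A Q_k + B r with A = I - eta K (I - gamma P) and B = eta K, so the iteration converges exactly when the spectral radius of A is below 1, and the frequency response G(z) = (zI - A)^{-1} B can be inspected on the unit circle. The uncertainty description and the controllers from the paper are not modeled here; the function names and the numerical values of K, P, gamma and r are hypothetical.

```python
import numpy as np


def linearized_td_dynamics(K, P, gamma, eta):
    """Return (A, B) of the assumed LTI model Q_{k+1} = A Q_k + B r.

    Assumption: the NTK matrix K is constant (lazy training) and each step is a
    semi-gradient descent on 0.5 * ||Q - y||^2 with the bootstrapped target
    y = r + gamma * P @ Q held fixed during the step.
    """
    n = K.shape[0]
    A = np.eye(n) - eta * K @ (np.eye(n) - gamma * P)
    B = eta * K
    return A, B


def spectral_radius(A):
    """Largest eigenvalue magnitude; the iteration converges iff this is < 1."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def simulate(A, B, r, steps=500):
    """Iterate the linear learning dynamics from Q_0 = 0 with constant reward r."""
    q = np.zeros(A.shape[0])
    for _ in range(steps):
        q = A @ q + B @ r
    return q


def peak_gain(A, B, num=2048):
    """Grid estimate (a lower bound) of the H-infinity norm of
    G(z) = (zI - A)^{-1} B on the unit circle, for a stable A."""
    n = A.shape[0]
    gains = []
    # For real A and B the gain at -w mirrors the gain at w, so [0, pi] suffices.
    for w in np.linspace(0.0, np.pi, num):
        z = np.exp(1j * w)
        G = np.linalg.solve(z * np.eye(n) - A, B)
        gains.append(np.linalg.svd(G, compute_uv=False)[0])
    return max(gains)


# Hypothetical example: two state-action pairs and a policy that alternates them.
K = np.array([[2.0, 0.5], [0.5, 1.0]])  # assumed positive definite NTK matrix
P = np.array([[0.0, 1.0], [1.0, 0.0]])  # assumed state-action transition matrix
gamma = 0.9
r = np.array([1.0, 0.0])

A, B = linearized_td_dynamics(K, P, gamma, eta=0.5)
q_true = np.linalg.solve(np.eye(2) - gamma * P, r)  # fixed point (I - gamma P)^{-1} r
print("spectral radius:", spectral_radius(A))
print("converged to fixed point:", np.allclose(simulate(A, B, r), q_true))
print("peak gain (H-inf lower bound):", peak_gain(A, B))

A_fast, _ = linearized_td_dynamics(K, P, gamma, eta=1.2)
print("spectral radius with eta=1.2:", spectral_radius(A_fast))
```

With eta = 0.5 both eigenvalues of A lie inside the unit circle and the iteration settles at (I - gamma P)^{-1} r; raising eta to 1.2 pushes one eigenvalue outside the unit circle, so the same update diverges. The peak gain is only a grid-based lower bound on the H∞ norm, not an exact computation.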

Deep Q-learning

Robust control

Neural Tangent Kernel

Controlled learning

Authors

Balázs Varga

Chalmers, Electrical Engineering, Systems and Control


Balázs Adam Kulcsár

Chalmers, Electrical Engineering, Systems and Control


Morteza Haghir Chehreghani

Chalmers, Computer Science and Engineering, Data Science and AI


International Journal of Robust and Nonlinear Control

1049-8923 (ISSN) 1099-1239 (eISSN)

Vol. 33, No. 1, pp. 526-544

Real-Time Robust and AdaptIve Learning in ElecTric VEhicles (RITE)

Chalmers AI Research Centre (CHAIR), 2020-01-01 -- 2021-12-31.

Chalmers, 2020-01-01 -- 2021-12-31.


Robustly and Optimally Controlled Training Of neural Networks II (OCTON II)

Centiro, 2020-05-01 -- 2025-04-30.


Robustly and Optimally Controlled Training Of neural Networks I (OCTON I)

Centiro, 2019-10-15 -- 2023-10-15.


Subject categories (SSIF 2011)

Computational Mathematics

Robotics and Automation

Control Engineering

Signal Processing

DOI

10.1002/rnc.6457


Last updated

2023-04-24
