RL Perceptron: Generalization Dynamics of Policy Learning in High Dimensions
Article in scientific journal, 2025

Reinforcement learning (RL) algorithms have transformed many domains of machine learning. To tackle real-world problems, RL often relies on neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, many theories of RL have focused on discrete state spaces or worst-case analysis, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here, we propose a solvable high-dimensional RL model that can capture a variety of learning protocols, and we derive its typical policy learning dynamics as a set of closed-form ordinary differential equations. We obtain optimal schedules for the learning rates and task difficulty, analogous to annealing schemes and curricula during RL training, and show that the model exhibits rich behavior, including delayed learning under sparse rewards, a variety of learning regimes depending on reward baselines, and a speed-accuracy trade-off driven by reward stringency. Experiments on variants of the Procgen game "Bossfight" and the Arcade Learning Environment game "Pong" also show such a speed-accuracy trade-off in practice. Together, these results take a step toward closing the gap between theory and practice in high-dimensional RL.
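The setting described in the abstract can be illustrated with a minimal sketch. This is an assumption for illustration, not the paper's exact protocol: a perceptron "student" policy learns a teacher's binary rule from high-dimensional Gaussian inputs, receives only a sparse scalar reward at the end of each episode, and updates with a REINFORCE-style, reward-modulated Hebbian rule. The parameter values and the all-correct reward criterion are illustrative choices.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's exact model):
# a perceptron student learns a teacher's sign rule from a scalar
# episode reward using a reward-modulated Hebbian update.

rng = np.random.default_rng(0)
N = 100          # input dimension
T = 3            # decisions per episode
lr = 0.1         # learning rate
episodes = 2000

teacher = rng.standard_normal(N)   # ground-truth rule
w = rng.standard_normal(N)         # student weights

def overlap(w):
    # Normalized teacher-student overlap: the kind of low-dimensional
    # order parameter whose evolution an ODE description tracks.
    return float(w @ teacher / (np.linalg.norm(w) * np.linalg.norm(teacher)))

initial_q = overlap(w)
for _ in range(episodes):
    X = rng.standard_normal((T, N)) / np.sqrt(N)
    labels = np.sign(X @ teacher)        # correct actions
    actions = np.sign(X @ w)             # student's actions
    # Sparse reward: paid only if every decision in the episode is correct.
    reward = 1.0 if np.all(actions == labels) else 0.0
    # Reinforce the actions actually taken, gated by the episode reward.
    w += lr * reward * (actions[:, None] * X).sum(axis=0)
final_q = overlap(w)
```

With this sparse all-or-nothing reward, early episodes are rarely reinforced, so the overlap grows slowly at first and then accelerates once the student partially aligns with the teacher, a qualitative picture of the delayed learning the abstract describes.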

Authors

Nishil Patel

University College London (UCL)

Sebastian Lee

University College London (UCL)

Imperial College London

Stefano Sarao Mannelli

Data Science and AI 3

University of the Witwatersrand

Sebastian Goldt

Scuola Internazionale Superiore di Studi Avanzati

Andrew Saxe

University College London (UCL)

Physical Review X

2160-3308 (eISSN)

Vol. 15, Issue 2, 021051

Subject categories (SSIF 2025)

Computer science

DOI

10.1103/PhysRevX.15.021051

Related datasets

RL_Perceptron [dataset]

URI: https://github.com/nishp99/RL_Perceptron

More information

Last updated

2025-05-23