Optimal Protocols for Continual Learning via Statistical Physics and Control Theory
Paper in proceedings, 2025

Artificial neural networks often struggle with catastrophic forgetting when learning multiple tasks sequentially, as training on new tasks degrades performance on previously learned tasks. Recent theoretical work has addressed this issue by analysing learning curves in synthetic frameworks under predefined training protocols. However, these protocols relied on heuristics and lacked a solid theoretical foundation establishing their optimality. In this paper, we fill this gap by combining exact equations for training dynamics, derived using statistical physics techniques, with optimal control methods. We apply this approach to teacher-student models for continual learning and multi-task problems, obtaining a theory for task-selection protocols that maximise performance while minimising forgetting. Our theoretical analysis offers non-trivial yet interpretable strategies for mitigating catastrophic forgetting, shedding light on how optimal learning protocols modulate established effects, such as the influence of task similarity on forgetting. Finally, we validate our theoretical findings with experiments on real-world data.
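The catastrophic forgetting the abstract describes can be illustrated with a minimal teacher-student simulation. This is a hedged sketch, not the paper's actual model or protocol: the linear student, the dimension d=100, the learning rate, and the overlap parameter controlling task similarity are all illustrative assumptions. A student is trained online on task A, then on a dissimilar task B, and its task-A error is measured before and after.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100  # input dimension (illustrative choice)

def make_teacher(w_ref, overlap, rng):
    """Random teacher with given cosine similarity to w_ref and equal norm."""
    z = rng.standard_normal(w_ref.size)
    z -= (z @ w_ref) / (w_ref @ w_ref) * w_ref      # orthogonalise against w_ref
    z *= np.linalg.norm(w_ref) / np.linalg.norm(z)  # match the reference norm
    return overlap * w_ref + np.sqrt(1.0 - overlap**2) * z

def gen_error(w, w_star):
    """Generalisation error of a linear student on standard Gaussian inputs."""
    return np.sum((w - w_star) ** 2) / d

def train(w, w_star, steps, lr, rng):
    """Online SGD on the squared loss with noiseless teacher labels."""
    for _ in range(steps):
        x = rng.standard_normal(d)
        err = (w - w_star) @ x / np.sqrt(d)  # student minus teacher output
        w = w - lr * err * x / np.sqrt(d)
    return w

w_a = rng.standard_normal(d)                    # teacher for task A
w_b = make_teacher(w_a, overlap=0.2, rng=rng)   # task B, low similarity to A

w = train(np.zeros(d), w_a, steps=3000, lr=0.5, rng=rng)
err_a_before = gen_error(w, w_a)                # near zero: task A learned
w = train(w, w_b, steps=3000, lr=0.5, rng=rng)
err_a_after = gen_error(w, w_a)                 # large: task A forgotten
print(f"task-A error before: {err_a_before:.4f}, after task B: {err_a_after:.4f}")
```

Raising the `overlap` parameter towards 1 shrinks the post-task-B error on task A, consistent with the task-similarity effect mentioned in the abstract; the paper's contribution is to optimise the task-selection schedule itself rather than fix it in advance, which this naive sequential sketch does not attempt.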

Authors

Francesco Mori

University of Oxford

Stefano Sarao Mannelli

University of Gothenburg

Data Science and AI

University of Witwatersrand

Francesca Mignacco

Princeton University

City University of New York (CUNY)

13th International Conference on Learning Representations, ICLR 2025

74265-74287
9798331320850 (ISBN)

13th International Conference on Learning Representations, ICLR 2025
Singapore, Singapore

Subject Categories (SSIF 2025)

Computer Sciences

DOI

10.48550/arXiv.2409.18061

More information

Latest update

7/18/2025