A theory of initialisation's impact on specialisation
Journal article, 2025

Prior work has consistently shown that neural networks engaged in continual learning suffer the most catastrophic interference when tasks are of intermediate similarity. This phenomenon is attributed to the network's tendency to reuse learned features across tasks. However, this explanation relies heavily on the premise that neuron specialisation occurs, i.e. that localised representations emerge. Our investigation challenges the validity of this assumption. Using theoretical frameworks for the analysis of neural networks, we show a strong dependence of specialisation on the initial conditions. More precisely, we show that weight imbalance and high weight entropy can favour specialised solutions. We then apply these insights in the context of continual learning, first showing the emergence of a monotonic relation between task similarity and forgetting in non-specialised networks. Finally, we show that specialisation induced by weight imbalance is beneficial for the commonly employed elastic weight consolidation regularisation technique.
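For context on the elastic weight consolidation (EWC) technique mentioned in the abstract, here is a minimal sketch of the standard quadratic EWC penalty. The toy new-task loss, parameter values, and function names below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam):
    # Standard EWC regulariser: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.
    # Parameters with large diagonal Fisher information F_i (important for a
    # previous task) are anchored near their old values theta*_i.
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

# Toy example: parameters learned on task A, with a diagonal Fisher estimate.
theta_star = np.array([1.0, -2.0])   # task-A solution (assumed values)
fisher = np.array([5.0, 0.1])        # coordinate 0 matters for task A, 1 barely
lam = 1.0                            # regularisation strength

def total_loss(theta, target=np.array([0.0, 0.0])):
    # New-task loss (a simple quadratic pulling theta toward `target`)
    # plus the EWC anchor toward the task-A solution.
    new_task = 0.5 * np.sum((theta - target) ** 2)
    return new_task + ewc_penalty(theta, theta_star, fisher, lam)

# For this quadratic toy problem the minimiser has a closed form per coordinate:
# theta_i = (target_i + lam * F_i * theta*_i) / (1 + lam * F_i)
theta_opt = (0.0 + lam * fisher * theta_star) / (1.0 + lam * fisher)
```

The closed-form minimiser illustrates the mechanism: the high-Fisher coordinate stays close to its task-A value, while the low-Fisher coordinate is free to move toward the new task's optimum.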

analysis of algorithms

online dynamics

deep learning

machine learning

Authors

Devon Jarvis

University of the Witwatersrand

Sebastian Lee

Flatiron Institute

Clémentine Carla Juliette Dominé

University College London (UCL)

Andrew M. Saxe

Canadian Institute for Advanced Research

University College London (UCL)

Stefano Sarao Mannelli

Data Science and AI

JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT

1742-5468 (ISSN)

Vol. 2025, Issue 11, Article 114001

Subject Categories (SSIF 2025)

Natural Language Processing

DOI

10.1088/1742-5468/ae1214


Latest update

11/27/2025