A THEORY OF INITIALISATION'S IMPACT ON SPECIALISATION
Paper in proceedings, 2025

Prior work has demonstrated a consistent tendency in neural networks engaged in continual learning tasks, wherein intermediate task similarity results in the highest levels of catastrophic interference. This phenomenon is attributed to the network's tendency to reuse learned features across tasks. However, this explanation relies heavily on the premise that neuron specialisation occurs, i.e. the emergence of localised representations. Our investigation challenges the validity of this assumption. Using theoretical frameworks for the analysis of neural networks, we show a strong dependence of specialisation on the initial conditions. More precisely, we show that weight imbalance and high weight entropy can favour specialised solutions. We then apply these insights in the context of continual learning, first showing the emergence of a monotonic relation between task similarity and forgetting in non-specialised networks. Finally, we show that specialisation induced by weight imbalance improves the performance of the commonly employed elastic weight consolidation (EWC) regularisation technique.
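For reference, the elastic weight consolidation (EWC) regularisation mentioned in the abstract is standardly written as the following objective (following Kirkpatrick et al., 2017; this is background on the technique, not a result taken from the paper itself):

\[ \mathcal{L}(\theta) = \mathcal{L}_B(\theta) + \sum_i \frac{\lambda}{2}\, F_i \,\bigl(\theta_i - \theta^{*}_{A,i}\bigr)^2 \]

Here \(\mathcal{L}_B\) is the loss on the new task B, \(\theta^{*}_{A}\) are the parameters learned on the previous task A, \(F_i\) is the diagonal Fisher information estimated on task A, and \(\lambda\) sets the regularisation strength.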

Authors

Devon Jarvis

University of Witwatersrand

Sebastian Lee

Flatiron Institute

Clémentine Carla Juliette Dominé

University College London (UCL)

Andrew Saxe

University College London (UCL)

Canadian Institute for Advanced Research

Stefano Sarao Mannelli

University of Witwatersrand

Data Science and AI

13th International Conference on Learning Representations, ICLR 2025

75098-75126
9798331320850 (ISBN)

13th International Conference on Learning Representations, ICLR 2025
Singapore, Singapore

Subject categories (SSIF 2025)

Computer and Information Sciences (Computer Engineering)

More information

Last updated

2025-08-01