Local Learning Rules for Deep Neural Networks with Two-State Neurons
Doctoral thesis, 2025

Training artificial neural networks with backpropagation requires a degree of synchronization of operations, as well as non-local knowledge of the network's computational graph, both of which are infeasible in noisy asynchronous circuitry (be it biological, analog electronic, or optical). Learning algorithms based on temporal or spatial differences in neural activity make it possible to estimate gradients, and hence to learn, without these problematic requirements. In this thesis, we explore a number of such alternative learning algorithms. Paper A presents a variation of contrastive Hebbian learning which achieves Lipschitz-1 hidden layers by construction. Paper B focuses on efficient training on traditional digital hardware by presenting a variant of backpropagation compatible with quantized weights. Paper C returns to the topic of contrastive Hebbian learning and presents a new local learning algorithm for training feedforward networks based on neurons possessing two internal states. These dyadic neurons perform credit assignment by encoding errors as differences and predictions as averages of their internal states. Paper D introduces a new variation of dual propagation and provides derivations of both the original and the new variant. Paper E presents a general framework for dyadic learning, which encompasses dual propagation in feedforward models and equilibrium propagation (a well-known variant of contrastive Hebbian learning) in Hopfield models as special cases, while also being applicable to arbitrarily connected networks. The case of a skew-symmetric Hopfield network is found to be particularly intriguing, as it, like the model from Paper A, provides Lipschitz-1 layers by construction.
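To make the dyadic-neuron idea concrete, the following NumPy sketch shows one possible reading of it for a network with a single hidden layer: each hidden unit keeps two internal states, the layer's prediction is their average, and its local error signal is their (scaled) difference. The particular nudging scheme, the squared-error loss, and all variable names (s_up, s_dn, beta, and so on) are illustrative assumptions made for this sketch; they are not the exact formulation used in Papers C-E.

    # Schematic sketch of credit assignment with two-state ("dyadic") neurons.
    # Assumptions: one hidden layer, squared-error loss, symmetric nudging of
    # strength beta; the backpropagated error is approximated by a finite
    # difference of the two internal states.
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.1, size=(64, 32))    # input -> hidden weights
    W2 = rng.normal(scale=0.1, size=(32, 10))    # hidden -> output weights
    f = np.tanh                                  # hidden activation
    beta, lr = 0.1, 0.01                         # nudging strength, learning rate

    def dyadic_step(x, y):
        """One training step in which each hidden neuron keeps two internal states."""
        z = x @ W1
        out = f(z) @ W2                          # free (un-nudged) prediction
        e_out = out - y                          # output error (gradient of 0.5*||out - y||^2)
        fb = e_out @ W2.T                        # top-down feedback to the hidden layer
        s_up = f(z - beta * fb)                  # state nudged towards the target
        s_dn = f(z + beta * fb)                  # state nudged away from the target
        h_avg = 0.5 * (s_up + s_dn)              # prediction = average of the two states
        h_err = (s_dn - s_up) / (2 * beta)       # error = scaled difference of the states
        # Hebbian-style updates: each weight sees only its pre- and post-synaptic quantities.
        return np.outer(x, h_err), np.outer(h_avg, e_out)

    x, y = rng.normal(size=64), np.eye(10)[3]    # a random input and a one-hot target
    dW1, dW2 = dyadic_step(x, y)
    W1 -= lr * dW1
    W2 -= lr * dW2

For small beta, h_err approaches the error signal that backpropagation would assign to the hidden layer, which is why weight updates built from local averages and differences can track the gradient.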

Contrastive Hebbian learning

biologically inspired learning

lifted neural networks

artificial intelligence

Hopfield networks

local learning

quantized training

EDIT-EA Lecture Hall
Opponent: Associate professor Pawel Herman, KTH

Author

Rasmus Kjær Høier

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Two Tales of Single-Phase Contrastive Hebbian Learning

Proceedings of Machine Learning Research, Vol. 235 (2024), pp. 18470-18488

Paper in proceedings

Dual Propagation: Accelerating Contrastive Hebbian Learning with Dyadic Neurons

Proceedings of Machine Learning Research, Vol. 202 (2023), pp. 13141-13156

Paper in proceedings

AdaSTE: An Adaptive Straight-Through Estimator to Train Binary Neural Networks

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2022-June (2022), pp. 460-469

Paper in proceedings

Lifted Regression/Reconstruction Networks

31st British Machine Vision Conference, BMVC 2020 (2020)

Paper in proceedings

Dyadic Learning in Recurrent and Feedforward Models

NeurIPS 2024 Workshop Machine Learning with new Compute Paradigms (2024)

Paper in proceedings

Artificial neural networks can be thought of as parametrized mathematical functions which make predictions based on input data. During training, the parameters are tweaked in order to optimize a loss function (a measure of how correct the predictions are). We know from calculus that we can compute the direction of steepest descent of the loss with respect to the parameters by repeatedly applying the chain rule. In the context of neural network training, such repeated use of the chain rule is called backpropagation of errors. Together with innovations in massively parallel digital processors such as graphics processing units, backpropagation has been a major driving force behind the success of AI. With artificial neural networks becoming bigger and far more widespread, there is a growing impetus to explore energy-efficient computing hardware. However, backpropagation requires a degree of synchronization of operations, which is achievable on traditional digital computers but infeasible on many types of proposed energy-efficient hardware (such as analog and photonic computers). This means that these types of hardware will require alternative learning algorithms that are compatible with a lower degree of synchronization. This thesis proposes new algorithms that satisfy this constraint and explores their robustness.
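As a concrete illustration of the chain-rule computation described above, the following sketch differentiates a tiny two-parameter network by hand and checks the result numerically. The network, the squared-error loss, and all numbers are arbitrary choices made for this example only.

    # Tiny example of "repeatedly applying the chain rule": y = w2 * tanh(w1 * x)
    # with squared-error loss L = (y - t)^2.
    import numpy as np

    x, t = 0.5, 1.0            # input and target
    w1, w2 = 0.8, -1.2         # the two trainable parameters

    h = np.tanh(w1 * x)        # hidden activation
    y = w2 * h                 # prediction
    L = (y - t) ** 2           # loss

    # Chain rule, applied one factor at a time from the loss back to each parameter.
    dL_dy = 2 * (y - t)
    dL_dw2 = dL_dy * h
    dL_dh = dL_dy * w2
    dL_dw1 = dL_dh * (1 - h ** 2) * x        # d tanh(u)/du = 1 - tanh(u)^2

    # Finite-difference check of dL/dw1; it agrees with the analytic value up to O(eps).
    eps = 1e-6
    num_dw1 = ((w2 * np.tanh((w1 + eps) * x) - t) ** 2 - L) / eps
    print(dL_dw1, dL_dw2, num_dw1)

Backpropagation organizes exactly this bookkeeping for networks with millions of parameters, reusing the intermediate factors (here dL_dy and dL_dh) so that every parameter's gradient is obtained in a single backward sweep.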

Subject categories (SSIF 2025)

Computer Science

Signal Processing

Artificial Intelligence

ISBN

978-91-8103-176-8

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5634

Publisher

Chalmers

