Backpropagation


In machine learning, backpropagation is a gradient estimation method used to train neural network models. The gradient estimate is used by the optimization algorithm to compute the network parameter updates.

It is an efficient application of the Leibniz chain rule (1673)[1] to such networks.[2] It is also known as the reverse mode of automatic differentiation or reverse accumulation, due to Seppo Linnainmaa (1970).[3][4][5][6][7][8][9] The term "back-propagating error correction" was introduced in 1962 by Frank Rosenblatt,[10][2] but he did not know how to implement it, even though Henry J. Kelley had already described a continuous precursor of backpropagation[11] in 1960 in the context of control theory.[2]

Backpropagation computes the gradient of a loss function with respect to the weights of the network for a single input–output example, and does so efficiently, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this can be derived through dynamic programming.[11][12][13] Gradient descent, or variants such as stochastic gradient descent,[14] are commonly used.
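
To make the layer-by-layer chain-rule computation concrete, below is a minimal NumPy sketch of backpropagation for a single input–output example in a two-layer network with a sigmoid hidden layer, a linear output, and a squared-error loss; the architecture, activation, and function names (e.g. backprop_single_example) are illustrative assumptions rather than part of any cited reference.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_single_example(x, y, W1, b1, W2, b2):
    # Forward pass: save the intermediate values the chain rule will need.
    z1 = W1 @ x + b1              # hidden pre-activation
    a1 = sigmoid(z1)              # hidden activation
    y_hat = W2 @ a1 + b2          # linear output
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # Backward pass: apply the chain rule one layer at a time, starting from
    # the loss and reusing each layer's error term ("delta") for the next.
    delta2 = y_hat - y                        # dL/d(y_hat)
    dW2 = np.outer(delta2, a1)                # dL/dW2
    db2 = delta2                              # dL/db2
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)  # dL/dz1 via the chain rule
    dW1 = np.outer(delta1, x)                 # dL/dW1
    db1 = delta1                              # dL/db1
    return loss, (dW1, db1, dW2, db2)
```

Note that delta2 is computed once and reused to obtain delta1; this reuse of intermediate chain-rule terms is the table-filling, dynamic-programming aspect referred to above.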

Strictly speaking, the term backpropagation refers only to the algorithm for computing the gradient, not to how the gradient is used; however, it is often used loosely to refer to the entire learning algorithm, including how the gradient is used, such as by stochastic gradient descent.[15] In 1986, David E. Rumelhart et al. published an experimental analysis of the technique.[16] This contributed to the popularization of backpropagation and helped to initiate an active period of research in multilayer perceptrons.
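
To make the distinction concrete, a hypothetical training step could pair the gradient computation from the sketch above with a separate stochastic gradient descent update; the learning rate, shapes, and the sgd_step helper below are arbitrary illustrative assumptions.

```python
import numpy as np

def sgd_step(params, grads, lr=0.1):
    # The optimizer, not backpropagation itself, decides how the gradient is
    # used; here, a plain stochastic gradient descent update on one example.
    return [p - lr * g for p, g in zip(params, grads)]

# Illustrative usage with backprop_single_example from the sketch above.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
x, y = rng.normal(size=3), rng.normal(size=2)

loss, grads = backprop_single_example(x, y, W1, b1, W2, b2)
W1, b1, W2, b2 = sgd_step([W1, b1, W2, b2], grads, lr=0.1)
```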

  1. ^ Leibniz, Gottfried Wilhelm Freiherr von (1920). The Early Mathematical Manuscripts of Leibniz: Translated from the Latin Texts Published by Carl Immanuel Gerhardt with Critical and Historical Notes (Leibniz published the chain rule in a 1676 memoir). Open Court Publishing Company. ISBN 9780598818461.
  2. ^ a b c Schmidhuber, Juergen (2022). "Annotated History of Modern AI and Deep Learning". arXiv:2212.11279 [cs.NE].
  3. ^ Linnainmaa, Seppo (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors (Masters) (in Finnish). University of Helsinki. pp. 6–7.
  4. ^ Linnainmaa, Seppo (1976). "Taylor expansion of the accumulated rounding error". BIT Numerical Mathematics. 16 (2): 146–160. doi:10.1007/bf01931367. S2CID 122357351.
  5. ^ Griewank, Andreas (2012). "Who Invented the Reverse Mode of Differentiation?". Optimization Stories. Documenta Mathematica, Extra Volume ISMP. pp. 389–400. S2CID 15568746.
  6. ^ Griewank, Andreas; Walther, Andrea (2008). Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Second Edition. SIAM. ISBN 978-0-89871-776-1.
  7. ^ Schmidhuber, Jürgen (2015). "Deep learning in neural networks: An overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
  8. ^ Schmidhuber, Jürgen (2015). "Deep Learning". Scholarpedia. 10 (11): 32832. Bibcode:2015SchpJ..1032832S. doi:10.4249/scholarpedia.32832.
  9. ^ Goodfellow, Bengio & Courville 2016, pp. 217–218, "The back-propagation algorithm described here is only one approach to automatic differentiation. It is a special case of a broader class of techniques called reverse mode accumulation."
  10. ^ Rosenblatt, Frank (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Cornell Aeronautical Laboratory Report no. VG-1196-G-8. Spartan Books. See p. xiii (table of contents), p. 292 ("13.3 Back-Propagating Error Correction Procedures"), and p. 301 (Figure 39, "Back-propagating error-correction experiments").
  11. ^ a b Kelley, Henry J. (1960). "Gradient theory of optimal flight paths". ARS Journal. 30 (10): 947–954. doi:10.2514/8.5282.
  12. ^ Bryson, Arthur E. (1962). "A gradient method for optimizing multi-stage allocation processes". Proceedings of the Harvard Univ. Symposium on digital computers and their applications, 3–6 April 1961. Cambridge: Harvard University Press. OCLC 498866871.
  13. ^ Goodfellow, Bengio & Courville 2016, p. 214, "This table-filling strategy is sometimes called dynamic programming."
  14. ^ Robbins, H.; Monro, S. (1951). "A Stochastic Approximation Method". The Annals of Mathematical Statistics. 22 (3): 400. doi:10.1214/aoms/1177729586.
  15. ^ Goodfellow, Bengio & Courville 2016, p. 200, "The term back-propagation is often misunderstood as meaning the whole learning algorithm for multilayer neural networks. Backpropagation refers only to the method for computing the gradient, while another algorithm, such as stochastic gradient descent, is used to perform learning using this gradient."
  16. ^ Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (1986). "Learning representations by back-propagating errors". Nature. 323 (6088): 533–536. Bibcode:1986Natur.323..533R. doi:10.1038/323533a0.

28 related results for: Backpropagation

Backpropagation

In machine learning, backpropagation is a gradient estimation method used to train neural network models. The gradient estimate is used by the optimization...

Word Count : 7493

Neural backpropagation

Neural backpropagation is the phenomenon in which, after the action potential of a neuron creates a voltage spike down the axon (normal propagation),...

Word Count : 2262

Backpropagation through time

Backpropagation through time (BPTT) is a gradient-based technique for training certain types of recurrent neural networks. It can be used to train Elman...

Word Count : 750

Seppo Linnainmaa

mathematician and computer scientist known for creating the modern version of backpropagation. He was born in Pori. He received his MSc in 1970 and introduced a...

Word Count : 390

Multilayer perceptron

modern networks). Modern feedforward networks are trained using the backpropagation method and are colloquially referred to as the "vanilla" neural networks...

Word Count : 1922

Feedforward neural network

bi-directional flow. Modern feedforward networks are trained using the backpropagation method and are colloquially referred to as the "vanilla" neural networks...

Word Count : 2320

Backpropagation through structure

Backpropagation through structure (BPTS) is a gradient-based technique for training recursive neural nets (a superset of recurrent neural nets) and is...

Word Count : 83

LeNet

LeNet-5. In 1989, Yann LeCun et al. at Bell Labs first applied the backpropagation algorithm to practical applications, and believed that the ability...

Word Count : 1449

Rprop

Rprop, short for resilient backpropagation, is a learning heuristic for supervised learning in feedforward artificial neural networks. This is a first-order...

Word Count : 513

Geoffrey Hinton

co-author of a highly cited paper published in 1986 that popularized the backpropagation algorithm for training multi-layer neural networks, although they were...

Word Count : 4036

Vanishing gradient problem

training neural networks with gradient-based learning methods and backpropagation. In such methods, during each iteration of training each of the neural...

Word Count : 3779

Deep learning

continuous precursor of backpropagation already in 1960 in the context of control theory. In 1982, Paul Werbos applied backpropagation to MLPs in the way that...

Word Count : 17362

Catastrophic interference

like the standard backpropagation network can generalize to unseen inputs, but they are sensitive to new information. Backpropagation models can be analogized...

Word Count : 4173

Artificial intelligence

each input during training. The most common training technique is the backpropagation algorithm. Neural networks learn to model complex relationships between...

Word Count : 21946

Mathematics of artificial neural networks

Backpropagation training algorithms fall into three categories: steepest descent (with variable learning rate and momentum, resilient backpropagation);...

Word Count : 1790

AlexNet

CNN designs introduced by Yann LeCun et al. (1989) who applied the backpropagation algorithm to a variant of Kunihiko Fukushima's original CNN architecture...

Word Count : 961

David Rumelhart

applications. This paper, however, does not cite earlier work of the backpropagation method, such as the 1974 dissertation of Paul Werbos. In the same year...

Word Count : 885

Paul Werbos

described the process of training artificial neural networks through backpropagation of errors. He also was a pioneer of recurrent neural networks. Werbos...

Word Count : 285

Recurrent neural network

thus neurons are independent of each other's history. The gradient backpropagation can be regulated to avoid gradient vanishing and exploding in order...

Word Count : 8082

Neural network

by modifying these weights through empirical risk minimization or backpropagation in order to fit some preexisting dataset. Neural networks are used...

Word Count : 761

ADALINE

training algorithms for MADALINE networks, which cannot be learned using backpropagation because the sign function is not differentiable, have been suggested...

Word Count : 1034

History of artificial neural networks

"AI winter". Later, advances in hardware and the development of the backpropagation algorithm as well as recurrent neural networks and convolutional neural...

Word Count : 6432

Sigmoid function

"The influence of the sigmoid function parameters on the speed of backpropagation learning". In Mira, José; Sandoval, Francisco (eds.). From Natural...

Word Count : 1688

Variational autoencoder

differentiable loss function in order to update the network weights through backpropagation. For variational autoencoders, the idea is to jointly optimize the...

Word Count : 3158

Recursive neural network

network. The gradient is computed using backpropagation through structure (BPTS), a variant of backpropagation through time used for recurrent neural networks...

Word Count : 954

Batch normalization

Batch normalization (also known as batch norm) is a method used to make training of artificial neural networks faster and more stable through normalization...

Word Count : 5807

Stuart Dreyfus

1962, Dreyfus simplified the Dynamic Programming-based derivation of backpropagation (due to Henry J. Kelley and Arthur E. Bryson) using only the chain...

Word Count : 328

Programming paradigm

(2018), Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K. (eds.), "Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable...

Word Count : 2322
