Neural tangent kernel information

In the study of artificial neural networks (ANNs), the neural tangent kernel (NTK) is a kernel that describes the evolution of deep artificial neural networks during their training by gradient descent. It allows ANNs to be studied using theoretical tools from kernel methods.

In general, a kernel is a positive-semidefinite symmetric function of two inputs which represents some notion of similarity between the two inputs. The NTK is a specific kernel derived from a given neural network; in general, when the neural network parameters change during training, the NTK evolves as well. However, in the limit of large layer width the NTK becomes constant, revealing a duality between training the wide neural network and kernel methods: gradient descent in the infinite-width limit is fully equivalent to kernel gradient descent with the NTK. As a result, using gradient descent to minimize least-square loss for neural networks yields the same mean estimator as ridgeless kernel regression with the NTK. This duality enables simple closed form equations describing the training dynamics, generalization, and predictions of wide neural networks.

The NTK was introduced in 2018 by Arthur Jacot, Franck Gabriel and Clément Hongler,^[1] who used it to study the convergence and generalization properties of fully connected neural networks. Later works^[2]^[3] extended the NTK results to other neural network architectures. In fact, the phenomenon behind NTK is not specific to neural networks and can be observed in generic nonlinear models, usually by a suitable scaling^[4].

^ Jacot, Arthur; Gabriel, Franck; Hongler, Clement (2018), Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K. (eds.), "Neural Tangent Kernel: Convergence and Generalization in Neural Networks" (PDF), Advances in Neural Information Processing Systems 31, Curran Associates, Inc., pp. 8571–8580, arXiv:1806.07572, retrieved 2019-11-27
^ Arora, Sanjeev; Du, Simon S.; Hu, Wei; Li, Zhiyuan; Salakhutdinov, Ruslan; Wang, Ruosong (2019-11-04). "On Exact Computation with an Infinitely Wide Neural Net". arXiv:1904.11955 [cs.LG].
^ Yang, Greg (2020-11-29). "Tensor Programs II: Neural Tangent Kernel for Any Architecture". arXiv:2006.14548 [stat.ML].
^ Chizat, Lénaïc; Oyallon, Edouard; Bach, Francis (2019-12-08), "On lazy training in differentiable programming", Proceedings of the 33rd International Conference on Neural Information Processing Systems, Red Hook, NY, USA: Curran Associates Inc., pp. 2937–2947, arXiv:1812.07956, retrieved 2023-05-11

[:0-1] Jacot, Arthur; Gabriel, Franck; Hongler, Clement (2018), Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K. (eds.), "Neural Tangent Kernel: Convergence and Generalization in Neural Networks" (PDF), Advances in Neural Information Processing Systems 31, Curran Associates, Inc., pp. 8571–8580, arXiv:1806.07572, retrieved 2019-11-27

[:3-2] Arora, Sanjeev; Du, Simon S.; Hu, Wei; Li, Zhiyuan; Salakhutdinov, Ruslan; Wang, Ruosong (2019-11-04). "On Exact Computation with an Infinitely Wide Neural Net". arXiv:1904.11955 [cs.LG].

[3] Yang, Greg (2020-11-29). "Tensor Programs II: Neural Tangent Kernel for Any Architecture". arXiv:2006.14548 [stat.ML].

[4] Chizat, Lénaïc; Oyallon, Edouard; Bach, Francis (2019-12-08), "On lazy training in differentiable programming", Proceedings of the 33rd International Conference on Neural Information Processing Systems, Red Hook, NY, USA: Curran Associates Inc., pp. 2937–2947, arXiv:1812.07956, retrieved 2023-05-11

Neural tangent kernel information

and 26 Related for: Neural tangent kernel information

Neural tangent kernel

Large width limits of neural networks

Kernel method

Convolutional neural network

NTK

Feedforward neural network

Neural network Gaussian process

Recurrent neural network

Support vector machine

Multilayer perceptron

Dimensionality reduction

Nonlinear dimensionality reduction

Outline of machine learning

Gaussian process

Loss functions for classification

Vanishing gradient problem

Activation function

Gated recurrent unit

Wasserstein GAN

Comparison of Gaussian process software

Isometry

Lagrange multiplier

Hessian matrix

Computational anatomy

Outline of finance

List of theorems