Type of kernel induced by artificial neural networks
In the study of artificial neural networks (ANNs), the neural tangent kernel (NTK) is a kernel that describes the evolution of deep artificial neural networks during their training by gradient descent. It allows ANNs to be studied using theoretical tools from kernel methods.
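Concretely, for a network with scalar output f(x; θ) and parameter vector θ, the NTK at a given parameter setting is the inner product of the parameter gradients at the two inputs (a standard way of writing the definition; the notation below is chosen for illustration):

```latex
% NTK of a scalar-output network f(x; \theta) at parameters \theta
\Theta_{\theta}(x, x')
  = \left\langle \nabla_{\theta} f(x;\theta),\ \nabla_{\theta} f(x';\theta) \right\rangle
  = \sum_{p} \frac{\partial f(x;\theta)}{\partial \theta_{p}}\,
             \frac{\partial f(x';\theta)}{\partial \theta_{p}}.
```

Written this way, the NTK is a Gram matrix of gradient vectors, so it is automatically symmetric and positive-semidefinite.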
In general, a kernel is a positive-semidefinite symmetric function of two inputs that represents some notion of similarity between them. The NTK is a specific kernel derived from a given neural network; in general it evolves as the network's parameters change during training. However, in the limit of large layer width the NTK becomes constant, revealing a duality between training the wide neural network and kernel methods: gradient descent in the infinite-width limit is fully equivalent to kernel gradient descent with the NTK. As a result, using gradient descent to minimize least-squares loss for neural networks yields the same mean estimator as ridgeless kernel regression with the NTK. This duality enables simple closed-form equations describing the training dynamics, generalization, and predictions of wide neural networks.
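This duality can be illustrated numerically. The sketch below, assuming JAX (the network architecture, widths, and toy data are illustrative choices, not from the original papers), computes the empirical NTK of a small fully connected network from parameter gradients and uses it for ridgeless kernel regression on a training set:

```python
import jax
import jax.numpy as jnp

def init_params(key, widths=(1, 512, 512, 1)):
    """Random Gaussian weights per layer (NTK parameterization)."""
    keys = jax.random.split(key, len(widths) - 1)
    return [jax.random.normal(k, (n_in, n_out))
            for k, n_in, n_out in zip(keys, widths[:-1], widths[1:])]

def f(params, x):
    """Scalar-output tanh network with 1/sqrt(fan_in) scaling."""
    h = x
    for i, w in enumerate(params):
        h = h @ w / jnp.sqrt(w.shape[0])
        if i < len(params) - 1:
            h = jnp.tanh(h)
    return h[:, 0]

def empirical_ntk(params, x1, x2):
    """Theta(x1, x2) = sum over parameters of grad f(x1) . grad f(x2)."""
    j1 = jax.jacobian(f)(params, x1)  # per-layer leaves of shape (n1, n_in, n_out)
    j2 = jax.jacobian(f)(params, x2)  # per-layer leaves of shape (n2, n_in, n_out)
    return sum(jnp.tensordot(a, b, axes=((1, 2), (1, 2)))
               for a, b in zip(j1, j2))

key = jax.random.PRNGKey(0)
params = init_params(key)
x_train = jnp.linspace(-1.0, 1.0, 8).reshape(-1, 1)
y_train = jnp.sin(3.0 * x_train[:, 0])
x_test = jnp.linspace(-1.0, 1.0, 50).reshape(-1, 1)

# Ridgeless kernel regression with the NTK at initialization: in the
# infinite-width limit this is the mean prediction of the network trained
# by gradient descent on least-squares loss.
K_tt = empirical_ntk(params, x_train, x_train)   # (8, 8)
K_st = empirical_ntk(params, x_test, x_train)    # (50, 8)
y_pred = K_st @ jnp.linalg.solve(K_tt, y_train)  # kernel-regression mean
```

At any finite width the kernel computed this way still drifts during training; the equivalence to a fixed-kernel method holds exactly only in the infinite-width limit described above.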
The NTK was introduced in 2018 by Arthur Jacot, Franck Gabriel and Clément Hongler,[1] who used it to study the convergence and generalization properties of fully connected neural networks. Later works[2][3] extended the NTK results to other neural network architectures. The phenomenon behind the NTK is not, in fact, specific to neural networks: it can be observed in generic nonlinear models, typically under a suitable scaling.[4]
1. Jacot, Arthur; Gabriel, Franck; Hongler, Clément (2018). "Neural Tangent Kernel: Convergence and Generalization in Neural Networks". Advances in Neural Information Processing Systems 31. Curran Associates, Inc. pp. 8571–8580. arXiv:1806.07572.
2. Arora, Sanjeev; Du, Simon S.; Hu, Wei; Li, Zhiyuan; Salakhutdinov, Ruslan; Wang, Ruosong (2019). "On Exact Computation with an Infinitely Wide Neural Net". arXiv:1904.11955 [cs.LG].
3. Yang, Greg (2020). "Tensor Programs II: Neural Tangent Kernel for Any Architecture". arXiv:2006.14548 [stat.ML].
4. Chizat, Lénaïc; Oyallon, Edouard; Bach, Francis (2019). "On Lazy Training in Differentiable Programming". Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates, Inc. pp. 2937–2947. arXiv:1812.07956.