
Neural tangent kernel


In the study of artificial neural networks (ANNs), the neural tangent kernel (NTK) is a kernel that describes the evolution of deep artificial neural networks during their training by gradient descent. It allows ANNs to be studied using theoretical tools from kernel methods.
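For a scalar-output network f(x; θ), the NTK is the inner product of parameter gradients at two inputs: Θ(x, x') = ∇θ f(x; θ) · ∇θ f(x'; θ).

The sketch below computes this empirical kernel for a hypothetical one-hidden-layer tanh network f(x) = a · tanh(Wx) / √m, using hand-derived gradients. Several details are illustrative assumptions and are not taken from the works cited here:

- the architecture;
- the 1/√m output scaling;
- the width m = 512;
- the helper names `init_params`, `jacobian` and `empirical_ntk`.

```python
import numpy as np


def init_params(d, m, rng):
    """Hypothetical network f(x) = a . tanh(W x) / sqrt(m) with standard normal weights."""
    W = rng.standard_normal((m, d))   # hidden weights, one row per hidden unit
    a = rng.standard_normal(m)        # output weights
    return W, a


def jacobian(X, W, a):
    """Rows are grad_theta f(x) for each input x, with theta = (W flattened, a)."""
    m = W.shape[0]
    H = np.tanh(X @ W.T)                         # (n, m) hidden activations
    grad_a = H / np.sqrt(m)                      # df/da_j = tanh(w_j . x) / sqrt(m)
    D = (1.0 - H**2) * a / np.sqrt(m)            # (n, m): a_j * tanh'(w_j . x) / sqrt(m)
    grad_W = D[:, :, None] * X[:, None, :]       # (n, m, d): df/dW_jk = D_j * x_k
    return np.concatenate([grad_W.reshape(len(X), -1), grad_a], axis=1)


def empirical_ntk(X1, X2, W, a):
    """Theta(x, x') = <grad_theta f(x), grad_theta f(x')> for every pair (x in X1, x' in X2)."""
    return jacobian(X1, W, a) @ jacobian(X2, W, a).T


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))                  # 5 toy inputs in R^3
W, a = init_params(d=3, m=512, rng=rng)
K = empirical_ntk(X, X, W, a)                    # 5 x 5 NTK Gram matrix

print(np.allclose(K, K.T))                       # symmetric
print(np.linalg.eigvalsh(K).min() > -1e-8)       # positive semidefinite up to rounding
```

Because Θ is a Gram matrix of gradient vectors, it is symmetric and positive semidefinite by construction; the two checks at the end confirm this numerically. Recomputing K after the parameters have been updated would in general give a different matrix. This is the sense in which the NTK of a finite-width network evolves during training.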

In general, a kernel is a positive-semidefinite symmetric function of two inputs that represents some notion of similarity between them. The NTK is a specific kernel derived from a given neural network; in general, it changes as the network's parameters change during training. However, in the limit of large layer width the NTK becomes constant, revealing a duality between training the wide neural network and kernel methods: gradient descent in the infinite-width limit is fully equivalent to kernel gradient descent with the NTK. As a result, using gradient descent to minimize the least-squares loss of a neural network yields the same mean estimator as ridgeless kernel regression with the NTK. This duality enables simple closed-form equations describing the training dynamics, generalization, and predictions of wide neural networks.
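The duality described above can be sketched numerically under a few assumptions:

- The kernel is frozen at the empirical NTK of a hypothetical wide tanh network, standing in for the infinite-width NTK.
- The loss is 0.5 * ||f(X) - Y||^2.
- The initial function is taken to be zero, which plays the role of the mean over random initializations.

`kernel_gd` runs discrete kernel gradient descent. `gradient_flow` evaluates the closed-form continuous-time solution f_t(x) = Θ(x, X) Θ(X, X)^{-1} (I - exp(-Θ(X, X) t)) Y. Both approach the ridgeless kernel-regression predictor Θ(x, X) Θ(X, X)^{-1} Y. All function names, sizes and the toy targets are illustrative.

```python
import numpy as np


def tanh_net_ntk(X1, X2, W, a):
    """Empirical NTK of a hypothetical network f(x) = a . tanh(W x) / sqrt(m)."""
    m = W.shape[0]
    H1, H2 = np.tanh(X1 @ W.T), np.tanh(X2 @ W.T)
    k_a = H1 @ H2.T / m                              # gradients w.r.t. output weights a
    D1, D2 = (1 - H1**2) * a, (1 - H2**2) * a        # a_j * tanh'(w_j . x)
    k_W = (D1 @ D2.T / m) * (X1 @ X2.T)              # gradients w.r.t. hidden weights W
    return k_a + k_W


def kernel_gd(K_train, K_test, Y, lr, steps):
    """Kernel gradient descent on 0.5 * ||f(X) - Y||^2 with a frozen kernel, from f = 0."""
    f_train = np.zeros_like(Y)
    f_test = np.zeros(K_test.shape[0])
    for _ in range(steps):
        residual = f_train - Y
        f_train = f_train - lr * K_train @ residual  # f(X) <- f(X) - lr * Theta(X, X) r
        f_test = f_test - lr * K_test @ residual     # f(x) <- f(x) - lr * Theta(x, X) r
    return f_train, f_test


def gradient_flow(K_train, K_test, Y, t):
    """Closed form f_t(x) = Theta(x, X) Theta(X, X)^{-1} (I - exp(-Theta(X, X) t)) Y."""
    lam, V = np.linalg.eigh(K_train)
    coef = V @ ((1 - np.exp(-lam * t)) / lam * (V.T @ Y))
    return K_test @ coef


rng = np.random.default_rng(0)
d, m, n = 3, 2048, 6
W, a = rng.standard_normal((m, d)), rng.standard_normal(m)
X, X_test = rng.standard_normal((n, d)), rng.standard_normal((2, d))
Y = np.sin(X).sum(axis=1)                            # toy targets, for illustration only

K_train = tanh_net_ntk(X, X, W, a)
K_test = tanh_net_ntk(X_test, X, W, a)

ridgeless = K_test @ np.linalg.solve(K_train, Y)     # Theta(x, X) Theta(X, X)^{-1} Y
lr = 1.0 / np.linalg.eigvalsh(K_train).max()
_, f_gd = kernel_gd(K_train, K_test, Y, lr, steps=20_000)
f_flow = gradient_flow(K_train, K_test, Y, t=1e9)
print(np.abs(f_gd - ridgeless).max(), np.abs(f_flow - ridgeless).max())
```

The learning rate is set to 1/λ_max of the training Gram matrix so that each step is stable. How quickly `f_gd` approaches `ridgeless` then depends on the ratio of the smallest to the largest eigenvalue. Test predictions are updated with the same training residual as the training predictions. This is what makes the procedure kernel gradient descent: the kernel, not the parameters, carries information from the training points to new inputs.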

The NTK was introduced in 2018 by Arthur Jacot, Franck Gabriel and Clément Hongler,[1] who used it to study the convergence and generalization properties of fully connected neural networks. Later works[2][3] extended the NTK results to other neural network architectures. The phenomenon behind the NTK is not specific to neural networks: it can be observed in generic nonlinear models, usually under a suitable scaling of the model.[4]

  1. Jacot, Arthur; Gabriel, Franck; Hongler, Clément (2018), Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K. (eds.), "Neural Tangent Kernel: Convergence and Generalization in Neural Networks", Advances in Neural Information Processing Systems 31, Curran Associates, Inc., pp. 8571–8580, arXiv:1806.07572, retrieved 2019-11-27.
  2. Arora, Sanjeev; Du, Simon S.; Hu, Wei; Li, Zhiyuan; Salakhutdinov, Ruslan; Wang, Ruosong (2019-11-04), "On Exact Computation with an Infinitely Wide Neural Net", arXiv:1904.11955 [cs.LG].
  3. Yang, Greg (2020-11-29), "Tensor Programs II: Neural Tangent Kernel for Any Architecture", arXiv:2006.14548 [stat.ML].
  4. Chizat, Lénaïc; Oyallon, Edouard; Bach, Francis (2019-12-08), "On lazy training in differentiable programming", Proceedings of the 33rd International Conference on Neural Information Processing Systems, Red Hook, NY, USA: Curran Associates Inc., pp. 2937–2947, arXiv:1812.07956, retrieved 2023-05-11.
