
Universal approximation theorem


Artificial neural networks combine many simple mathematical functions to implement more complicated functions from (typically) real-valued vectors to real-valued vectors. The space of multivariate functions that a network can implement is determined by the structure of the network, the set of simple functions, and the network's parameters. A great deal of theoretical work has gone into characterizing these function spaces.
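
For instance, a depth-L feedforward network with activation function σ computes a function of the following standard form (stated here for concreteness; the notation is not taken from any one of the cited sources):

    f(x) = W_L \,\sigma\big(W_{L-1}\,\sigma(\cdots\,\sigma(W_1 x + b_1)\,\cdots) + b_{L-1}\big) + b_L

where the weight matrices W_i and bias vectors b_i are the network's parameters and σ is applied componentwise.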

In the mathematical theory of artificial neural networks, universal approximation theorems are results[1][2] that characterize what neural networks can theoretically represent. Specifically, given an algorithm that generates networks within a class of functions, the theorems establish the density of the generated functions within a given function space of interest. Typically, these results concern the approximation capabilities of the feedforward architecture on the space of continuous functions between two Euclidean spaces, where the approximation is measured in the compact convergence topology. It must be stressed that while a function can be approximated arbitrarily well on a given region, the proofs do not apply outside that region; that is, the approximating networks do not extrapolate beyond it. This caveat holds for all non-periodic activation functions, which are the ones used in practice and assumed in most proofs. In recent years, neocortical pyramidal neurons with an oscillating activation function, each able to individually learn the XOR function, have been discovered in the human brain, and oscillating activation functions have been explored and shown to outperform popular activation functions on a variety of benchmarks.[3]
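
The failure to extrapolate can be made concrete. The following Python sketch (an illustration only, not the construction used in the cited proofs; all names are made up for the example) builds a one-hidden-layer ReLU network that interpolates sin on the compact interval [0, π]. The error on the interval shrinks as the grid is refined, while outside the interval the network continues linearly and drifts arbitrarily far from the target:

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def build_relu_interpolant(f, grid):
        # One hidden ReLU unit per grid knot; the output weights encode the
        # slope changes of the piecewise-linear interpolant of f on the grid.
        y = f(grid)
        slopes = np.diff(y) / np.diff(grid)
        w2 = np.concatenate(([slopes[0]], np.diff(slopes)))  # output weights
        b1 = -grid[:-1]                                      # hidden biases
        return w2, b1, y[0]                                  # y[0] is the output bias

    def net(x, w2, b1, b2):
        # Hidden weights are all 1, so each hidden unit computes relu(x - knot).
        return relu(x[:, None] + b1) @ w2 + b2

    grid = np.linspace(0.0, np.pi, 50)
    w2, b1, b2 = build_relu_interpolant(np.sin, grid)

    inside = np.linspace(0.0, np.pi, 1000)
    outside = np.linspace(np.pi, 2.0 * np.pi, 1000)
    print("max error on [0, pi]:   ", np.abs(net(inside, w2, b1, b2) - np.sin(inside)).max())
    print("max error on [pi, 2pi]: ", np.abs(net(outside, w2, b1, b2) - np.sin(outside)).max())

Inside [0, π] the error is bounded by the interpolation error of a piecewise-linear fit, while on [π, 2π] the network is a straight line and its error grows with the distance from the approximation region.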

However, there is also a variety of results concerning approximation between non-Euclidean spaces[4] and concerning other commonly used architectures and, more generally, algorithmically generated sets of functions, such as the convolutional neural network (CNN) architecture,[5][6] radial basis functions,[7] or neural networks with specific properties.[8][9] Most universal approximation theorems can be parsed into two classes. The first quantifies the approximation capabilities of neural networks with an arbitrary number of artificial neurons ("arbitrary width" case), and the second focuses on the case with an arbitrary number of hidden layers, each containing a limited number of artificial neurons ("arbitrary depth" case). In addition to these two classes, there are also universal approximation theorems for neural networks with a bounded number of hidden layers and a limited number of neurons in each layer ("bounded depth and bounded width" case).

Universal approximation theorems imply that neural networks can represent a wide variety of interesting functions with appropriate weights. On the other hand, they typically do not provide a construction for the weights, but merely state that such a construction is possible. To find suitable weights, neural networks are trained, and training may or may not converge to them (it can, for example, get stuck in a local optimum). If the network is too small for the dimensionality of the input data, the universal approximation theorems do not apply and the network cannot represent the target function. The classical result that a single hidden layer suffices assumes the width can grow without bound; with a limited number of neurons per layer, such a shallow network is in general insufficient, which makes the width of a network an important hyperparameter. The choice of activation function is also important: many proofs assume a specific one such as ReLU or sigmoid, while a linear activation function, and more generally any polynomial, is known not to work. A sketch of this training process is given below.
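
As a hedged illustration of the gap between existence and training (the activation, optimizer, learning rate, and width below are arbitrary choices for the example, not prescribed by any theorem):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-2.0, 2.0, 256)[:, None]
    y = np.sin(3.0 * x)                         # target function on a compact region

    width = 32                                  # width is a key hyperparameter
    W1 = rng.normal(0.0, 1.0, (1, width)); b1 = np.zeros(width)
    W2 = rng.normal(0.0, 0.1, (width, 1)); b2 = np.zeros(1)

    lr = 0.05
    for step in range(20000):
        h = np.tanh(x @ W1 + b1)                # hidden layer
        pred = h @ W2 + b2                      # network output
        err = pred - y
        # backpropagation: gradients of (half) the mean squared error
        gW2 = h.T @ err / len(x); gb2 = err.mean(axis=0)
        dh = (err @ W2.T) * (1.0 - h ** 2)      # derivative of tanh is 1 - tanh^2
        gW1 = x.T @ dh / len(x); gb1 = dh.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    print("final mean squared error:", float((err ** 2).mean()))

Whether the final error is small depends on the width, the initialization, and the optimizer; the theorems guarantee only that good weights exist for a sufficiently wide network, not that gradient descent will find them.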

Neural networks with an unbounded (non-polynomial) activation function have the universal approximation property.[10]
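
A common formal statement of the arbitrary-width case, paraphrased here rather than quoted from [10], is: let σ ∈ C(ℝ); then

    \mathrm{span}\{\, x \mapsto \sigma(w \cdot x + b) : w \in \mathbb{R}^{d},\ b \in \mathbb{R} \,\}

is dense in C(ℝ^d) with respect to uniform convergence on compact sets if and only if σ is not a polynomial.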

The universal approximation property of width-bounded networks has been studied as a dual of the classical universal approximation results for depth-bounded networks. For input dimension dx and output dimension dy, the minimum width required for universal approximation of Lp functions is exactly max{dx + 1, dy} (for a ReLU network). More generally, this also holds if both ReLU and a threshold activation function are used.[11]
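
In symbols, the minimum width in the preceding paragraph is

    w_{\min}(d_x, d_y) = \max\{\, d_x + 1,\ d_y \,\}

so that networks of at least this width are universal approximators in L^p at some finite depth, while strictly narrower networks fail to be universal regardless of depth.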

  1. ^ Hornik, Kurt; Stinchcombe, Maxwell; White, Halbert (January 1989). "Multilayer feedforward networks are universal approximators". Neural Networks. 2 (5): 359–366. doi:10.1016/0893-6080(89)90020-8.
  2. ^ Csáji, Balázs Csanád (2001). Approximation with Artificial Neural Networks. Faculty of Sciences, Eötvös Loránd University, Hungary.
  3. ^ Gidon, Albert; Zolnik, Timothy Adam; Fidzinski, Pawel; Bolduan, Felix; Papoutsi, Athanasia; Poirazi, Panayiota; Holtkamp, Martin; Vida, Imre; Larkum, Matthew Evan (3 January 2020). "Dendritic action potentials and computation in human layer 2/3 cortical neurons". Science. 367 (6473): 83–87. Bibcode:2020Sci...367...83G. doi:10.1126/science.aax6239. PMID 31896716.
  4. ^ Kratsios, Anastasis; Bilokopytov, Eugene (2020). Non-Euclidean Universal Approximation (PDF). Advances in Neural Information Processing Systems. Vol. 33. Curran Associates.
  5. ^ Zhou, Ding-Xuan (2020). "Universality of deep convolutional neural networks". Applied and Computational Harmonic Analysis. 48 (2): 787–794. arXiv:1805.10769. doi:10.1016/j.acha.2019.06.004. S2CID 44113176.
  6. ^ Heinecke, Andreas; Ho, Jinn; Hwang, Wen-Liang (2020). "Refinement and Universal Approximation via Sparsely Connected ReLU Convolution Nets". IEEE Signal Processing Letters. 27: 1175–1179. Bibcode:2020ISPL...27.1175H. doi:10.1109/LSP.2020.3005051. S2CID 220669183.
  7. ^ Park, J.; Sandberg, I. W. (1991). "Universal Approximation Using Radial-Basis-Function Networks". Neural Computation. 3 (2): 246–257. doi:10.1162/neco.1991.3.2.246. PMID 31167308. S2CID 34868087.
  8. ^ Yarotsky, Dmitry (2021). "Universal Approximations of Invariant Maps by Neural Networks". Constructive Approximation. 55: 407–474. arXiv:1804.10306. doi:10.1007/s00365-021-09546-1. S2CID 13745401.
  9. ^ Zakwan, Muhammad; d’Angelo, Massimiliano; Ferrari-Trecate, Giancarlo (2023). "Universal Approximation Property of Hamiltonian Deep Neural Networks". IEEE Control Systems Letters: 1. arXiv:2303.12147. doi:10.1109/LCSYS.2023.3288350. S2CID 257663609.
  10. ^ Sonoda, Sho; Murata, Noboru (September 2017). "Neural Network with Unbounded Activation Functions is Universal Approximator". Applied and Computational Harmonic Analysis. 43 (2): 233–268. arXiv:1505.03654. doi:10.1016/j.acha.2015.12.005. S2CID 12149203.
  11. ^ Park, Sejun; Yun, Chulhee; Lee, Jaeho; Shin, Jinwoo (2021). "Minimum Width for Universal Approximation". International Conference on Learning Representations (ICLR 2021). arXiv:2006.08859.
