
Universal approximation theorem


Artificial neural networks combine many simple mathematical functions to implement more complicated functions from (typically) real-valued vectors to real-valued vectors. The space of multivariate functions that a network can implement is determined by the structure of the network, the set of simple functions, and the network's parameters. A great deal of theoretical work has gone into characterizing these function spaces.
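
For instance, a depth-L feedforward network with activation function σ computes a function of the following standard form (stated here for concreteness; the notation is not taken from any one of the cited sources):

    f(x) = W_L \,\sigma\big(W_{L-1}\,\sigma(\cdots\,\sigma(W_1 x + b_1)\,\cdots) + b_{L-1}\big) + b_L

where the weight matrices W_i and bias vectors b_i are the network's parameters and σ is applied componentwise.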

In the mathematical theory of artificial neural networks, universal approximation theorems are results[1][2] that characterize what neural networks can theoretically represent. Specifically, given an algorithm that generates networks within a class of functions, the theorems establish the density of the generated functions within a given function space of interest. Typically, these results concern the approximation capabilities of the feedforward architecture on the space of continuous functions between two Euclidean spaces, where the approximation is measured in the compact convergence topology. It must be stressed that while a function can be approximated arbitrarily well on a given region, the proofs do not apply outside that region; that is, the approximating networks do not extrapolate beyond it. This caveat holds for all non-periodic activation functions, which are the ones used in practice and assumed in most proofs. In recent years, neocortical pyramidal neurons with an oscillating activation function, each able to individually learn the XOR function, have been discovered in the human brain, and oscillating activation functions have been explored and shown to outperform popular activation functions on a variety of benchmarks.[3]
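
The failure to extrapolate can be made concrete. The following Python sketch (an illustration only, not the construction used in the cited proofs; all names are made up for the example) builds a one-hidden-layer ReLU network that interpolates sin on the compact interval [0, π]. The error on the interval shrinks as the grid is refined, while outside the interval the network continues linearly and drifts arbitrarily far from the target:

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def build_relu_interpolant(f, grid):
        # One hidden ReLU unit per grid knot; the output weights encode the
        # slope changes of the piecewise-linear interpolant of f on the grid.
        y = f(grid)
        slopes = np.diff(y) / np.diff(grid)
        w2 = np.concatenate(([slopes[0]], np.diff(slopes)))  # output weights
        b1 = -grid[:-1]                                      # hidden biases
        return w2, b1, y[0]                                  # y[0] is the output bias

    def net(x, w2, b1, b2):
        # Hidden weights are all 1, so each hidden unit computes relu(x - knot).
        return relu(x[:, None] + b1) @ w2 + b2

    grid = np.linspace(0.0, np.pi, 50)
    w2, b1, b2 = build_relu_interpolant(np.sin, grid)

    inside = np.linspace(0.0, np.pi, 1000)
    outside = np.linspace(np.pi, 2.0 * np.pi, 1000)
    print("max error on [0, pi]:   ", np.abs(net(inside, w2, b1, b2) - np.sin(inside)).max())
    print("max error on [pi, 2pi]: ", np.abs(net(outside, w2, b1, b2) - np.sin(outside)).max())

Inside [0, π] the error is bounded by the interpolation error of a piecewise-linear fit, while on [π, 2π] the network is a straight line and its error grows with the distance from the approximation region.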

However, there is also a variety of results concerning approximation between non-Euclidean spaces[4] and concerning other commonly used architectures and, more generally, algorithmically generated sets of functions, such as the convolutional neural network (CNN) architecture,[5][6] radial basis functions,[7] or neural networks with specific properties.[8][9] Most universal approximation theorems can be parsed into two classes. The first quantifies the approximation capabilities of neural networks with an arbitrary number of artificial neurons ("arbitrary width" case), and the second focuses on the case with an arbitrary number of hidden layers, each containing a limited number of artificial neurons ("arbitrary depth" case). In addition to these two classes, there are also universal approximation theorems for neural networks with a bounded number of hidden layers and a limited number of neurons in each layer ("bounded depth and bounded width" case).

Universal approximation theorems imply that neural networks can represent a wide variety of interesting functions with appropriate weights. On the other hand, they typically do not provide a construction for the weights, but merely state that such a construction is possible. To find suitable weights, neural networks are trained, and training may or may not converge to them (it can, for example, get stuck in a local optimum). If the network is too small for the dimensionality of the input data, the universal approximation theorems do not apply and the network cannot represent the target function. The classical result that a single hidden layer suffices assumes the width can grow without bound; with a limited number of neurons per layer, such a shallow network is in general insufficient, which makes the width of a network an important hyperparameter. The choice of activation function is also important: many proofs assume a specific one such as ReLU or sigmoid, while a linear activation function, and more generally any polynomial, is known not to work. A sketch of this training process is given below.
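
As a hedged illustration of the gap between existence and training (the activation, optimizer, learning rate, and width below are arbitrary choices for the example, not prescribed by any theorem):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-2.0, 2.0, 256)[:, None]
    y = np.sin(3.0 * x)                         # target function on a compact region

    width = 32                                  # width is a key hyperparameter
    W1 = rng.normal(0.0, 1.0, (1, width)); b1 = np.zeros(width)
    W2 = rng.normal(0.0, 0.1, (width, 1)); b2 = np.zeros(1)

    lr = 0.05
    for step in range(20000):
        h = np.tanh(x @ W1 + b1)                # hidden layer
        pred = h @ W2 + b2                      # network output
        err = pred - y
        # backpropagation: gradients of (half) the mean squared error
        gW2 = h.T @ err / len(x); gb2 = err.mean(axis=0)
        dh = (err @ W2.T) * (1.0 - h ** 2)      # derivative of tanh is 1 - tanh^2
        gW1 = x.T @ dh / len(x); gb1 = dh.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    print("final mean squared error:", float((err ** 2).mean()))

Whether the final error is small depends on the width, the initialization, and the optimizer; the theorems guarantee only that good weights exist for a sufficiently wide network, not that gradient descent will find them.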

Neural networks with an unbounded (non-polynomial) activation function have the universal approximation property.[10]
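
A common formal statement of the arbitrary-width case, paraphrased here rather than quoted from [10], is: let σ ∈ C(ℝ); then

    \mathrm{span}\{\, x \mapsto \sigma(w \cdot x + b) : w \in \mathbb{R}^{d},\ b \in \mathbb{R} \,\}

is dense in C(ℝ^d) with respect to uniform convergence on compact sets if and only if σ is not a polynomial.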

The universal approximation property of width-bounded networks has been studied as a dual of the classical universal approximation results for depth-bounded networks. For input dimension dx and output dimension dy, the minimum width required for universal approximation of Lp functions is exactly max{dx + 1, dy} (for a ReLU network). More generally, this also holds if both ReLU and a threshold activation function are used.[11]
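
In symbols, the minimum width in the preceding paragraph is

    w_{\min}(d_x, d_y) = \max\{\, d_x + 1,\ d_y \,\}

so that networks of at least this width are universal approximators in L^p at some finite depth, while strictly narrower networks fail to be universal regardless of depth.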

  1. ^ Hornik, Kurt; Stinchcombe, Maxwell; White, Halbert (January 1989). "Multilayer feedforward networks are universal approximators". Neural Networks. 2 (5): 359–366. doi:10.1016/0893-6080(89)90020-8.
  2. ^ Csáji, Balázs Csanád (2001). Approximation with Artificial Neural Networks. Faculty of Sciences, Eötvös Loránd University, Hungary.
  3. ^ Gidon, Albert; Zolnik, Timothy Adam; Fidzinski, Pawel; Bolduan, Felix; Papoutsi, Athanasia; Poirazi, Panayiota; Holtkamp, Martin; Vida, Imre; Larkum, Matthew Evan (3 January 2020). "Dendritic action potentials and computation in human layer 2/3 cortical neurons". Science. 367 (6473): 83–87. Bibcode:2020Sci...367...83G. doi:10.1126/science.aax6239. PMID 31896716.
  4. ^ Kratsios, Anastasis; Bilokopytov, Eugene (2020). Non-Euclidean Universal Approximation (PDF). Advances in Neural Information Processing Systems. Vol. 33. Curran Associates.
  5. ^ Zhou, Ding-Xuan (2020). "Universality of deep convolutional neural networks". Applied and Computational Harmonic Analysis. 48 (2): 787–794. arXiv:1805.10769. doi:10.1016/j.acha.2019.06.004. S2CID 44113176.
  6. ^ Heinecke, Andreas; Ho, Jinn; Hwang, Wen-Liang (2020). "Refinement and Universal Approximation via Sparsely Connected ReLU Convolution Nets". IEEE Signal Processing Letters. 27: 1175–1179. Bibcode:2020ISPL...27.1175H. doi:10.1109/LSP.2020.3005051. S2CID 220669183.
  7. ^ Park, J.; Sandberg, I. W. (1991). "Universal Approximation Using Radial-Basis-Function Networks". Neural Computation. 3 (2): 246–257. doi:10.1162/neco.1991.3.2.246. PMID 31167308. S2CID 34868087.
  8. ^ Yarotsky, Dmitry (2021). "Universal Approximations of Invariant Maps by Neural Networks". Constructive Approximation. 55: 407–474. arXiv:1804.10306. doi:10.1007/s00365-021-09546-1. S2CID 13745401.
  9. ^ Zakwan, Muhammad; d’Angelo, Massimiliano; Ferrari-Trecate, Giancarlo (2023). "Universal Approximation Property of Hamiltonian Deep Neural Networks". IEEE Control Systems Letters: 1. arXiv:2303.12147. doi:10.1109/LCSYS.2023.3288350. S2CID 257663609.
  10. ^ Sonoda, Sho; Murata, Noboru (September 2017). "Neural Network with Unbounded Activation Functions is Universal Approximator". Applied and Computational Harmonic Analysis. 43 (2): 233–268. arXiv:1505.03654. doi:10.1016/j.acha.2015.12.005. S2CID 12149203.
  11. ^ Park, Sejun; Yun, Chulhee; Lee, Jaeho; Shin, Jinwoo (2021). "Minimum Width for Universal Approximation". International Conference on Learning Representations (ICLR 2021). arXiv:2006.08859.
