Batch normalization (also known as batch norm) is a method used to make training of artificial neural networks faster and more stable through normalization of the layers' inputs by re-centering and re-scaling. It was proposed by Sergey Ioffe and Christian Szegedy in 2015.[1]
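Concretely, for a mini-batch \(B = \{x_1, \ldots, x_m\}\) of activations of a given layer, the transform introduced in the original paper computes the mini-batch mean and variance, standardizes each activation, and then applies a learned affine transformation with parameters \(\gamma\) and \(\beta\):[1]

\[
\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad
\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2,
\]
\[
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}, \qquad
y_i = \gamma \hat{x}_i + \beta,
\]

where \(\varepsilon\) is a small constant added for numerical stability. The re-centering corresponds to subtracting \(\mu_B\); the re-scaling corresponds to dividing by \(\sqrt{\sigma_B^2 + \varepsilon}\).

The following minimal NumPy sketch illustrates the same transform for a batch of feature vectors; the function name and the default value of eps are illustrative choices, not part of any particular library's API:

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x:     activations of shape (batch_size, num_features)
    # gamma: learned per-feature scale, shape (num_features,)
    # beta:  learned per-feature shift, shape (num_features,)
    mu = x.mean(axis=0)                    # per-feature mini-batch mean (re-centering)
    var = x.var(axis=0)                    # per-feature mini-batch variance (re-scaling)
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardized activations
    return gamma * x_hat + beta            # learned affine transform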
While the effect of batch normalization is evident, the reasons behind its effectiveness remain under discussion. It was initially believed to mitigate internal covariate shift, the phenomenon in which parameter initialization and changes in the distribution of each layer's inputs during training affect the learning speed of the network.[1] More recently, some scholars have argued that batch normalization does not reduce internal covariate shift, but instead smooths the objective function, which in turn improves performance.[2] However, at initialization, batch normalization in fact induces severe gradient explosion in deep networks, a problem alleviated only by skip connections in residual networks.[3] Others maintain that batch normalization achieves length-direction decoupling of the weight vectors, and thereby accelerates training.[4]
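The length-direction view can be made concrete for a single linear unit: if the pre-activation \(w^\top x\) is batch-normalized, then for any scalar \(c > 0\),

\[
\mathrm{BN}(c\, w^\top x) = \mathrm{BN}(w^\top x),
\]

because scaling \(w\) scales the mini-batch mean and standard deviation of \(w^\top x\) by the same factor, which cancels in the standardization. The normalized output therefore depends on \(w\) only through its direction \(w/\lVert w \rVert\), while the learned parameter \(\gamma\) plays the role of an explicit length, so the two can be optimized separately.[4]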
1. Ioffe, Sergey; Szegedy, Christian (2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". arXiv:1502.03167 [cs.LG].
2. Santurkar, Shibani; Tsipras, Dimitris; Ilyas, Andrew; Madry, Aleksander (2018). "How Does Batch Normalization Help Optimization?". arXiv:1805.11604 [stat.ML].
3. Yang, Greg; Pennington, Jeffrey; Rao, Vinay; Sohl-Dickstein, Jascha; Schoenholz, Samuel S. (2019). "A Mean Field Theory of Batch Normalization". arXiv:1902.08129 [cs.NE].
4. Kohler, Jonas; Daneshmand, Hadi; Lucchi, Aurelien; Zhou, Ming; Neymeyr, Klaus; Hofmann, Thomas (2018). "Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization". arXiv:1805.10694 [stat.ML].