This article is about the smooth approximation of one-hot arg max. For the smooth approximation of max, see LogSumExp.
"Softmax" redirects here. For the Korean video game company, see ESA (company).
The softmax function, also known as softargmax[1] or the normalized exponential function,[2] converts a vector of K real numbers into a probability distribution over K possible outcomes. It is a generalization of the logistic function to multiple dimensions, and is used in multinomial logistic regression. The softmax function is often used as the last activation function of a neural network, normalizing the network's output to a probability distribution over the predicted output classes.
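Concretely, softmax maps a vector z to the vector with components e^{z_i} / Σ_j e^{z_j}, which are positive and sum to 1. A minimal NumPy sketch (the max-subtraction is the standard numerical-stability trick; it leaves the result unchanged because softmax is invariant to adding a constant to every component):

```python
import numpy as np

def softmax(z):
    """Softmax of a vector: exponentiate, then normalize to sum to 1.

    Subtracting max(z) before exponentiating avoids overflow for large
    inputs without changing the result.
    """
    z = np.asarray(z, dtype=float)
    exps = np.exp(z - z.max())
    return exps / exps.sum()

probs = softmax([1.0, 2.0, 3.0])
print(probs)        # ≈ [0.0900, 0.2447, 0.6652]
print(probs.sum())  # 1.0
```

Note how larger inputs receive disproportionately more probability mass, which is why softmax serves as a smooth approximation of the one-hot arg max.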
^ Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). "6.2.2.3 Softmax Units for Multinoulli Output Distributions". Deep Learning. MIT Press. pp. 180–184. ISBN 978-0-262-03561-3.
^ Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer. p. 198.