Loss functions for classification


Figure: Bayes consistent loss functions: Zero-one loss (gray), Savage loss (green), Logistic loss (orange), Exponential loss (purple), Tangent loss (brown), Square loss (blue).

In machine learning and mathematical optimization, loss functions for classification are computationally feasible loss functions representing the price paid for inaccuracy of predictions in classification problems (problems of identifying which category a particular observation belongs to).[1] Given $\mathcal{X}$ as the space of all possible inputs (usually $\mathcal{X} \subset \mathbb{R}^d$), and $\mathcal{Y} = \{-1, 1\}$ as the set of labels (possible outputs), a typical goal of classification algorithms is to find a function $f : \mathcal{X} \to \mathcal{Y}$ which best predicts a label $y$ for a given input $\vec{x}$.[2] However, because of incomplete information, noise in the measurement, or probabilistic components in the underlying process, it is possible for the same $\vec{x}$ to generate different $y$.[3] As a result, the goal of the learning problem is to minimize expected loss (also known as the risk), defined as

$$I[f] = \int_{\mathcal{X} \times \mathcal{Y}} V(f(\vec{x}), y) \, p(\vec{x}, y) \, d\vec{x} \, dy,$$

where $V(f(\vec{x}), y)$ is a given loss function, and $p(\vec{x}, y)$ is the probability density function of the process that generated the data, which can equivalently be written as

$$p(\vec{x}, y) = p(y \mid \vec{x}) \, p(\vec{x}).$$
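For concreteness, the risk of a fixed classifier under a known generating process can be estimated by Monte Carlo sampling. The following is a minimal sketch assuming a hypothetical process with standard normal $p(\vec{x})$ and a logistic form for $p(1 \mid \vec{x})$; both are illustrative assumptions, not part of the definition above.

    import numpy as np

    # Monte Carlo sketch of the expected risk I[f] for a fixed classifier f:
    # draw (x, y) pairs from an assumed process p(x, y) = p(y | x) p(x) and
    # average the loss. The Gaussian p(x) and logistic p(1 | x) below are
    # illustrative assumptions only.
    rng = np.random.default_rng(42)
    n = 100_000
    x = rng.normal(size=n)                     # samples from p(x)
    p1 = 1.0 / (1.0 + np.exp(-2.0 * x))        # assumed p(y = 1 | x)
    y = np.where(rng.random(n) < p1, 1, -1)    # labels drawn from p(y | x)

    f_x = np.sign(x)                           # a fixed classifier f(x) = sign(x)
    loss = (f_x != y).astype(float)            # 0-1 loss V(f(x), y)
    print(loss.mean())                         # Monte Carlo estimate of I[f]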

Within classification, several commonly used loss functions are written solely in terms of the product of the true label $y$ and the predicted label $f(\vec{x})$. Therefore, they can be defined as functions of only one variable $\upsilon = y f(\vec{x})$, so that $V(f(\vec{x}), y) = \phi(y f(\vec{x})) = \phi(\upsilon)$ with a suitably chosen function $\phi : \mathbb{R} \to \mathbb{R}$. These are called margin-based loss functions, and choosing a margin-based loss function amounts to choosing $\phi$. Selection of a loss function within this framework impacts the optimal $f^{*}_{\phi}$ which minimizes the expected risk; see empirical risk minimization.
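As an illustration, several of the margin-based losses plotted in the figure above can be written directly as functions of the margin $\upsilon$. This is a sketch using common textbook forms; the $1/\ln 2$ scaling of the logistic loss is one convention (chosen so that $\phi(0) = 1$) and is an assumption here.

    import numpy as np

    # Common margin-based losses phi(v), where v = y * f(x) is the margin.
    # Textbook forms; the logistic scaling by 1/ln(2) is a convention.

    def hinge(v):
        return np.maximum(0.0, 1.0 - v)          # SVM hinge loss

    def logistic(v):
        return np.log1p(np.exp(-v)) / np.log(2)  # logistic loss, phi(0) = 1

    def exponential(v):
        return np.exp(-v)                        # AdaBoost exponential loss

    def square(v):
        return (1.0 - v) ** 2                    # square loss in margin form

    v = np.linspace(-2.0, 2.0, 5)
    for name, phi in [("hinge", hinge), ("logistic", logistic),
                      ("exponential", exponential), ("square", square)]:
        print(f"{name:12s}", np.round(phi(v), 3))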

In the case of binary classification, it is possible to simplify the calculation of expected risk from the integral specified above. Specifically,

$$\begin{aligned}
I[f] &= \int_{\mathcal{X} \times \mathcal{Y}} V(f(\vec{x}), y) \, p(\vec{x}, y) \, d\vec{x} \, dy \\
&= \int_{\mathcal{X}} \int_{\mathcal{Y}} \phi(y f(\vec{x})) \, p(y \mid \vec{x}) \, p(\vec{x}) \, dy \, d\vec{x} \\
&= \int_{\mathcal{X}} \left[ \phi(f(\vec{x})) \, p(1 \mid \vec{x}) + \phi(-f(\vec{x})) \, p(-1 \mid \vec{x}) \right] p(\vec{x}) \, d\vec{x} \\
&= \int_{\mathcal{X}} \left[ \phi(f(\vec{x})) \, p(1 \mid \vec{x}) + \phi(-f(\vec{x})) \left( 1 - p(1 \mid \vec{x}) \right) \right] p(\vec{x}) \, d\vec{x}.
\end{aligned}$$

The second equality follows from the properties described above. The third equality follows from the fact that 1 and −1 are the only possible values for $y$, and the fourth because $p(-1 \mid \vec{x}) = 1 - p(1 \mid \vec{x})$. The term within brackets is known as the conditional risk.
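A minimal sketch of the bracketed term as a function, writing $\eta = p(1 \mid \vec{x})$ and $f = f(\vec{x})$; the exponential loss in the example is just one choice of $\phi$.

    import numpy as np

    def conditional_risk(eta, f, phi):
        # Bracketed term above: eta * phi(f) + (1 - eta) * phi(-f),
        # with eta = p(1 | x) and f = f(x).
        return eta * phi(f) + (1.0 - eta) * phi(-f)

    # Example with the exponential loss phi(v) = exp(-v):
    phi = lambda v: np.exp(-v)
    print(conditional_risk(0.8, 0.5, phi))  # risk at f(x) = 0.5 when p(1|x) = 0.8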

One can solve for the minimizer of $I[f]$ by taking the functional derivative of the last equality with respect to $f$ and setting the derivative equal to zero. This results in the equation

$$\frac{\partial \phi(f)}{\partial f} \, \eta + \frac{\partial \phi(-f)}{\partial f} \, (1 - \eta) = 0, \qquad \eta = p(y = 1 \mid \vec{x}),$$

which is also equivalent to setting the derivative of the conditional risk equal to zero.
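For example, for the exponential loss $\phi(\upsilon) = e^{-\upsilon}$ this equation yields the standard closed form $f^{*} = \tfrac{1}{2} \ln \frac{\eta}{1 - \eta}$. The following sketch checks that against a numeric minimization of the conditional risk; the particular value of $\eta$ is an arbitrary example.

    import numpy as np
    from scipy.optimize import minimize_scalar

    # For phi(v) = exp(-v), setting the derivative of the conditional risk
    # eta * exp(-f) + (1 - eta) * exp(f) to zero gives
    # f* = (1/2) * ln(eta / (1 - eta)). Verify numerically for one eta.

    def conditional_risk(f, eta):
        return eta * np.exp(-f) + (1.0 - eta) * np.exp(f)

    eta = 0.7                                     # arbitrary example value
    closed_form = 0.5 * np.log(eta / (1.0 - eta))
    numeric = minimize_scalar(conditional_risk, args=(eta,)).x
    print(closed_form, numeric)                   # both approx 0.4236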

Given the binary nature of classification, a natural selection for a loss function (assuming equal cost for false positives and false negatives) would be the 0–1 loss function (0–1 indicator function), which takes the value 0 if the predicted classification equals the true class, and 1 if it does not. This selection is modeled by

$$V(f(\vec{x}), y) = H(-y f(\vec{x})),$$

where $H$ indicates the Heaviside step function. However, this loss function is non-convex and non-smooth, and solving for the optimal solution is an NP-hard combinatorial optimization problem.[4] As a result, it is better to substitute surrogate loss functions which are tractable for commonly used learning algorithms, as they have convenient properties such as being convex and smooth. In addition to their computational tractability, one can show that the solutions to the learning problem using these surrogates allow for the recovery of the actual solution to the original classification problem.[5] Some of these surrogates are described below.
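Below is a sketch of the 0–1 loss in the Heaviside form above, next to the hinge loss as one convex surrogate that upper-bounds it. The convention $H(0) = 1$ (a raw prediction of exactly zero counts as an error) is an assumption here; conventions for $H$ at zero vary.

    import numpy as np

    # 0-1 loss V(f(x), y) = H(-y f(x)) with Heaviside H, next to the hinge
    # loss as one convex surrogate. H(0) = 1 in this sketch.

    def zero_one(f_x, y):
        return np.where(-y * f_x >= 0, 1.0, 0.0)

    def hinge(f_x, y):
        return np.maximum(0.0, 1.0 - y * f_x)

    y = np.array([1, 1, -1, -1])
    f_x = np.array([2.0, -0.5, -1.0, 0.3])
    print(zero_one(f_x, y))   # [0. 1. 0. 1.]
    print(hinge(f_x, y))      # [0.  1.5 0.  1.3]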

In practice, the probability distribution $p(\vec{x}, y)$ is unknown. Consequently, utilizing a training set of $n$ independently and identically distributed sample points

$$S = \{(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)\}$$

drawn from the data sample space, one seeks to minimize empirical risk

$$I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f(\vec{x}_i), y_i)$$

as a proxy for expected risk.[3] (See statistical learning theory for a more detailed description.)
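A minimal sketch of empirical risk minimization under stated assumptions: a linear score $f(\vec{x}) = w \cdot \vec{x}$, the logistic loss as $\phi$, synthetic data, and a fixed gradient-descent step size, all of which are illustrative choices rather than part of the definition.

    import numpy as np

    # Gradient descent on the empirical logistic risk
    # I_S[f] = (1/n) sum_i log(1 + exp(-y_i * w . x_i))
    # for a linear score f(x) = w . x. Data, step size, and iteration
    # count are illustrative assumptions.
    rng = np.random.default_rng(0)
    n, d = 200, 2
    X = rng.normal(size=(n, d))
    w_true = np.array([1.5, -2.0])                 # hypothetical generator
    y = np.where(X @ w_true + 0.5 * rng.normal(size=n) > 0, 1, -1)

    def empirical_risk(w):
        return np.mean(np.log1p(np.exp(-y * (X @ w))))

    w = np.zeros(d)
    for _ in range(500):
        m = y * (X @ w)                            # margins y_i * f(x_i)
        grad = -(X * (y / (1.0 + np.exp(m)))[:, None]).mean(axis=0)
        w -= 0.5 * grad                            # fixed step size

    print(w, empirical_risk(w))                    # learned weights, I_S[f]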

  1. Rosasco, L.; De Vito, E.; Caponnetto, A.; Piana, M.; Verri, A. (2004). "Are Loss Functions All the Same?" (PDF). Neural Computation. 16 (5): 1063–1076. CiteSeerX 10.1.1.109.6786. doi:10.1162/089976604773135104. PMID 15070510. S2CID 11845688.
  2. Shen, Yi (2005). Loss Functions for Binary Classification and Class Probability Estimation (PDF). University of Pennsylvania. Retrieved 6 December 2014.
  3. Rosasco, Lorenzo; Poggio, Tomaso (2014). A Regularization Tour of Machine Learning. MIT-9.520 Lecture Notes (manuscript).
  4. Rai, Piyush (13 September 2011). Support Vector Machines (Contd.), Classification Loss Functions and Regularizers (PDF). Utah CS5350/6350: Machine Learning. Retrieved 4 May 2021.
  5. Ramanan, Deva (27 February 2008). Lecture 14 (PDF). UCI ICS273A: Machine Learning. Retrieved 6 December 2014.
