In variational Bayesian methods, the evidence lower bound (often abbreviated ELBO, also sometimes called the variational lower bound[1] or negative variational free energy) is a useful lower bound on the log-likelihood of some observed data.
The ELBO is useful because it provides a guarantee on the worst-case log-likelihood of some distribution (e.g. $p_\theta(x)$) which models a set of data. The actual log-likelihood may be higher (indicating an even better fit to the distribution), because the ELBO includes a Kullback–Leibler divergence (KL divergence) term that decreases the ELBO when an internal part of the model is inaccurate, even though the model fits the data well overall. Thus improving the ELBO indicates improving either the likelihood of the model or the fit of an internal component of the model, or both, and the ELBO makes a good loss function, e.g. for training a deep neural network to improve both the model overall and the internal component. (The internal component is $q_\phi$, defined in detail later in this article.)
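The relationship between the log-likelihood, the ELBO, and the KL term can be stated as a standard identity (writing $p_\theta$ for the model and $q_\phi$ for its internal variational component): for any distribution $q_\phi(z|x)$ over the latent variable $z$,

$$\ln p_\theta(x) = \mathbb{E}_{z\sim q_\phi(\cdot|x)}\!\left[\ln\frac{p_\theta(x,z)}{q_\phi(z|x)}\right] + D_{\mathrm{KL}}\!\left(q_\phi(\cdot|x)\,\|\,p_\theta(\cdot|x)\right),$$

where the first term is the ELBO. Since the KL divergence is nonnegative, the ELBO never exceeds $\ln p_\theta(x)$, and the gap is exactly the mismatch between $q_\phi(\cdot|x)$ and the true posterior $p_\theta(\cdot|x)$.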
For an observed sample $x$ with latent variable $z$, a model $p_\theta(x,z)$, and a variational distribution $q_\phi(z|x)$, define the evidence lower bound (ELBO):

$$L_{\theta,\phi}(x) := \mathbb{E}_{z\sim q_\phi(\cdot|x)}\!\left[\ln\frac{p_\theta(x,z)}{q_\phi(z|x)}\right].$$
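As an illustration (not part of the original article), the following Python sketch uses an assumed conjugate-Gaussian model — prior $z \sim N(0,1)$, likelihood $x|z \sim N(z,1)$, so the evidence $p(x) = N(x; 0, 2)$ is available in closed form — to estimate the ELBO by Monte Carlo and check that it lower-bounds the log-evidence:

```python
import math
import random

# Illustrative (hypothetical) conjugate-Gaussian model:
#   prior       z ~ N(0, 1)
#   likelihood  x | z ~ N(z, 1)
# so the evidence is known exactly: p(x) = N(x; 0, 2).

def log_normal(v, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at v."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2)

def elbo_mc(x, m, s, n=200_000, seed=0):
    """Monte Carlo estimate of the ELBO for variational q(z) = N(m, s^2):
    E_{z~q}[ln p(x, z) - ln q(z)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(m, s)
        total += log_normal(x, z, 1.0) + log_normal(z, 0.0, 1.0) - log_normal(z, m, s)
    return total / n

x = 1.3
log_evidence = log_normal(x, 0.0, math.sqrt(2.0))  # exact ln p(x)

print(elbo_mc(x, 0.0, 1.0), "<=", log_evidence)               # loose bound: q = prior
print(elbo_mc(x, x / 2, math.sqrt(0.5)), "==", log_evidence)  # tight: q = exact posterior
```

When $q$ equals the exact posterior $N(x/2, 1/2)$, the integrand $\ln(p(x,z)/q(z))$ is constant and equal to $\ln p(x)$, so the estimate matches the log-evidence exactly; any other $q$ gives a strictly smaller value, the gap being the KL divergence from $q$ to the posterior.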