Prior probability information

A prior probability distribution of an uncertain quantity, often simply called the prior, is its assumed probability distribution before some evidence is taken into account. For example, the prior could be the probability distribution representing the relative proportions of voters who will vote for a particular politician in a future election. The unknown quantity may be a parameter of the model or a latent variable rather than an observable variable.

In Bayesian statistics, Bayes' rule prescribes how to update the prior with new information to obtain the posterior probability distribution, which is the conditional distribution of the uncertain quantity given new data. Historically, the choice of priors was often constrained to a conjugate family of a given likelihood function, because this yields a tractable posterior of the same family. The widespread availability of Markov chain Monte Carlo methods, however, has made this less of a concern.
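As an illustration of why conjugate families are tractable, the following minimal sketch updates a beta prior on a Bernoulli success probability. The function names, the Beta(2, 2) prior, and the data are assumptions chosen for the example. The only fact relied on is the standard conjugacy result: the posterior is again a beta distribution, with the observed successes and failures added to the prior parameters.

```python
from fractions import Fraction


def beta_bernoulli_update(alpha, beta, observations):
    """Return the Beta posterior parameters after observing 0/1 outcomes."""
    successes = sum(observations)
    failures = len(observations) - successes
    # Conjugacy: Beta(alpha, beta) prior + Bernoulli data -> Beta posterior.
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


prior = (2, 2)                    # hypothetical prior, centred on 1/2
data = [1, 0, 1, 1, 0, 1, 1, 1]   # hypothetical data: 6 successes, 2 failures
posterior = beta_bernoulli_update(*prior, data)

print(posterior)                                  # (8, 4)
print(beta_mean(*prior), beta_mean(*posterior))   # 1/2 2/3
```

No integration or sampling is needed here; with a non-conjugate prior the posterior would generally have to be approximated, for example by Markov chain Monte Carlo.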

There are many ways to construct a prior distribution.^[1] In some cases, a prior may be determined from past information, such as previous experiments. A prior can also be elicited from the purely subjective assessment of an experienced expert.^[2]^[3] When no information is available, an uninformative prior may be adopted as justified by the principle of indifference.^[4]^[5] In modern applications, priors are also often chosen for their mechanical properties, such as regularization and feature selection.^[6]^[7]^[8]
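The link between priors and regularization can be sketched with a simple case that is assumed here for illustration: observations y_i drawn from a normal distribution with unknown mean θ and known variance σ², together with a zero-mean normal prior on θ with variance τ². Under these assumptions the maximum a posteriori estimate minimizes a squared-error loss plus an L2 penalty of strength σ²/τ², so a tighter prior shrinks the estimate more strongly toward zero. The function names and numbers below are hypothetical.

```python
def map_normal_mean(ys, sigma2, tau2):
    """MAP estimate of theta for y_i ~ N(theta, sigma2), prior theta ~ N(0, tau2)."""
    lam = sigma2 / tau2  # implied L2 penalty strength
    return sum(ys) / (len(ys) + lam)


def penalized_loss_gradient(theta, ys, sigma2, tau2):
    """Gradient in theta of sum((y - theta)^2) + (sigma2 / tau2) * theta^2."""
    lam = sigma2 / tau2
    return -2.0 * sum(y - theta for y in ys) + 2.0 * lam * theta


ys = [2.0, 4.0, 6.0]                                  # hypothetical data
sample_mean = sum(ys) / len(ys)
weak = map_normal_mean(ys, sigma2=1.0, tau2=100.0)    # diffuse prior, little shrinkage
strong = map_normal_mean(ys, sigma2=1.0, tau2=0.5)    # tight prior, strong shrinkage

print(sample_mean, weak, strong)
# The MAP estimate is a stationary point of the penalized loss:
print(abs(penalized_loss_gradient(strong, ys, 1.0, 0.5)) < 1e-9)
```

This is only the simplest Gaussian case; the shrinkage and sparsity priors discussed in the cited works are more elaborate.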

The prior distributions of model parameters will often depend on parameters of their own. Uncertainty about these hyperparameters can, in turn, be expressed as hyperprior probability distributions. For example, if one uses a beta distribution to model the distribution of the parameter p of a Bernoulli distribution, then:

p is a parameter of the underlying system (Bernoulli distribution), and
α and β are parameters of the prior distribution (beta distribution); hence hyperparameters.
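
Written out, with x denoting a single 0/1 observation (a symbol introduced here for illustration), the two levels of this example are:

```latex
\begin{aligned}
x \mid p &\sim \operatorname{Bernoulli}(p), &
\Pr(x \mid p) &= p^{x}(1-p)^{1-x}, \quad x \in \{0, 1\},\\
p \mid \alpha, \beta &\sim \operatorname{Beta}(\alpha, \beta), &
f(p \mid \alpha, \beta) &= \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\mathrm{B}(\alpha, \beta)}, \quad 0 < p < 1 .
\end{aligned}
```

Here B(α, β) is the beta function, which normalizes the density. The hyperparameters control the shape of the prior; for instance, its mean is α / (α + β).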

In principle, priors can be decomposed into many conditional levels of distributions, so-called hierarchical priors.^[9]
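
A hierarchical prior can be read as a recipe for sampling level by level. The sketch below assumes a three-level Bernoulli model in which the hyperparameters α and β each receive a Gamma(2, 1) hyperprior. That hyperprior, the function name, and the seed are illustrative choices only, not something prescribed by the sources cited here.

```python
import random


def sample_hierarchical_prior(rng, n_obs):
    """Draw one joint sample (alpha, beta, p, xs) from a three-level model."""
    # Hyperprior (assumed): alpha, beta ~ Gamma(shape=2, scale=1).
    alpha = rng.gammavariate(2.0, 1.0)
    beta = rng.gammavariate(2.0, 1.0)
    # Prior: p ~ Beta(alpha, beta), conditional on the hyperparameters.
    p = rng.betavariate(alpha, beta)
    # Likelihood: each x_i ~ Bernoulli(p), conditional on p.
    xs = [1 if rng.random() < p else 0 for _ in range(n_obs)]
    return alpha, beta, p, xs


rng = random.Random(0)  # fixed seed so the draw is reproducible
alpha, beta, p, xs = sample_hierarchical_prior(rng, n_obs=10)
print(alpha, beta, p, xs)
```

Each level is conditioned only on the level directly above it, which is what allows the joint prior to be decomposed into a chain of conditional distributions.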

[1] Robert, Christian (1994). "From Prior Information to Prior Distributions". The Bayesian Choice. New York: Springer. pp. 89–136. ISBN 0-387-94296-3.

[2] Chaloner, Kathryn (1996). "Elicitation of Prior Distributions". In Berry, Donald A.; Stangl, Dalene (eds.). Bayesian Biostatistics. New York: Marcel Dekker. pp. 141–156. ISBN 0-8247-9334-X.

[3] Mikkola, Petrus; et al. (2023). "Prior Knowledge Elicitation: The Past, Present, and Future". Bayesian Analysis. Forthcoming. doi:10.1214/23-BA1381. hdl:11336/183197. S2CID 244798734.

[4] Zellner, Arnold (1971). "Prior Distributions to Represent 'Knowing Little'". An Introduction to Bayesian Inference in Econometrics. New York: John Wiley & Sons. pp. 41–53. ISBN 0-471-98165-6.

[5] Price, Harold J.; Manson, Allison R. (2001). "Uninformative priors for Bayes' theorem". AIP Conf. Proc. 617: 379–391. doi:10.1063/1.1477060.

[6] Piironen, Juho; Vehtari, Aki (2017). "Sparsity information and regularization in the horseshoe and other shrinkage priors". Electronic Journal of Statistics. 11 (2): 5018–5051. arXiv:1707.01694. doi:10.1214/17-EJS1337SI.

[7] Simpson, Daniel; et al. (2017). "Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors". Statistical Science. 32 (1): 1–28. arXiv:1403.4630. doi:10.1214/16-STS576. S2CID 88513041.

[8] Fortuin, Vincent (2022). "Priors in Bayesian Deep Learning: A Review". International Statistical Review. 90 (3): 563–591. doi:10.1111/insr.12502. hdl:20.500.11850/547969. S2CID 234681651.

[9] Congdon, Peter D. (2020). "Regression Techniques using Hierarchical Priors". Bayesian Hierarchical Models (2nd ed.). Boca Raton: CRC Press. pp. 253–315. ISBN 978-1-03-217715-1.
