Random forest information

Random forests or random decision forests is an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean or average prediction of the individual trees is returned.^[1]^[2] Random decision forests correct for decision trees' habit of overfitting to their training set.^[3]^{: 587–588}

The first algorithm for random decision forests was created in 1995 by Tin Kam Ho^[1] using the random subspace method,^[2] which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg.^[4]^[5]^[6]

An extension of the algorithm was developed by Leo Breiman^[7] and Adele Cutler,^[8] who registered^[9] "Random Forests" as a trademark in 2006 (as of 2019^[update], owned by Minitab, Inc.).^[10] The extension combines Breiman's "bagging" idea and random selection of features, introduced first by Ho^[1] and later independently by Amit and Geman^[11] in order to construct a collection of decision trees with controlled variance.

^ ^a ^b ^c Ho, Tin Kam (1995). Random Decision Forests (PDF). Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14–16 August 1995. pp. 278–282. Archived from the original (PDF) on 17 April 2016. Retrieved 5 June 2016.
^ ^a ^b Ho TK (1998). "The Random Subspace Method for Constructing Decision Forests" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (8): 832–844. doi:10.1109/34.709601. S2CID 206420153.
^ Cite error: The named reference elemstatlearn was invoked but never defined (see the help page).
^ Kleinberg E (1990). "Stochastic Discrimination" (PDF). Annals of Mathematics and Artificial Intelligence. 1 (1–4): 207–239. CiteSeerX 10.1.1.25.6750. doi:10.1007/BF01531079. S2CID 206795835. Archived from the original (PDF) on 2018-01-18.
^ Kleinberg E (1996). "An Overtraining-Resistant Stochastic Modeling Method for Pattern Recognition". Annals of Statistics. 24 (6): 2319–2349. doi:10.1214/aos/1032181157. MR 1425956.
^ Kleinberg E (2000). "On the Algorithmic Implementation of Stochastic Discrimination" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 22 (5): 473–490. CiteSeerX 10.1.1.33.4131. doi:10.1109/34.857004. S2CID 3563126. Archived from the original (PDF) on 2018-01-18.
^ Breiman L (2001). "Random Forests". Machine Learning. 45 (1): 5–32. Bibcode:2001MachL..45....5B. doi:10.1023/A:1010933404324.
^ Cite error: The named reference rpackage was invoked but never defined (see the help page).
^ U.S. trademark registration number 3185828, registered 2006/12/19.
^ "RANDOM FORESTS Trademark of Health Care Productivity, Inc. - Registration Number 3185828 - Serial Number 78642027 :: Justia Trademarks".
^ Amit Y, Geman D (1997). "Shape quantization and recognition with randomized trees" (PDF). Neural Computation. 9 (7): 1545–1588. CiteSeerX 10.1.1.57.6069. doi:10.1162/neco.1997.9.7.1545. S2CID 12470146. Archived from the original (PDF) on 2018-02-05. Retrieved 2008-04-01.

[ho1995-1] Ho, Tin Kam (1995). Random Decision Forests (PDF). Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14–16 August 1995. pp. 278–282. Archived from the original (PDF) on 17 April 2016. Retrieved 5 June 2016.

[ho1998-2] Ho TK (1998). "The Random Subspace Method for Constructing Decision Forests" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (8): 832–844. doi:10.1109/34.709601. S2CID 206420153.

[elemstatlearn-3] Cite error: The named reference elemstatlearn was invoked but never defined (see the help page).

[kleinberg1990-4] Kleinberg E (1990). "Stochastic Discrimination" (PDF). Annals of Mathematics and Artificial Intelligence. 1 (1–4): 207–239. CiteSeerX 10.1.1.25.6750. doi:10.1007/BF01531079. S2CID 206795835. Archived from the original (PDF) on 2018-01-18.

[kleinberg1996-5] Kleinberg E (1996). "An Overtraining-Resistant Stochastic Modeling Method for Pattern Recognition". Annals of Statistics. 24 (6): 2319–2349. doi:10.1214/aos/1032181157. MR 1425956.

[kleinberg2000-6] Kleinberg E (2000). "On the Algorithmic Implementation of Stochastic Discrimination" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 22 (5): 473–490. CiteSeerX 10.1.1.33.4131. doi:10.1109/34.857004. S2CID 3563126. Archived from the original (PDF) on 2018-01-18.

[breiman2001-7] Breiman L (2001). "Random Forests". Machine Learning. 45 (1): 5–32. Bibcode:2001MachL..45....5B. doi:10.1023/A:1010933404324.

[rpackage-8] Cite error: The named reference rpackage was invoked but never defined (see the help page).

[9] U.S. trademark registration number 3185828, registered 2006/12/19.

[10] "RANDOM FORESTS Trademark of Health Care Productivity, Inc. - Registration Number 3185828 - Serial Number 78642027 :: Justia Trademarks".

[amitgeman1997-11] Amit Y, Geman D (1997). "Shape quantization and recognition with randomized trees" (PDF). Neural Computation. 9 (7): 1545–1588. CiteSeerX 10.1.1.57.6069. doi:10.1162/neco.1997.9.7.1545. S2CID 12470146. Archived from the original (PDF) on 2018-02-05. Retrieved 2008-04-01.

Random forest information

and 25 Related for: Random forest information

Random forest

Bootstrap aggregating

Ensemble learning

Decision tree learning

MNIST database

Machine learning in earth sciences

Jackknife variance estimates for random forest

Machine learning in bioinformatics

Decision tree

Survival analysis

Computational biology

Gradient boosting

Random sample consensus

Outline of machine learning

Randomness

Machine learning

Random variable

Fecal immunochemical test

Supervised learning

Random tree

Random graph

JASP

Leo Breiman

Feature selection

PyTorch

Machine learning and data mining
Part of a series on
Paradigms Supervised learning Unsupervised learning Online learning Batch learning Meta-learning Semi-supervised learning Self-supervised learning Reinforcement learning Curriculum learning Rule-based learning Quantum machine learning
Problems Classification Generative modeling Regression Clustering Dimensionality reduction Density estimation Anomaly detection Data cleaning AutoML Association rules Semantic analysis Structured prediction Feature engineering Feature learning Learning to rank Grammar induction Ontology learning Multimodal learning
Supervised learning (classification • regression) Apprenticeship learning Decision trees Ensembles Bagging Boosting Random forest k-NN Linear regression Naive Bayes Artificial neural networks Logistic regression Perceptron Relevance vector machine (RVM) Support vector machine (SVM)
Clustering BIRCH CURE Hierarchical k-means Fuzzy Expectation–maximization (EM) DBSCAN OPTICS Mean shift
Dimensionality reduction Factor analysis CCA ICA LDA NMF PCA PGD t-SNE SDL
Structured prediction Graphical models Bayes net Conditional random field Hidden Markov
Anomaly detection RANSAC k-NN Local outlier factor Isolation forest
Artificial neural network Autoencoder Cognitive computing Deep learning DeepDream Feedforward neural network Kolmogorov–Arnold Network Recurrent neural network LSTM GRU ESN reservoir computing Restricted Boltzmann machine GAN Diffusion model SOM Convolutional neural network U-Net Transformer Vision Mamba Spiking neural network Memtransistor Electrochemical RAM (ECRAM)
Reinforcement learning Q-learning SARSA Temporal difference (TD) Multi-agent Self-play
Learning with humans Active learning Crowdsourcing Human-in-the-loop RLHF
Model diagnostics Coefficient of determination Confusion matrix Learning curve ROC curve
Mathematical foundations Kernel machines Bias–variance tradeoff Computational learning theory Empirical risk minimization Occam learning PAC learning Statistical learning VC theory
Machine-learning venues ECML PKDD NeurIPS ICML ICLR IJCAI ML JMLR
Related articles Glossary of artificial intelligence List of datasets for machine-learning research List of datasets in computer vision and image processing Outline of machine learning
v t e