For reinforcement learning in psychology, see Reinforcement and Operant conditioning.
Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize cumulative reward. It is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
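To make "cumulative reward" concrete, one common formalization (an assumption here, since the text above does not fix a particular definition) is the discounted return. The symbols G_t, R_t and the discount factor γ are notation introduced only for this sketch:

```latex
% Discounted return from time step t (illustrative notation, not from the source text):
%   R_{t+k+1} is the reward received k+1 steps after time t,
%   \gamma \in [0, 1] is a discount factor weighting future rewards.
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
    = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots
```

With γ = 1 and a finite episode this reduces to the plain sum of rewards; with γ < 1, rewards further in the future count for less.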
Reinforcement learning differs from supervised learning in that it needs neither labelled input/output pairs nor explicit correction of sub-optimal actions. Instead, the focus is on balancing exploration (of uncharted territory) against exploitation (of current knowledge) with the goal of maximizing long-term reward, for which feedback may be incomplete or delayed.[1]
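The exploration/exploitation balance can be illustrated with a minimal sketch of an epsilon-greedy strategy on a multi-armed bandit. This is one simple strategy chosen for illustration, not a method prescribed by the text above. The function name `epsilon_greedy_bandit`, the Bernoulli reward model, and all parameter values are assumptions made for this example.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit (hypothetical toy setup).

    true_means[a] is the probability that arm `a` pays a reward of 1.
    The agent never reads true_means directly; it only sees sampled rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Sample a 0/1 reward from the (hidden) environment.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
    print("estimated means:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

Setting `epsilon=0` makes the agent purely exploitative, so it can lock onto whichever arm happens to pay first. Setting `epsilon=1` makes it purely exploratory, so it never uses what it has learned.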
The environment is typically formulated as a Markov decision process (MDP), because many reinforcement learning algorithms for this setting use dynamic programming techniques.[2] The main difference between classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume knowledge of an exact mathematical model of the MDP, and they target large MDPs for which exact methods become infeasible.[3]
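The contrast between the two families can be sketched on a toy problem. The example below uses value iteration as a representative dynamic programming method and tabular Q-learning as a representative model-free method. Both choices, the four-state chain environment, and every name and constant in the code are assumptions made for illustration only.

```python
import random

N_STATES, GOAL, GAMMA = 4, 3, 0.9   # states 0..3, state 3 is terminal
ACTIONS = (0, 1)                    # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic chain: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def value_iteration(sweeps=50):
    """Dynamic programming: requires the model, i.e. it queries `step`
    for every state-action pair on every sweep."""
    V = [0.0] * N_STATES
    for _ in range(sweeps):
        new_V = V[:]
        for s in range(GOAL):
            backups = []
            for a in ACTIONS:
                s2, r, done = step(s, a)
                backups.append(r if done else r + GAMMA * V[s2])
            new_V[s] = max(backups)
        V = new_V
    return V


def q_learning(episodes=500, alpha=0.5, seed=0, max_steps=100):
    """Model-free: `step` is used only as a black-box simulator that
    yields sampled transitions; the dynamics are never inspected."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(GOAL)          # random non-terminal start state
        for _ in range(max_steps):
            a = rng.choice(ACTIONS)      # uniformly random behaviour policy
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q


if __name__ == "__main__":
    V = value_iteration()
    Q = q_learning()
    print("value iteration V:", [round(v, 3) for v in V[:GOAL]])
    print("Q-learning max_a Q:", [round(max(q), 3) for q in Q[:GOAL]])
```

On a problem this small both approaches should agree closely. The difference is in what they need: value iteration enumerates every state and action using the known dynamics, whereas Q-learning only needs sampled transitions, which is what lets the same idea be applied when the model is unknown or the state space is too large to sweep.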
[1] Kaelbling, Leslie P.; Littman, Michael L.; Moore, Andrew W. (1996). "Reinforcement Learning: A Survey". Journal of Artificial Intelligence Research. 4: 237–285. arXiv:cs/9605103. doi:10.1613/jair.301. S2CID 1708582. Archived from the original on 2001-11-20.
[2] van Otterlo, M.; Wiering, M. (2012). "Reinforcement Learning and Markov Decision Processes". Reinforcement Learning. Adaptation, Learning, and Optimization. Vol. 12. pp. 3–42. doi:10.1007/978-3-642-27645-3_1. ISBN 978-3-642-27644-6.
[3] Li, Shengbo (2023). Reinforcement Learning for Sequential Decision and Optimal Control (First ed.). Springer Verlag, Singapore. pp. 1–460. doi:10.1007/978-981-19-7784-8. ISBN 978-9-811-97783-1. S2CID 257928563.