In mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming. MDPs were known at least as early as the 1950s;[1] a core body of research on Markov decision processes resulted from Ronald Howard's 1960 book, Dynamic Programming and Markov Processes.[2] They are used in many disciplines, including robotics, automatic control, economics and manufacturing. The name of MDPs comes from the Russian mathematician Andrey Markov as they are an extension of Markov chains.
At each time step, the process is in some state s, and the decision maker may choose any action a that is available in state s. The process responds at the next time step by randomly moving into a new state s′, and giving the decision maker a corresponding reward R_a(s, s′).
The probability that the process moves into its new state s′ is influenced by the chosen action. Specifically, it is given by the state transition function P_a(s, s′). Thus, the next state s′ depends on the current state s and the decision maker's action a. But given s and a, it is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP satisfy the Markov property.
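A single MDP time step can be sketched as follows. This is a minimal illustration, assuming a hypothetical two-state, two-action MDP; the transition probabilities in P and rewards in R are invented for the example.

```python
import random

# Hypothetical dynamics: P[s][a] lists (next_state, probability) pairs,
# i.e. the state transition function P_a(s, s'); R[s][a][s2] is R_a(s, s2).
P = {
    "s0": {"stay": [("s0", 0.9), ("s1", 0.1)], "go": [("s0", 0.2), ("s1", 0.8)]},
    "s1": {"stay": [("s1", 1.0)], "go": [("s0", 0.5), ("s1", 0.5)]},
}
R = {
    "s0": {"stay": {"s0": 0.0, "s1": 1.0}, "go": {"s0": 0.0, "s1": 1.0}},
    "s1": {"stay": {"s1": 0.0}, "go": {"s0": -1.0, "s1": 0.0}},
}

def step(state, action):
    """Sample s' from P_a(s, .) and return it with the reward R_a(s, s')."""
    u, cum = random.random(), 0.0
    for next_state, prob in P[state][action]:
        cum += prob
        if u < cum:
            return next_state, R[state][action][next_state]
    return next_state, R[state][action][next_state]  # guard against rounding

next_state, reward = step("s0", "go")
```

Note that `step` depends only on the current state and action, not on any earlier history, which is exactly the Markov property described above.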
Markov decision processes are an extension of Markov chains; the difference is the addition of actions (allowing choice) and rewards (giving motivation). Conversely, if only one action exists for each state (e.g. "wait") and all rewards are the same (e.g. "zero"), a Markov decision process reduces to a Markov chain.
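The reduction to a Markov chain can be made concrete. Below is a minimal sketch, assuming a hypothetical three-state chain: with only one action per state ("wait") and all rewards zero, the step rule collapses to an ordinary Markov chain transition.

```python
import random

# Hypothetical transition matrix of the resulting Markov chain; the single
# action and the zero rewards carry no information, so they are dropped.
P = {
    "a": [("a", 0.5), ("b", 0.5)],
    "b": [("b", 0.7), ("c", 0.3)],
    "c": [("c", 1.0)],
}

def markov_step(state):
    """Sample the next state; action choice and reward play no role here."""
    u, cum = random.random(), 0.0
    for next_state, prob in P[state]:
        cum += prob
        if u < cum:
            return next_state
    return next_state  # guard against floating-point rounding

# Simulate a short trajectory of the chain.
state = "a"
for _ in range(10):
    state = markov_step(state)
```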
^Bellman, R. (1957). "A Markovian Decision Process". Journal of Mathematics and Mechanics. 6 (5): 679–684. JSTOR 24900506.
^Howard, Ronald A. (1960). Dynamic Programming and Markov Processes. The M.I.T. Press.