[Paper Note] A utility criterion for Markov decision processes


Published: March 08, 2024

Jaquette, S. C. (1976). A utility criterion for Markov decision processes. Management Science, 23(1), 43-49.

Utility criterion

The following utility criterion is used: $u(X)=-\exp\left(-\sum_{t}\lambda_{t}x_{t}\right)$, where $x_{t}$ is the reward in period $t$ and $\lambda_{t}=\lambda\beta^{t}$, reflecting time discounting and constant risk aversion.

The functional form is log-additive (Pollak, 1967).
It also satisfies risk independence.
The formulation can also be viewed as a utility function of the total reward, discounted to the current period.
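
The log-additive form and the "discounted total reward" reading can be checked numerically. The snippet below is a minimal sketch, not taken from the paper: the function names, the reward stream, and the choice to start the time index at $t=0$ are all assumptions made for illustration.

```python
import math

def exp_utility(rewards, lam, beta):
    """u(X) = -exp(-sum_t lam * beta**t * x_t); t starts at 0 (an assumption)."""
    return -math.exp(-sum(lam * beta**t * x for t, x in enumerate(rewards)))

def discounted_total(rewards, beta):
    """Total reward discounted to the current period: sum_t beta**t * x_t."""
    return sum(beta**t * x for t, x in enumerate(rewards))

# Hypothetical reward stream and parameters, chosen only for illustration.
rewards = [1.0, 0.5, 2.0]
lam, beta = 0.8, 0.9

u = exp_utility(rewards, lam, beta)

# Log-additive form: -u(X) is a product of per-period factors exp(-lam_t * x_t).
product_form = -math.prod(
    math.exp(-lam * beta**t * x) for t, x in enumerate(rewards)
)

# Same value as an exponential utility applied to the discounted total reward.
total_form = -math.exp(-lam * discounted_total(rewards, beta))

print(math.isclose(u, product_form), math.isclose(u, total_form))
```

Both comparisons agree up to floating-point error, since the three expressions are algebraically identical.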

Utility optimality

A policy $\pi=(f_1,\dots,f_t,\dots)$ is a sequence of decision vectors. Each $f_t$ maps states to actions, so the action taken at time $t$ in state $s_t$ is $f_t(s_t)$.

A policy $\pi^{*}$ is utility optimal with constant risk aversion $\lambda$ if $u_{\pi^{*}}(\lambda)\geq u_{\pi}(\lambda)$ for all policies $\pi$, where $u_{\pi}(\lambda)$ is the expected utility under $\pi$.
A policy $\pi=(f_1,\dots,f_t,\dots)$ is ultimately stationary if $f_t=f$ for all $t$ larger than some finite integer.

Thm. For all $\lambda\geq 0$ and all $0\leq\beta<1$, there exists a utility optimal policy $\pi^{*}(\lambda,\beta)$ that is ultimately stationary. The policy $\pi^{*}$ can be chosen as a piecewise constant function of $\lambda$ and $\beta$, with an ultimate action vector $f$ that is a piecewise constant function of $\beta$ only (and constant for all $\beta$ close to 1).
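
A heuristic way to read the theorem (a sketch based only on the form of $u$, not necessarily the argument used in the paper) is to split the exponent at some period $T$:

$$
u(X) = -\exp\Big(-\sum_{t<T}\lambda\beta^{t}x_{t}\Big)\cdot\exp\Big(-\lambda\beta^{T}\sum_{t\geq T}\beta^{t-T}x_{t}\Big).
$$

The second factor has the same form as the original criterion, but with risk parameter $\lambda\beta^{T}$ in place of $\lambda$. The effective risk aversion therefore changes from period to period, which is why the decision rule can depend on $t$. Since $\lambda\beta^{T}\to 0$ as $T\to\infty$, and $-\exp(-\varepsilon Y)\approx -1+\varepsilon Y$ for small $\varepsilon$, the continuation problem far in the future behaves approximately like maximizing expected discounted reward, which is consistent with an ultimate action vector that depends on $\beta$ only.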

Stationary policies are generally not utility optimal.
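
A toy example makes this concrete. The sketch below is not from the paper: it assumes a hypothetical one-state problem, truncated to a finite horizon, in which each period offers a choice between a sure reward and a lottery with a higher mean, with rewards independent across periods. Under these assumptions the expected value of $\exp(-\sum_t \lambda\beta^{t}x_t)$ factorizes over periods, so the best action in period $t$ depends only on the effective risk parameter $\lambda\beta^{t}$. All names and numbers are illustrative.

```python
import math

# Hypothetical lotteries as lists of (probability, reward) pairs.
LOTTERIES = {
    "safe": [(1.0, 1.0)],               # reward 1 for sure
    "risky": [(0.5, 0.0), (0.5, 2.5)],  # mean 1.25, but variable
}

def expected_disutility(lottery, risk):
    """E[exp(-risk * reward)]; smaller is better under u = -exp(...)."""
    return sum(p * math.exp(-risk * r) for p, r in lottery)

def decision_rules(lam, beta, horizon):
    """Best action per period, using effective risk parameter lam * beta**t."""
    rules = []
    for t in range(horizon):
        risk_t = lam * beta**t
        best = min(
            LOTTERIES,
            key=lambda a: expected_disutility(LOTTERIES[a], risk_t),
        )
        rules.append(best)
    return rules

rules = decision_rules(lam=2.0, beta=0.5, horizon=8)
print(rules)
# ['safe', 'safe', 'safe', 'risky', 'risky', 'risky', 'risky', 'risky']
```

With these numbers the rule is "safe" for the first three periods and "risky" afterwards: it is ultimately stationary but not stationary, and neither fixed action is best in every period.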


Yifan Hong
