<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://yfflood.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://yfflood.github.io/" rel="alternate" type="text/html" /><updated>2026-01-31T21:21:19-08:00</updated><id>https://yfflood.github.io/feed.xml</id><title type="html">Yifan Hong</title><subtitle>graduate student at Tsinghua University</subtitle><author><name>Yifan Hong</name><email>hongyf23@mails.tsinghua.edu.cn</email></author><entry><title type="html">Universal law of generalization (1)</title><link href="https://yfflood.github.io/posts/2024/04/02/shepard1987sci/" rel="alternate" type="text/html" title="Universal law of generalization (1)" /><published>2024-04-02T00:00:00-07:00</published><updated>2024-04-02T00:00:00-07:00</updated><id>https://yfflood.github.io/posts/2024/04/02/shepard1987sci</id><content type="html" xml:base="https://yfflood.github.io/posts/2024/04/02/shepard1987sci/"><![CDATA[<p>@YifanHong 20230524</p>
<blockquote>
  <p>Roger N. Shepard, Toward a Universal Law of Generalization for Psychological Science. Science 237, 1317-1323 (1987). DOI: 10.1126/science.3629243</p>
</blockquote>

<p>Starting from the seminal paper of Shepard (1987), I took my first glimpse of psychology as a quantitative science (like physics, which I love so much). The universality hidden in seemingly unpredictable, noisy behaviors is amazing.</p>

<h2 id="primacy-of-generalization">Primacy of Generalization</h2>
<p>Anything experienced by an individual is <strong>unlikely to recur in exactly the same form and context</strong></p>
<ul>
  <li>To make learning useful, one has to <em>generalize</em>.</li>
  <li>Each individual has an internal metric of similarity (present from birth)</li>
</ul>

<h3 id="generalization-vs-failure-of-discrimination">Generalization vs. failure of discrimination</h3>
<p>Encountering an unfamiliar object requires the individual to infer the <strong>consequence</strong></p>
<ul>
  <li>Generalization is a cognitive <strong>act</strong></li>
  <li>psychological vs. psychophysical</li>
</ul>

<h3 id="early-work">Early work</h3>

<blockquote>
  <p>when Pavlov found that dogs would salivate not only at the sound of a bell or whistle that had preceded feeding but also at other sounds, and more so as they were chosen to be more similar to the original sound, for example, in pitch.</p>
</blockquote>

<p><strong>Empirical</strong> gradient of generalization: relate some feature of the response to a measure of stimulus difference</p>
<ul>
  <li>strength, probability, speed etc.</li>
</ul>

<p>Example. identification learning (Shepard, 1980)</p>
<ul>
  <li>subjects learn a one-to-one association between $n$ stimuli and $n$ arbitrary verbal responses</li>
  <li><strong>measure of generalization</strong> $g_{ij}$: the frequency with which any stimulus leads to the response assigned to any other stimulus</li>
</ul>

<h3 id="apparent-noninvariance">Apparent noninvariance</h3>
<p>To establish quantitative results, we have to choose an <em>independent variable</em></p>
<ul>
  <li>choosing <strong>physical measures</strong> of difference does not guarantee <strong>invariance</strong></li>
  <li>the decrease in response shows different patterns across stimuli, sensory continua, and species</li>
  <li>the gradient can even be nonmonotonic: tones separated by an octave, hues at opposite ends of the visible spectrum</li>
</ul>

<p><img src="/images/Newton's_color_circle.png" alt="Newton's_color_circle" height="30%" width="30%" /></p>

<p>Newton’s color circle (Newton, 1704).</p>

<h2 id="establishing-invariance-in-the-psychological-space">Establishing invariance in the psychological space</h2>
<blockquote>
  <p><em>What is sometimes required is not more data or more refined data but a different <strong>conception</strong> of the problem.</em></p>
</blockquote>

<p>Assume the invariant law of generalization is based on an appropriate <strong>psychological space</strong>.
Attribute the troublesome variations in the generalization gradients to variations in a <em>psychophysical function</em> (physical space $\rightarrow$ psychological space).
A purely <em>psychological function</em> then relates generalization to <strong>distance</strong> in the psychological space.</p>

<blockquote>
  <p>Is there an <strong>invariant monotonic</strong> function whose <strong>inverse</strong> will uniquely transform those <strong>data</strong> into numbers interpretable as distances in some appropriate metric space?</p>
</blockquote>

<p>Goal: find a metric space (or, equivalently, the inverse function) that explains the observed generalization data (the data must satisfy certain conditions, such as those on Cayley–Menger determinants, https://en.wikipedia.org/wiki/Cayley%E2%80%93Menger_determinant)</p>

<ul>
  <li>Uniqueness implied by a geometric fact: the <em>rank order</em> of the distances can be a good approximation to the distances themselves, when the number of points is not too small (relative to the dimensionality)</li>
  <li>The function can be determined through <strong>nonmetric multidimensional scaling</strong>
    <ul>
      <li>move $n$ points in a specified space, until the configuration <strong>minimizes</strong> some measure of departure (“stress”) from a monotonic relation between $g_{ij}$ and $d_{ij}$.  \(g_{ij}=\bigg(\frac{p_{ij}\cdot p_{ji}}{p_{ii}\cdot p_{jj}}\bigg)^{1/2}\)</li>
    </ul>
  </li>
</ul>
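<p>As a concrete sketch, the symmetrized measure $g_{ij}$ can be computed directly from a confusion matrix; the $3\times 3$ matrix below is made up for illustration, not Shepard’s data:</p>

```python
import numpy as np

# Hypothetical 3x3 confusion matrix: p[i, j] is the relative frequency with
# which stimulus i evokes the response assigned to stimulus j.
p = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.75, 0.15],
              [0.05, 0.20, 0.75]])

# Symmetrized generalization measure g_ij = sqrt(p_ij * p_ji / (p_ii * p_jj)).
g = np.sqrt(p * p.T / np.outer(np.diag(p), np.diag(p)))
print(np.round(g, 3))   # symmetric, with g_ii = 1
```

<p>The resulting symmetric matrix is the kind of input a nonmetric MDS routine would take.</p>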

<h2 id="empirical-regularities">Empirical regularities</h2>
<h3 id="invariance-and-approximate-exponential-decay">Invariance and (approximate) exponential decay</h3>
<blockquote>
  <p><strong>Invariance:</strong>  in every case, the decrease of generalization with psychological distance is</p>
  <ol>
    <li>monotonic,</li>
    <li>generally concave upward, and</li>
    <li>more or less approximates a simple exponential decay function</li>
  </ol>
</blockquote>

<p>MDS does not impose the form of the function except for monotonicity.</p>

<p><img src="/images/fig1_20240402.png" alt="fig1" class="center-image" /></p>

<p>Gradients of generalization (Shepard, 1987).</p>

<h2 id="two-distance-metrics">Two distance metrics</h2>
<p>When the psychological space has more than one dimension, the data also provide evidence about the metric, for unitary and analyzable stimuli respectively</p>
<ul>
  <li>integral attributes (lightness and saturation): Euclidean metric</li>
  <li>separable attributes (size and orientation): ‘city-block’ metric</li>
</ul>

<p>Both can be expressed as a Minkowski power metric
\(d_{ij}=\bigg(\sum_{k=1}^K|x_{ik}-x_{jk}|^r\bigg)^{1/r}\)</p>

<h2 id="a-theory-of-generalization">A <em>theory</em> of generalization</h2>

<blockquote>
  <p>Generalization is thus a cognitive act, not merely a failure of sensory discrimination.</p>
</blockquote>

<p>An object is recognized as a member of a “natural kind”, corresponding to some <strong><em>consequential region</em></strong> $C$ in the psychological space.</p>

<p><strong>Assumption</strong>: nature chose the consequential region at random</p>
<ol>
  <li>all <em>locations</em> are equally probable</li>
  <li><em>size</em> has density $p(s)$ with finite mean $\mu$:
 \(\int_0^\infty p(s)\,ds = 1,\qquad E[s]=\int_0^\infty s\cdot p(s)\,ds = \mu&lt;\infty\)</li>
  <li>arbitrary <em>shape</em> that is centrally symmetric and has finite extension</li>
</ol>

<p><strong>Goal</strong>: estimate the conditional probability $P\big(x\in C\,\vert\, 0\in C\big)$</p>

<p><img src="/images/fig2_20240402.png" alt="2" class="center-image" /></p>

<p>Consequential regions with fixed size (Shepard, 1987).</p>

<p>Given any size $s$, the probability of covering $x$ is a ratio of volumetric measures.
\(p(x\in C\vert s)=\frac{m(s,x)}{m(s)}\)</p>

<p>Marginalizing out the <em>size</em> yields the conditional probability, denoted $g(\cdot)$. Note that $p(\cdot)$ must satisfy the conditions above.
\(g(x)=\int_0^\infty p(s)\frac{m(s,x)}{m(s)}ds\)</p>
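<p>This marginalization can be simulated directly. A minimal 1-D sketch, assuming for concreteness an Erlang size prior with mean $\mu$: draw a size $s\sim p(s)$, place the interval uniformly among positions that cover the origin, and count how often it also covers $x$. The estimate should track $\exp(-2x/\mu)$:</p>

```python
import math
import random

def g_mc(x, mu=1.0, n=200_000, seed=0):
    """Monte-Carlo estimate of g(x) = P(x in C | 0 in C) for 1-D intervals,
    with sizes s drawn from an Erlang shape-2 prior with mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        s = rng.gammavariate(2, mu / 2)   # Erlang(2) prior, mean mu
        a = -rng.uniform(0.0, s)          # left endpoint: interval [a, a+s] covers 0
        if a <= x <= a + s:
            hits += 1
    return hits / n

print(g_mc(0.5))   # should be close to exp(-2 * 0.5) ~ 0.368
```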

<h3 id="exponential-law">Exponential Law</h3>
<h4 id="derivation-with-unidimensional-case">Derivation with unidimensional case</h4>
<p>A convex consequential region is an interval of length $s$, so
\(m(s)=s\)
\(m(s,x)=\left\{\begin{array}{ll} s-\vert x\vert,&amp; s\geq \vert x\vert \\ 0, &amp; s&lt;\vert x\vert \end{array}\right.\)</p>

<p>Substituting these into the previous expression and differentiating twice gives
\(g''(d)=\frac{p(d)}{d}\)</p>

<p><strong>Some simulation results with different size distribution</strong></p>

<p><img src="/images/fig3_20240402.png" alt="3" class="center-image" /></p>

<p>Generalization function and corresponding prior (Shepard, 1987).</p>

<ul>
  <li>dependence on the prior is weak</li>
  <li>the exponential law is <strong>exact</strong> under the Erlang prior $p(s)=(\frac{2}{\mu})^{2}s\cdot \exp(-\frac{2}{\mu}s)$
 \(g(d) = \exp\bigg(-\frac{2d}{\mu}\bigg)\)</li>
  <li>the Erlang prior can be derived by a Bayesian update of a prior $q(s)$ under minimal knowledge (the first stimulus falls in the consequential region with probability proportional to its volume $m(s)$)
\(p(s)=C\cdot m(s) \cdot q(s)\)</li>
</ul>

<h3 id="derivation-of-two-metrics">Derivation of two metrics</h3>
<p>For multidimensional cases, different metrics arise from different dependence relations between the dimensions.</p>
<ul>
  <li>For dimensions identified as independent variables in the world, the extension of the consequential region along these dimensions should be uncorrelated</li>
</ul>
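<p>The exactness of the exponential law under the Erlang prior can be checked by numerical quadrature; a sketch using a plain trapezoid rule (no special library assumed):</p>

```python
import math

def g_quad(d, mu=1.0, s_max=30.0, steps=200_000):
    """Trapezoid-rule evaluation of g(d) = integral over s in [d, inf) of
    p(s) * (s - d) / s, for the Erlang prior p(s) = (2/mu)^2 s exp(-2 s / mu)."""
    h = (s_max - d) / steps
    total = 0.0
    for i in range(steps + 1):
        s = d + i * h
        p = (2.0 / mu) ** 2 * s * math.exp(-2.0 * s / mu)
        f = p * (s - d) / s if s > 0 else 0.0
        total += 0.5 * f if i in (0, steps) else f
    return total * h

for d in (0.0, 0.5, 1.5):
    print(d, g_quad(d), math.exp(-2.0 * d))   # the two columns should agree
```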

<p><img src="/images/fig4_20240402.png" alt="4" class="center-image" /></p>

<p>Equal generalization contours (Shepard, 1987).</p>

<h2 id="limitations">Limitations</h2>

<ul>
  <li>When tested on highly similar stimuli, or in delayed tests, “noise” comes into effect
    <ol>
      <li>exponential decay $\rightarrow$ Gaussian (Shepard, 1987: fig.1 L)</li>
      <li>rhombic $\rightarrow$ elliptical curve of equal generalization for <em>analyzable</em> dimensions</li>
    </ol>
  </li>
  <li>Effect of category learning; asymmetries of generalization</li>
  <li>Alternative explanations based on a graded form of the consequential region</li>
  <li>Negative correlation of dimensions leads to $r&lt;1$</li>
  <li>Time to discriminate is <strong>reciprocally</strong> related to distance, not exponentially</li>
</ul>]]></content><author><name>Yifan Hong</name><email>hongyf23@mails.tsinghua.edu.cn</email></author><category term="generalization" /><summary type="html"><![CDATA[@YifanHong 20230524 Roger N. Shepard ,Toward a Universal Law of Generalization for Psychological Science.Science237,1317-1323(1987).DOI:10.1126/science.3629243]]></summary></entry><entry><title type="html">[Paper Note] Risk-sensitive Markov decision processes</title><link href="https://yfflood.github.io/posts/2024/03/08/howard1972ms/" rel="alternate" type="text/html" title="[Paper Note] Risk-sensitive Markov decision processes" /><published>2024-03-08T00:00:00-08:00</published><updated>2024-03-08T00:00:00-08:00</updated><id>https://yfflood.github.io/posts/2024/03/08/howard1972ms</id><content type="html" xml:base="https://yfflood.github.io/posts/2024/03/08/howard1972ms/"><![CDATA[<blockquote>
  <p>Howard, R. A., &amp; Matheson, J. E. (1972). Risk-sensitive Markov decision processes. Management science, 18(7), 356-369.</p>
</blockquote>

<p>This paper provides a formulation of, and an algorithm for, the maximization of the <em>certain equivalent reward</em> (<em>utility</em>) generated by an MDP.</p>
<ul>
  <li>The eigenvalue $\lambda$ of $Q$ may (?) be used to elicit the risk attitude $\gamma$</li>
</ul>

<h1 id="definitions-and-notations">Definitions and notations</h1>
<p>$\tilde{v}$ is the <em>certain equivalent</em> of a lottery with outcome $v$, defined by $u(\tilde{v})=E[u(v)]$</p>

<p>Let $u_i(n)=u(\tilde{v}_{i}(n))$ denote the utility of the reward process when it occupies state $i$ with $n$ periods left.</p>

<h2 id="assumption">Assumption</h2>
<p>Assume the <strong><em>delta property</em></strong>: when all prizes in a lottery increase by the same amount, the certain equivalent increases by that amount.</p>
<ul>
  <li>To ensure this property, the utility function has to be either <em>linear</em> or <em>exponential</em></li>
</ul>

\[u(v)=-(\operatorname{sgn}\gamma)e^{-\gamma v}\]

\[u^{-1}(x)=-\frac{1}{\gamma}\ln(-(\operatorname{sgn}\gamma)x)\]

<p>The exponential utility implies that a constant increase $\Delta$ in a lottery becomes a multiplicative factor
\(u(v+\Delta)=e^{-\gamma \Delta} u(v)\)</p>
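<p>A one-line numerical check of this multiplicative identity (the values of $\gamma$, $v$, $\Delta$ are arbitrary):</p>

```python
import math

def u(v, gamma):
    """Exponential utility u(v) = -sgn(gamma) * exp(-gamma * v)."""
    return -math.copysign(1.0, gamma) * math.exp(-gamma * v)

gamma, v, delta = 0.5, 2.0, 1.3        # arbitrary illustrative values
lhs = u(v + delta, gamma)
rhs = math.exp(-gamma * delta) * u(v, gamma)
print(lhs, rhs)                        # equal: a shift by delta scales the utility
```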

<h1 id="formulation">Formulation</h1>
<h2 id="reward-process">Reward process</h2>

\[\begin{equation}
\begin{split}
u(\tilde{v}_i(n+1))&amp;=\sum\limits_{j\in S}p_{ij}(n+1) u(r_{ij}(n+1)+\tilde{v}_{j}(n)) \\
&amp;= \sum\limits_{j\in S}p_{ij}(n+1)e^{-\gamma r_{ij}(n+1)} u(\tilde{v}_{j}(n))
\end{split}
\end{equation}\]

<h3 id="stationary-mrp">Stationary MRP</h3>
<p>Equivalently, we can use matrix notation, $Q_{ij}=p_{ij}e^{-\gamma r_{ij}}$</p>

<p>\(\text{u}(n)=Q^{n}\text{u}(0)\)
We can show that, 
\(\lim_{n\rightarrow \infty}[\tilde{v}_{i}(n)-n(- \frac{1}{\gamma} \ln\lambda)]=\tilde{v}_{i}+c\)</p>
<ul>
  <li>$\lambda$ is the dominant eigenvalue of $Q$</li>
  <li>$\tilde{v}_{i}$ is the <em>relative certain equivalent</em></li>
  <li>$\tilde{g}=- \frac{1}{\gamma} \ln\lambda$ is the <em>certain equivalent gain</em></li>
</ul>

<p>We can re-write the consistency condition</p>

\[\lambda u_{i}= \sum\limits_{j\in S} q_{ij}u_{j}\]
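<p>A minimal sketch of this eigenvalue computation on a made-up 2-state MRP (the transition matrix, rewards, and $\gamma$ are illustrative, not from the paper):</p>

```python
import numpy as np

gamma = 0.3                                   # risk-aversion parameter (made up)
P = np.array([[0.7, 0.3],                     # hypothetical 2-state transition matrix
              [0.4, 0.6]])
R = np.array([[1.0, 0.5],                     # hypothetical rewards r_ij
              [0.2, 2.0]])

Q = P * np.exp(-gamma * R)                    # Q_ij = p_ij exp(-gamma r_ij)
lam = max(np.linalg.eigvals(Q).real)          # dominant (Perron) eigenvalue
g_tilde = -np.log(lam) / gamma                # certain equivalent gain
print(lam, g_tilde)
```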

<h2 id="decision-process">Decision process</h2>

<p>\(u_{i}(n+1)=\max_{k}\sum\limits_{j} p_{ij}^{k}(n+1)e^{-\gamma r^{k}_{ij}(n)} u_j(n)\)</p>
<ul>
  <li>this equation allows us to compute the <em>optimal policy</em> and the optimal <em>utility</em></li>
  <li>We can also find $\tilde{v}_{i}(n)$, the <em>certain equivalent</em> of the lottery implied by being in state $i$ with $n$ stages left</li>
</ul>
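<p>The recursion above can be sketched as finite-horizon value iteration on a made-up 2-state, 2-action MDP (all numbers illustrative):</p>

```python
import numpy as np

gamma = 0.3                                   # risk aversion (made up)
# Hypothetical 2-state, 2-action MDP: P[k] are transition matrices, R[k] rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[1.5, 0.3], [0.4, 1.0]]])

u = np.full(2, -1.0)                          # u(0 reward) = -exp(0) = -1 for gamma > 0
for n in range(20):                           # u_i(n+1) = max_k sum_j p^k_ij e^{-gamma r^k_ij} u_j(n)
    q = np.einsum('kij,kij,j->ik', P, np.exp(-gamma * R), u)
    u = q.max(axis=1)                         # utilities are negative; max picks the best action
policy = q.argmax(axis=1)                     # greedy action per state at the last stage
v_tilde = -np.log(-u) / gamma                 # certain equivalents of the 20-period lottery
print(policy, v_tilde)
```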

<h1 id="algorithm-for-stationary-policy">Algorithm for <em>stationary</em> policy</h1>
<p><strong>policy evaluation</strong>: for the present policy, solve</p>

\[e^{-\gamma (\tilde{g}+\tilde{v}_{i})} = \sum\limits_{j\in S} p_{ij} \cdot e^{-\gamma (r_{ij}+\tilde{v}_j)}\]

<p>with $\tilde{v}_{N}=0$, for certain equivalent gain $\tilde{g}$ and the relative certain equivalents.</p>

<p><strong>policy improvement</strong>: For each state $i$ find the alternative $k$ that maximizes the certain equivalent</p>

\[\tilde{V}_{i}^{k}=- \frac{1}{\gamma}\ln [\sum\limits_{j} p_{ij}^{k}e^{-\gamma (r_{ij}^{k} +\tilde{v}_{j})}]\]]]></content><author><name>Yifan Hong</name><email>hongyf23@mails.tsinghua.edu.cn</email></author><category term="MDP" /><summary type="html"><![CDATA[Howard, R. A., &amp; Matheson, J. E. (1972). Risk-sensitive Markov decision processes. Management science, 18(7), 356-369.]]></summary></entry><entry><title type="html">[Paper Note] A utility criterion for Markov decision processes</title><link href="https://yfflood.github.io/posts/2024/03/08/jaquette1976ms/" rel="alternate" type="text/html" title="[Paper Note] A utility criterion for Markov decision processes" /><published>2024-03-08T00:00:00-08:00</published><updated>2024-03-08T00:00:00-08:00</updated><id>https://yfflood.github.io/posts/2024/03/08/jaquette1976ms</id><content type="html" xml:base="https://yfflood.github.io/posts/2024/03/08/jaquette1976ms/"><![CDATA[<blockquote>
  <p>Jaquette, S. C. (1976). A utility criterion for Markov decision processes. Management Science, 23(1), 43-49.</p>
</blockquote>

<h1 id="utility-criterion">Utility criterion</h1>
<p>The following utility criterion is used
\(u(X)=-\exp(-\sum\limits_{t}\lambda_{t}x_{t}),\)
where $\lambda_{t}= \lambda \beta^{t}$, reflecting <em>time discounting</em> and <em>constant risk aversion</em>.</p>
<ul>
  <li>The functional form is log-additive (<a href="/posts/2024/03/07/pollack1967eca/">Pollak, 1967</a>)</li>
  <li>It also satisfies risk independence</li>
  <li>The formulation can be considered also as a utility function of the total reward, discounted to the current period</li>
</ul>
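<p>A tiny sketch of this criterion, checking that $u(X)$ is indeed the exponential utility of the total reward discounted to period 0 (reward stream made up):</p>

```python
import math

def u_stream(x, lam=1.0, beta=0.9):
    """u(X) = -exp(-sum_t lam * beta^t * x_t): time discount + constant risk aversion."""
    return -math.exp(-sum(lam * beta ** t * xt for t, xt in enumerate(x)))

x = [1.0, 2.0, 0.5]
discounted_total = sum(0.9 ** t * xt for t, xt in enumerate(x))
print(u_stream(x), -math.exp(-discounted_total))   # the two agree
```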

<h1 id="utility-optimality">Utility optimality</h1>
<p>A policy $\pi=(f_1,…,f_t,…)$ is a sequence of vectors. $f_t=\pi_t(s_t)$ maps from states to actions.</p>
<ul>
  <li>A policy $\pi^{*}$ is <em>utility optimal</em> with constant risk aversion $\lambda$ if $u_{\pi^{*}}(\lambda)\geq u_{\pi}(\lambda)$ for all $\pi$.</li>
  <li>A policy $\pi=(f_1,…,f_t,…)$ is <em>ultimately stationary</em> if $f_t=f$ for all $t$ larger than some finite integer</li>
</ul>

<p><strong>Thm.</strong> A <em>utility optimal</em> policy $\pi^{*}(\lambda,\beta)$ exists for all $\lambda\geq 0$ and all $0\leq\beta&lt;1$ which is <em>ultimately stationary</em>. The policy $\pi^{*}$ can be chosen as a <strong>piecewise constant</strong> function of $\lambda$ and $\beta$, with an ultimate action vector $f$ a piecewise constant function of $\beta$ only (and constant for all $\beta$ close to 1).</p>
<ul>
  <li><strong>stationary</strong> policies are not utility optimal generally.</li>
</ul>]]></content><author><name>Yifan Hong</name><email>hongyf23@mails.tsinghua.edu.cn</email></author><category term="MDP" /><summary type="html"><![CDATA[Jaquette, S. C. (1976). A utility criterion for Markov decision processes. Management Science, 23(1), 43-49.]]></summary></entry><entry><title type="html">[Paper Note] Linearly parameterized bandits</title><link href="https://yfflood.github.io/posts/2024/03/07/rusmevichientong2010moor/" rel="alternate" type="text/html" title="[Paper Note] Linearly parameterized bandits" /><published>2024-03-07T00:00:00-08:00</published><updated>2024-03-07T00:00:00-08:00</updated><id>https://yfflood.github.io/posts/2024/03/07/Rusmevichientong2010moor</id><content type="html" xml:base="https://yfflood.github.io/posts/2024/03/07/rusmevichientong2010moor/"><![CDATA[<blockquote>
  <p>Rusmevichientong, P., &amp; Tsitsiklis, J. N. (2010). Linearly parameterized bandits. Mathematics of Operations Research, 35(2), 395-411.</p>
</blockquote>

<p>The key idea for deriving the bounds is to</p>
<ul>
  <li>attribute <em>regret</em> to the causes: <em>exploration</em> or <em>estimation error</em></li>
  <li>find the relation between these two parts, if any</li>
</ul>

<h1 id="model-formulation">Model formulation</h1>
<p>$\mathcal{U}_r \subset \mathbb{R}^r$ is a <em>compact</em> set of arms ($r\geq2$). The reward in period $t$ is given by</p>

<p>\(X_{t}^{u}=u^{T}Z + W_{t}^{u},\tag{1}\)</p>
<ul>
  <li>$Z\in\mathbb{R}^r$ is a random variable</li>
  <li>$W_{t}^{u}$ are iid mean-zero random variables</li>
</ul>

<p>A <em>policy</em> $\psi=(\psi_{1},\psi_{2},…)$  is a sequence of functions, each mapping from <em>history</em> to $\mathcal{U}_r$. For any policy $\psi$ and $z\in\mathbb{R}^r$, the <em>cumulative regret</em> under $\psi$ given $Z=z$ is</p>

\[\text{Regret}(z,T,\psi)=\sum\limits_{t=1}^{T}E\bigg[\max_{v\in\mathcal{U}_{r}}v^Tz - U^{T}_{t}z\,\vert\,Z=z\bigg],\]

<p>The <em>cumulative Bayes risk</em> under $\psi$ is the expectation w.r.t. the <em>prior</em> of $Z$.</p>

\[\text{Risk}(T,\psi)=E[\text{Regret}(Z,T,\psi)]\]

<p><img src="/images/Pasted image 20240307221236.png" alt="regret table" /></p>

<h1 id="lower-bounds">Lower bounds</h1>
<p>Linear bandits have an $\Omega(r\sqrt{T})$ lower bound on the Bayes risk, and thus on the regret, given a normal prior on $Z$.</p>
<ol>
  <li>
    <p>The cumulative risk can be lower-bounded by the <strong>estimator error variance</strong> and the <strong>total amount of exploration</strong>.</p>

    <p><strong>Lemma</strong> (risk decomposition) Let $S_{T}^{1},…, S_{T}^{r-1}$ denote a collection of orthogonal unit vectors that are also orthogonal to $\hat{Z}_{T}$. For any $T\geq 1$, 
 \(\text{Risk}(T,\psi)\geq \frac{1}{2}\sum\limits_{k=1}^{r-1}E \bigg[||Z||\sum\limits_{t=1}^{T}(U_{t}^{T}S_{T}^{k})^{2} + \frac{T}{||Z||} \{(Z-\hat{Z}_{T})^{T}S_{T}^{k}\}^2\bigg]\)</p>
  </li>
  <li>
    <p>The two terms are interrelated: little exploration implies large estimation error</p>

    <p><strong>Lemma</strong> For any $k$ and $T\geq 1$,
 \(E[\{(Z-\hat{Z}_{T})^{T}S_{T}^{k}\}^{2}\vert H_{T}]\geq\frac{1}{r+\sum\limits_{t=1}^{T}(U_{t}^{T}S_{T}^{k})^{2} },\)
 where $r$ is the prior precision of $Z$.</p>
  </li>
  <li>There is a lower bound on the probability that $||Z||$ is bounded away from 0.</li>
  <li>Then we can derive a <em>minimum directional risk</em>.</li>
</ol>

<h1 id="upper-bounds-pege-algorithm">Upper bounds: PEGE algorithm</h1>

<h2 id="algorithm-phased-exploration-and-exploitation">Algorithm: phased-exploration-and-exploitation</h2>
<p>There are two phases in each cycle $c\geq 1$:</p>
<ol>
  <li>Exploration ($r$ periods): play arms $b_1,…,b_r$ that span the entire space, and
 compute the OLS estimate $\hat{Z}(c)=Z+\frac{1}{c}(\sum\limits_{k=1}^{r}b_{k}b_{k} ^{T})^{-1}\sum\limits_{s=1}^{c}\sum\limits_{k=1}^{r}b_{k}W^{b_{k}}(s)$</li>
  <li>Exploitation ($c$ periods): play the greedy arm $G(c)=\arg\max_{v\in\mathcal{U}_{r}} v^{T}\hat{Z}(c)$</li>
</ol>

<p>In the algorithm, the cycle index $c$ grows to the <strong>order $O(\sqrt{T})$</strong>.</p>
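<p>A toy sketch of PEGE on a 2-dimensional bandit whose arm set is the unit ball, so the greedy arm is $\hat{Z}(c)/||\hat{Z}(c)||$ (the parameter, noise level, and number of cycles are made up; the exploration arms are the standard basis):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
r, sigma = 2, 0.1
z = np.array([0.6, 0.8])                   # unknown parameter (made up); ||z|| = 1

def pull(u):
    """Observe X_t = u^T z + noise, as in equation (1)."""
    return u @ z + sigma * rng.standard_normal()

best = np.linalg.norm(z)                   # max_u u^T z over the unit ball
sums = np.zeros(r)                         # running reward sums for the basis arms
regret = 0.0
for c in range(1, 60):
    for k in range(r):                     # exploration: r basis plays per cycle
        sums[k] += pull(np.eye(r)[k])
        regret += best - z[k]
    z_hat = sums / c                       # least-squares estimate of z
    greedy = z_hat / np.linalg.norm(z_hat) # greedy arm on the unit sphere
    regret += c * (best - greedy @ z)      # c exploitation periods per cycle
print(regret)                              # far below linear in the ~1900 total periods
```

<p>Exploitation phases lengthen as the estimate improves, so almost all of the regret comes from the $O(r\sqrt{T})$ exploration periods.</p>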

<h2 id="assumptions">Assumptions</h2>
<ol>
  <li>Subgaussian noise</li>
  <li>Arms are bounded, and include $r$ linearly independent components</li>
  <li>$\mathcal{U}_{r}$ satisfies the <em>smooth best arm response</em> condition with parameter $J$
     \(||u^{*}(z)-u^{*}(y)||\leq J||z-y||\)</li>
</ol>

<h2 id="upper-bound">Upper bound</h2>
<p><strong>For PEGE, we can explicitly disentangle risk caused by <em>exploration</em> and <em>misspecification</em></strong></p>

<p><strong>Thm.</strong> There exists a positive constant $a_{1}$ that depends only on the noise bounds, arm bounds and response bounds, such that for any $z$ and $T\geq r$,
\(\text{Regret}(z,T,\text{PEGE})\leq a_{1}(||z||+\frac{1}{||z||})r\sqrt{T}\)</p>
<ul>
  <li>Since the arm bound provides a trivial bound $2\bar{u}\vert\vert z\vert\vert$ on <em>instantaneous regret</em>, the bound does not deteriorate as $\vert\vert z\vert\vert$ approaches 0.</li>
</ul>

<p>Proof sketch:</p>
<ol>
  <li>There is an upper bound on the squared norm error
     \(E[||\hat{Z}(c)-z||^{2}| Z=z]\leq \frac{h_{1}r}{c}\)</li>
  <li>Expected <strong>instantaneous regret</strong> under greedy decision is of order $O(\vert\vert Z-\hat{Z}\vert\vert^{2})$ given <strong>smoothness</strong> assumption.</li>
  <li>Over total $K$ cycles, $K=O(\sqrt{T})$
     \(\text{Regret}\left(z,rK+\sum\limits_{c=1}^{K}c,\text{PEGE}\right)\leq h_{3}r||z||K+h_{4}\sum\limits_{c=1}^{K} \frac{r}{||z||}\)</li>
</ol>]]></content><author><name>Yifan Hong</name><email>hongyf23@mails.tsinghua.edu.cn</email></author><category term="bandits" /><summary type="html"><![CDATA[Rusmevichientong, P., &amp; Tsitsiklis, J. N. (2010). Linearly parameterized bandits. Mathematics of Operations Research, 35(2), 395-411.]]></summary></entry><entry><title type="html">[Paper Note] Additive von Neumann-Morgenstern utility functions</title><link href="https://yfflood.github.io/posts/2024/03/07/pollack1967eca/" rel="alternate" type="text/html" title="[Paper Note] Additive von Neumann-Morgenstern utility functions" /><published>2024-03-07T00:00:00-08:00</published><updated>2024-03-07T00:00:00-08:00</updated><id>https://yfflood.github.io/posts/2024/03/07/pollack1967eca</id><content type="html" xml:base="https://yfflood.github.io/posts/2024/03/07/pollack1967eca/"><![CDATA[<blockquote>
  <p>Pollak, Robert A. “Additive von Neumann-Morgenstern utility functions.” <em>Econometrica</em> 35.3/4 (1967): 485-494.</p>
</blockquote>

<p>Axiomatically <strong>characterize</strong> the class of (log-)additive <em>utility functions</em> with axioms on <em>preferences</em>.</p>
<h1 id="definitions-and-notations">Definitions and notations</h1>
<h2 id="utility-function-forms">Utility function forms</h2>
<p>A vNM utility function is <em>ordinally additive</em> if there exist $T$ functions $v^t(x_t)$ and a twice differentiable function $F$, $F’&gt;0$, such that $F[V(X)]=\sum\limits_{t=1}^{T} v^t(x_t)$.</p>
<ul>
  <li>A log-additive utility function is ordinally additive.</li>
</ul>

<h2 id="alternatives-lottery-tickets">Alternatives: lottery tickets</h2>
<p>Let $X_a,Y_a$ be $T$-dimensional vectors representing <em>consumption paths</em>: $X_a=(x_{a1},…,x_{aT}), Y_a=(y_{a1},…,y_{aT})$, and let $\gamma_a\in[0,1]$ be a number. 
A <em>simple lottery ticket</em> $L_a=(\gamma_a,X_a,Y_a)$ is an alternative that, once chosen, yields $X_a$ with probability $\gamma_a$ and $Y_a$ with probability $1-\gamma_a$.
\(V(L_{a}) = \gamma_{a} V(X_{a}) + (1-\gamma_{a}) V(Y_a)\)</p>
<ul>
  <li>Two simple lottery tickets $L_a,L_b$ are a pair of <em><strong>k-standard lottery tickets</strong> $(L_a,L_b)$</em> if (1) $\gamma_a=\gamma_b=\frac{1}{2}$, and (2) they have a common value in period $k$ of the first consumption path, $x_{ak}=x_{bk}=x_k$.</li>
  <li>Two simple lottery tickets $L_a,L_b$ are a pair of <em><strong>k-normal lottery tickets</strong> $\langle L_a,L_b\rangle$</em> if (1) $\gamma_a=\gamma_b=\frac{1}{2}$, and (2) they have a common value in period $k$ of both consumption paths, $x_{ak}=x_{bk}=y_{ak}=y_{bk}=z_k$.</li>
</ul>

<h1 id="characterization-theorems">Characterization theorems</h1>
<p><strong>Strong additivity axiom</strong>: an individual’s preference between two <em>k-standard lottery tickets</em> in a given pair is independent of the level of $x_k$ for all pairs of k-standard lotteries, and all choice of $k$.</p>
<ul>
  <li>If $V(L_a(x_k))&gt;V(L_b(x_k))$ for some $x_k$, then $V(L_a(x_k’))&gt;V(L_b(x_k’))$ for all $x_k’$.</li>
</ul>

<blockquote>
  <p>Thm 1. An individual’s <em>preferences</em> satisfy the <em>strong additive axiom</em> if and only if his von Neumann-Morgenstern utility function is <strong>additive</strong>.</p>
  <ul>
    <li><em>proof sketch</em>: construct two consumption paths, and differentiate the utility functions with respect to $x_k$. This gives an equation showing that $\partial V(X) / \partial x_k$ depends only on $x_k$. The necessity part is trivial.</li>
  </ul>
</blockquote>

<hr />

<p><strong>Weak additivity axiom</strong>: an individual’s preference between two <em>k-normal lottery tickets</em> in a given pair is independent of the level of $z_k$ for all pairs of k-normal lotteries, and all choice of $k$.</p>
<ul>
  <li>If $V(L_a(z_k))&gt;V(L_b(z_k))$ for some $z_k$, then $V(L_a(z_k’))&gt;V(L_b(z_k’))$ for all $z_k’$.</li>
    <li>The weak additivity axiom is weaker in the sense that it restricts preferences on <em>k-normal lotteries</em>, which form a <strong>subset</strong> of the <em>k-standard lotteries</em>.</li>
</ul>

<blockquote>
  <p>Thm 2. An individual’s <em>preferences</em> satisfy the <em>weak additive axiom</em> if and only if his von Neumann-Morgenstern utility function is <strong>additive</strong> or <strong>log-additive</strong>.</p>
  <ul>
    <li>proof sketch: for sufficiency, a lemma shows (by construction) that weak additivity implies ordinal additivity. By differentiating two indifferent consumption paths with respect to $x_k$, we can show that $H’(V)=G’’(S)/G’(S)=c$ is constant, which implies that $G$ must be a linear or exponential transformation.</li>
  </ul>
</blockquote>

<h2 id="implications-for-sequential-decision-making">Implications for sequential decision making</h2>
<p>Consider the preference satisfying weak additivity axiom:
\(V(x_1,...,x_T)=G(\sum\limits_{t=1}^{T}v^t(x_t)),\)
where $G$ is a linear or exponential transformation.</p>
<ul>
  <li>A single choice is made between different <em>streams</em> of income</li>
  <li><strong>Q</strong>: why should the decision maker consider the reward in each period <strong>independently</strong>?</li>
</ul>]]></content><author><name>Yifan Hong</name><email>hongyf23@mails.tsinghua.edu.cn</email></author><category term="utility-theory" /><summary type="html"><![CDATA[Pollak, Robert A. “Additive von Neumann-Morgenstern utility functions.” Econometrica 35.3/4 (1967): 485-494.]]></summary></entry><entry><title type="html">[Paper Note] Risk aversion in the small and in the large</title><link href="https://yfflood.github.io/posts/2024/03/06/pratt64eca/" rel="alternate" type="text/html" title="[Paper Note] Risk aversion in the small and in the large" /><published>2024-03-06T00:00:00-08:00</published><updated>2024-03-06T00:00:00-08:00</updated><id>https://yfflood.github.io/posts/2024/03/06/pratt64eca</id><content type="html" xml:base="https://yfflood.github.io/posts/2024/03/06/pratt64eca/"><![CDATA[<blockquote>
  <p>Pratt, John W. “Risk Aversion in the Small and in the Large.” <em>Econometrica</em> 32.1/2 (1964): 122-136.</p>
</blockquote>

<p>For utility functions for money, a <strong><em>local</em></strong> <em>risk measure</em> is defined (equation (3)).</p>
<ul>
  <li>The <em>risk premium</em> can be <strong>locally</strong> represented using the <em>local risk aversion</em>. (equation (2))</li>
  <li>This measure is closely related to <strong><em>global</em></strong> risk-averse preferences.</li>
</ul>

<h1 id="definitions-and-notations">Definitions and notations</h1>
<ul>
  <li>$u_{1}(x)\sim u_{2}(x)$: two functions are equivalent as utilities (up to increasing linear transformation)</li>
  <li>$\pi({x,\tilde{z}})$: the risk premium (elaborated below)</li>
</ul>

<h2 id="the-risk-premium-pi">The <strong>risk premium</strong> $\pi$</h2>
<p>Given assets $x$ and utility function $u$, a decision maker is <em>indifferent</em> between receiving a risk $\tilde{z}$ and the non-random amount $E[\tilde{z}]-\pi$.  Mathematically,
\(u(x+E(\tilde{z})-\pi(x,\tilde{z}))=E[u(x+\tilde{z})].\tag{1}\)</p>
<ul>
  <li>From equation (1) the risk premium is uniquely defined.</li>
  <li>It follows from (1) that for any constant $\mu$, $\pi(x,\tilde{z})=\pi(x+\mu,\tilde{z}-\mu)$. Therefore, we may only consider any <em>actuarially neutral</em> risk.</li>
</ul>

<p>There are other related concepts like the <em>cash equivalent</em> and <em>insurance premium</em>.
These concepts should be distinguished from the <em>bid price</em>.</p>

<hr />
<h1 id="local-risk-aversion">Local risk aversion</h1>
<p>Consider $\pi(x,\tilde{z})$ for an actuarially neutral risk $\tilde{z}$ with infinitesimal variance $\sigma_z^2\rightarrow 0$. Locally expanding both sides of equation (1) shows
\(\begin{equation}
\begin{split}
u(x-\pi)&amp;=u(x)-\pi u'(x) + O(\pi^2)\\
E[u(x+\tilde{z})]&amp;=E[u(x)+\tilde{z}u'(x)+\frac{1}{2}\tilde{z}^2u''(x)+O(\tilde{z}^3)]\\
&amp;=u(x)+\frac{1}{2}\sigma_z^2u''(x)+o(\sigma_z^2)
\end{split}
\end{equation}\)</p>

<p>Setting equal these equations gives
\(\pi(x,\tilde{z})=\frac{1}{2}\sigma_z^{2}r(x)+ o(\sigma_z^2),\tag{2}\)
where 
\(r(x)=-\frac{u''(x)}{u'(x)}=-\frac{d}{dx}\log u'(x). \tag{3}\)</p>
<ul>
  <li>There is a similar interpretation with discrete risks and <em>probability premium</em> $p(x,h)=p(\tilde{z}=h)-p(\tilde{z}=-h)$.</li>
</ul>
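<p>A quick numerical check of equation (2) using $u=\log$ (so $r(x)=1/x$) and a fair $\pm h$ risk, for which the exact premium has a closed form (the numbers are arbitrary):</p>

```python
import math

def premium_log(x, h):
    """Exact risk premium for u = log and a fair +/-h risk:
    log(x - pi) = 0.5*log(x + h) + 0.5*log(x - h)  =>  pi = x - sqrt(x^2 - h^2)."""
    return x - math.sqrt(x * x - h * h)

x, h = 10.0, 0.5
approx = 0.5 * h * h * (1.0 / x)     # equation (2) with sigma_z^2 = h^2, r(x) = 1/x
print(premium_log(x, h), approx)     # 0.012508 vs 0.0125
```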

<h2 id="relation-with-utility-function">Relation with utility function</h2>
<blockquote>
  <p>The local risk aversion function $r$ associated with any <strong>utility</strong> function $u$ contains <strong>all</strong> essential information about $u$.</p>
</blockquote>

<p>Equation (3) implies
\(u\sim\int e^{-\int r} \tag{4}\)</p>
<ul>
  <li>It can be convenient to preserve $u$ since it determines <em>ordinary</em> (as against infinitesimal) risk preferences.</li>
</ul>

<h2 id="concavity">Concavity</h2>
<p>$u''(x)$ is <strong>not</strong> in itself a meaningful measure of <em>concavity</em> in utility theory</p>
<ul>
  <li>the sign of $u''(x)$ implies the general attitude towards risk</li>
  <li>the absolute magnitude is not meaningful</li>
</ul>

<h1 id="comparative-global-risk-aversion">Comparative (global) risk aversion</h1>
<p>If $r_1(x)&gt;r_2(x)$, then $u_1$ is more risk-averse than $u_2$ at $x$ not only locally, but also globally. 
<strong>Thm.</strong> Let $r_i(x),\pi_i(x)$ be the local risk aversion and risk premium according to the utility function $u_i,\,i=1,2$. Then the following conditions are equivalent:</p>
<ol>
  <li>$r_{1}(x)\geq r_2(x)$ for all $x$</li>
  <li>$\pi_1(x,\tilde{z})\geq \pi_2(x,\tilde{z})$ for all $x$ and $\tilde{z}$</li>
  <li>$u_1(u_2^{-1}(t))$ is a concave function of $t$</li>
  <li>$\frac{u_1(y)-u_1(x)}{u_1(w)-u_{1}(v)} \leq \frac{u_2(y)-u_2(x)}{u_2(w)-u_2(v)}$ for all $v&lt;w\leq x&lt;y$.</li>
</ol>

<ul>
  <li>condition 1. requires local risk-aversion to be larger for any asset.</li>
</ul>

<hr />

<h1 id="special-family-of-risk-aversion">Special family of risk aversion</h1>
<h2 id="constant-risk-aversion"><em>Constant</em> risk aversion</h2>
<p>If the local risk aversion is constant $r(x)=c$, then</p>

\[u(x)\sim x\quad\text{if } r(x)=0\]

\[u(x)\sim -e^{-cx} \quad\text{if } r(x)=c&gt;0\]

\[u(x)\sim e^{-cx} \quad\text{if } r(x)=c&lt;0\]

<ul>
  <li>If the risk aversion is constant locally, then it is also constant globally:
for any $k,\,u(x+k)\sim u(x)$.</li>
</ul>
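<p>A small check that the premium is globally constant for exponential utility: solving equation (1) numerically for $u(v)=-e^{-cv}$ and a fair $\pm h$ risk gives the same $\pi$ at every asset level $x$ (values made up):</p>

```python
import math

def premium(x, h, c):
    """Solve u(x - pi) = E[u(x + z)] for u(v) = -exp(-c v), z a fair +/-h risk."""
    eu = -0.5 * (math.exp(-c * (x + h)) + math.exp(-c * (x - h)))
    return x + math.log(-eu) / c      # x - u^{-1}(E[u]), with u^{-1}(y) = -ln(-y)/c

c, h = 0.8, 1.0
print([premium(x, h, c) for x in (0.0, 5.0, 50.0)])   # the same value at every x
```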

<h2 id="decreasing-risk-aversion">Decreasing risk aversion</h2>

<p><strong>Decreasing risk aversion</strong> describes a decision maker, who (1) attaches positive risk premium (<em>risk-averse</em>) to any risk, but (2) attaches smaller risk premium the greater his assets $x$. Formally,</p>
<ol>
  <li>$\pi(x,\tilde{z})&gt;0$ for all $x$ and $\tilde{z}$</li>
  <li>$\pi(x,\tilde{z})$ is a <em>strictly decreasing</em> function of $x$ for all (given) $\tilde{z}$</li>
</ol>

<p>Decreasing <strong>global</strong> risk aversion is equivalent to decreasing <strong>local</strong> risk aversion, i.e., the following conditions are <em>equivalent</em></p>
<ol>
  <li>The local risk aversion $r(x)$ is decreasing</li>
  <li>The risk premium $\pi(x,\tilde{z})$ is a decreasing function of $x$ for all $\tilde{z}$</li>
</ol>]]></content><author><name>Yifan Hong</name><email>hongyf23@mails.tsinghua.edu.cn</email></author><category term="utility-theory" /><summary type="html"><![CDATA[Pratt, John W. “Risk Aversion in the Small and in the Large.” Econometrica 32.1/2 (1964): 122-136.]]></summary></entry></feed>