# One Dimensional Heat Equation via SUNDIALS and Haskell

This is a short example of how to use SUNDIALS to solve a simple partial differential equation in Haskell via the hmatrix-sundials library. The example is taken from the C examples that come with the SUNDIALS source. Here’s the full blog. I’ll give a better URL soonish.

# Estimating Parameters in Chaotic Systems

Post available from my new site. Sadly WordPress doesn’t allow me to render the html exported by a Jupyter notebook.

# Introduction

Tribbles originate from the planet Iota Geminorum IV and, according to Dr. McCoy, are born pregnant. No further details are given but we can follow Gurtin and MacCamy (1974) and perhaps recover some of what happens on the Enterprise.

Of course, age-dependent population models are of more than fictional use and can be applied, for example, to modelling the progression of Malaria in infected hosts. We roughly follow some of J. J. Thibodeaux and Schlittenhardt (2011) who themselves reference Belair, Mackey, and Mahaffy (1995).

Two aspects of the implementation are worth noting:

• The use of the hmatrix package which now contains functions to solve tridiagonal systems, used in this post. You will need to use HEAD until a new hackage / stackage release is made. My future plan is to use CUDA via accelerate and compare.

• The use of dimensions in a medium-sized example. It would have been nice to have tried the units package but it seemed harder work to use and, as ever, “Time’s wingèd chariot” was the enemy.

# Age-Dependent Populations

## McKendrick / von Foerster

McKendrick and von Foerster independently derived a model of age-dependent population growth.

Let $n(a,t)$ be the density of females of age $a$ at time $t$. The number of females between ages $a$ and $a + \delta a$ is thus $n(a, t)\delta a$. Assuming individuals are born at age $0$, we have

$\displaystyle \frac{\partial}{\partial t}(n(a, t)\delta a) = J(a, t) - J(a + \delta a, t) - \mu(a, t)n(a, t)\delta a$

where $\mu(a, t)$ is the death rate density and $J(a, t)$ denotes the rate of entry to the cohort of age $a$. Dividing by $\delta a$ we obtain

$\displaystyle \frac{\partial}{\partial t}n(a, t) = - \frac{J(a + \delta a, t) - J(a, t)}{\delta a} - \mu(a, t)n(a, t)$

which in the limit becomes

$\displaystyle \frac{\partial}{\partial t}n(a, t) = - \frac{\partial J(a, t)}{\partial a} - \mu(a, t)n(a, t)$

We can further assume that the rate of entry to a cohort is proportional to the density of individuals times a velocity of aging $v(a, t)$.

$\displaystyle J(a, t) = n(a, t)v(a, t)$

Occasionally there is some reason to assume that aging one year is different to experiencing one year but we further assume $v = 1$.

We thus obtain

$\displaystyle \frac{\partial n(a, t)}{\partial t} + \frac{\partial n(a, t)}{\partial a} = - \mu(a, t)n(a, t)$
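As an aside, the pure transport part of this equation is easy to solve numerically. The following standalone sketch (not part of the literate program later in this post) applies an explicit upwind scheme with a constant death rate, no births and a made-up initial cohort; all the numbers are illustrative only.

```haskell
-- Explicit upwind discretisation of
--   dn/dt + dn/da = -mu * n
-- with constant death rate mu, no births (n(0, t) = 0) and an
-- illustrative initial age distribution; every number is made up.

upwindStep :: Double -> Double -> Double -> [Double] -> [Double]
upwindStep dt da mu ns = zipWith step (0 : ns) ns
  where
    -- nl is the neighbour one age cell to the left (newborns are 0)
    step nl n = n - dt / da * (n - nl) - dt * mu * n

-- evolve an initial age distribution through k time steps
evolve :: Int -> Double -> Double -> Double -> [Double] -> [Double]
evolve k dt da mu n0 = iterate (upwindStep dt da mu) n0 !! k

main :: IO ()
main = do
  let n0 = [1.0, 0.8, 0.6, 0.4, 0.2]
      nT = evolve 100 0.01 0.1 0.05 n0
  -- with deaths and no new births the total population shrinks
  print (sum nT < sum n0)
```

The scheme is stable provided $\Delta t / \Delta a \le 1$ (the CFL condition); the implicit scheme used for the erythropoiesis model below avoids that restriction.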

## Gurtin / MacCamy

To solve any PDE we need boundary and initial conditions. The number of births at time $t$ is

$\displaystyle n(0, t) = \int_0^\infty n(a, t) m(a, N(t))\, \mathrm{d}a$

where $m$ is the natality aka birth-modulus and

$\displaystyle N(t) = \int_0^\infty n(a, t)\, \mathrm{d}a$

and we further assume that the initial condition is

$\displaystyle n(a, 0) = n_0(a)$

for some given $n_0$.

Gurtin and MacCamy (1974) focus on the situation where

$\displaystyle m(a, N(t)) = \beta(N)e^{-\alpha a}$

and we can also assume that the birth rate of Tribbles decreases exponentially with age and further that Tribbles can live forever. Gurtin and MacCamy (1974) then transform the PDE to obtain a pair of linked ODEs which can then be solved numerically.

Of course, we know what happens in the Enterprise and rather than continue with this example, let us turn our attention to the more serious subject of Malaria.

# Malaria

I realise now that I went a bit overboard with references. Hopefully they don’t interrupt the flow too much.

The World Health Organisation (WHO) estimated that in 2015 there were 214 million new cases of malaria resulting in 438,000 deaths (source: Wikipedia).

The lifecycle of the plasmodium parasite that causes malaria is extremely ingenious. J. J. Thibodeaux and Schlittenhardt (2011) model the human segment of the plasmodium lifecycle and further propose a way of determining an optimal treatment for an infected individual. Hall et al. (2013) also model the effect of an anti-malarial. Let us content ourselves with reproducing part of the paper by J. J. Thibodeaux and Schlittenhardt (2011).

At one stage of its sojourn in humans, plasmodium infects erythrocytes, aka red blood cells. These contain haemoglobin (aka hemoglobin). The process by which red blood cells are produced, erythropoiesis, is modulated in a feedback loop by erythropoietin. The plasmodium parasite severely disrupts this process. Presumably the resulting loss of haemoglobin is one reason that an infected individual feels ill.

As can be seen in the overview by Torbett and Friedman (2009), the full feedback loop is complex. So as not to lose ourselves in the details and following J. J. Thibodeaux and Schlittenhardt (2011) and Belair, Mackey, and Mahaffy (1995), we consider a model with two compartments.

• Precursors: prototype erythrocytes developing in the bone marrow with $p(\mu, t)$ being the density of such cells of age $\mu$ at time $t$.

• Erythrocytes: mature red blood cells circulating in the blood with $m(\nu, t)$ being the density of such cells of age $\nu$ at time $t$.

\displaystyle \begin{aligned} \frac{\partial p(\mu, t)}{\partial t} + g(E(t))\frac{\partial p(\mu, t)}{\partial \mu} &= \sigma(\mu, t, E(t))p(\mu, t) & 0 < \mu < \mu_F & & 0 < t < T \\ \frac{\partial m(\nu, t)}{\partial t} + \phantom{g(E(t))}\frac{\partial m(\nu, t)}{\partial \nu} &= -\gamma(\nu, t, M(t))m(\nu, t) & 0 < \nu < \nu_F & & 0 < t < T \end{aligned}

where $\sigma(\mu, t, E(t))$ is the birth rate of precursors and $\gamma(\nu, t, M(t))$ is the death rate of erythrocytes, $g(E(t))$ is the maturation rate of precursors and where

$\displaystyle M(t) = \int_0^{\nu_F} m(\nu, t) \,\mathrm{d}\nu$

As boundary conditions, we have that the rate at which precursors mature must equal the rate at which new erythrocytes are produced

$\displaystyle m(0, t) = g(E(t))p(\mu_F, t)$

and the production of precursors depends on the level of erythropoietin

$\displaystyle g(E(t))p(0, t) = \phi(t)E(t)$

where $\phi(t)$ is some proportionality function.

As initial conditions, we have

\displaystyle \begin{aligned} p(\mu, 0) &= p_0(\mu) \\ m(\nu, 0) &= m_0(\nu) \end{aligned}

We can further model the erythropoietin dynamics as

$\displaystyle \frac{\mathrm{d}E(t)}{\mathrm{d}t} = f(M(t), t) - a_E(P(t))E(t)$

where $f$ is the feedback function from the kidneys and the decay rate $a_E$ depends on the total precursor population $P(t)$ (Sawyer, Krantz, and Goldwasser (1987)), although this is often taken to be a constant (a more recent citation would be welcome), and where

$\displaystyle P(t) = \int_0^{\mu_F} p(\mu, t) \,\mathrm{d}\mu$

As initial condition we have

$\displaystyle E(0) = E_0$

## A Finite Difference Attempt

Let us try solving the above model using a finite difference scheme, observing that we currently have no basis for whether it has a solution or whether the finite difference scheme approximates such a solution! We follow J. J. Thibodeaux and Schlittenhardt (2011), who give a proof of convergence, presumably under some conditions; any failure of the scheme is entirely mine.

Divide up the age and time ranges, $[0, \mu_F]$, $[0, \nu_F]$ and $[0, T]$ into equal sub-intervals, $[\mu_i, \mu_{i+1}]$, $[\nu_j, \nu_{j+1}]$ and $[t_k, t_{k+1}]$ where

\displaystyle \begin{aligned} \mu_i &= i\Delta\mu & & \mathrm{for} & i = 1 \ldots n_1 \\ \nu_j &= j\Delta\nu & & \mathrm{for} & j = 1 \ldots n_2 \\ t_k &= k\Delta t & & \mathrm{for} & k = 1 \ldots K \end{aligned}

where $\Delta\mu = \mu_F / n_1$, $\Delta\nu = \nu_F / n_2$ and $\Delta t = T / K$.

Denoting $p(\mu_i, t_k) = p_i^k$ and similarly we obtain

\displaystyle \begin{aligned} \frac{p_i^{k+1} - p_i^k}{\Delta t} + g^k\frac{p_i^{k+1} - p_{i-1}^{k+1}}{\Delta\mu} &= \sigma_i^k p_i^{k+1} \\ \frac{m_j^{k+1} - m_j^k}{\Delta t} + \phantom{g^k}\frac{m_j^{k+1} - m_{j-1}^{k+1}}{\Delta\nu} &= -\gamma_j^k m_j^{k+1} \end{aligned}

and

\displaystyle \begin{aligned} \frac{E^{k+1} - E^k}{\Delta t} &= f^k - a_E^k E^{k+1} \\ g^k p_0^{k+1} &= \phi^k E^k \\ m_0^{k+1} &= g^k p_{n_1}^{k+1} \end{aligned}

Re-arranging we get

\displaystyle \begin{aligned} -g^k\frac{\Delta t}{\Delta \mu}p_{i-1}^{k+1} + \bigg(1 + g^k\frac{\Delta t}{\Delta \mu} - \Delta t \sigma_i^k\bigg)p_i^{k+1} &= p_i^k \\ -\frac{\Delta t}{\Delta \nu}m_{j-1}^{k+1} + \bigg(1 + \frac{\Delta t}{\Delta \nu} + \Delta t \gamma_j^k\bigg)m_j^{k+1} &= m_j^k \end{aligned}

Writing

\displaystyle \begin{aligned} d_{1,i}^k &= 1 + g^k\frac{\Delta t}{\Delta \mu} - \Delta t \sigma_i^k \\ d_{2,j}^k &= 1 + \frac{\Delta t}{\Delta \nu} + \Delta t \gamma_j^k \end{aligned}

We can express the above in matrix form

$\displaystyle \begin{bmatrix} g^k & 0 & 0 & \ldots & 0 & 0 \\ -g^k\frac{\Delta t}{\Delta \mu} & d_{1,1}^k & 0 & \ldots & 0 & 0\\ 0 & -g^k\frac{\Delta t}{\Delta \mu} & d_{1,2}^k & \ldots & 0 & 0 \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & 0 & \ldots &\ -g^k\frac{\Delta t}{\Delta \mu} & d_{1,n_1}^k \\ \end{bmatrix} \begin{bmatrix} p_0^{k+1} \\ p_1^{k+1} \\ p_2^{k+1} \\ \ldots \\ p_{n_1}^{k+1} \end{bmatrix} = \begin{bmatrix} \phi^k E^k \\ p_1^k \\ p_2^k \\ \ldots \\ p_{n_1}^k \\ \end{bmatrix}$

$\displaystyle \begin{bmatrix} 1 & 0 & 0 & \ldots & 0 & 0 \\ -\frac{\Delta t}{\Delta \nu} & d_{2,1}^k & 0 & \ldots & 0 & 0\\ 0 & -\frac{\Delta t}{\Delta \nu} & d_{2,2}^k & \ldots & 0 & 0 \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & 0 & \ldots & -\frac{\Delta t}{\Delta \nu} & d_{2,n_2}^k \\ \end{bmatrix} \begin{bmatrix} m_0^{k+1} \\ m_1^{k+1} \\ m_2^{k+1} \\ \ldots \\ m_{n_2}^{k+1} \end{bmatrix} = \begin{bmatrix} g^k p_{n_1}^{k+1} \\ m_1^k \\ m_2^k \\ \ldots \\ m_{n_2}^k \\ \end{bmatrix}$

Finally we can write

$\displaystyle E^{k+1} = \frac{E^k + \Delta t f^k}{1 + a_E^k\Delta t}$
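Both matrix systems above are tridiagonal, which is what makes the implicit scheme cheap: the Thomas algorithm solves such a system in $O(n)$ by one forward elimination sweep and one back substitution, and this is presumably what hmatrix's `triDiagSolve` does under the hood. Here is a standalone sketch of the algorithm in plain Haskell with a made-up test system — a generic illustration, not the hmatrix implementation used below.

```haskell
-- Thomas algorithm for a tridiagonal system A x = d, where
-- as is the sub-diagonal (length n-1), bs the diagonal (length n)
-- and cs the super-diagonal (length n-1). No pivoting is done, so
-- the system is assumed diagonally dominant, as the implicit
-- schemes above are.

thomas :: [Double] -> [Double] -> [Double] -> [Double] -> [Double]
thomas as bs cs ds = backSub (sweep 0 0 (0 : as) bs (cs ++ [0]) ds)
  where
    -- forward elimination, carrying the previous modified
    -- super-diagonal entry c' and right-hand side d'
    sweep _  _  []      []      []      []      = []
    sweep cp dp (a:as') (b:bs') (c:cs') (d:ds') =
      let m  = b - a * cp
          c' = c / m
          d' = (d - a * dp) / m
      in (c', d') : sweep c' d' as' bs' cs' ds'
    sweep _ _ _ _ _ _ = error "thomas: inconsistent lengths"

    -- back substitution from the last row upwards
    backSub = foldr (\(c', d') xs -> case xs of
                       []      -> [d']
                       (x : _) -> (d' - c' * x) : xs) []

main :: IO ()
main =
  -- [[2,1,0],[1,2,1],[0,1,2]] x = [4,8,8] has solution [1,2,3]
  print (thomas [1,1] [2,2,2] [1,1] [4,8,8])
```

In the $p$ system, for example, the sub-diagonal is the constant $-g^k \Delta t / \Delta \mu$, the super-diagonal is identically zero and the diagonal is $(g^k, d_{1,1}^k, \ldots, d_{1,n_1}^k)$.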

> {-# OPTIONS_GHC -Wall #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE NoImplicitPrelude #-}
> {-# LANGUAGE FlexibleContexts #-}
> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE TypeOperators #-}

> module Tribbles where

> import qualified Prelude as P

> import Numeric.Units.Dimensional.Prelude hiding (Unit)
> import Numeric.Units.Dimensional

> import Numeric.LinearAlgebra
> import Numeric.Integration.TanhSinh

> import Control.Monad.Writer


Substances like erythropoietin (EPO) are measured in International Units and these cannot be converted to Moles (see Jelkmann (2009) for much more detail), so we have to pretend EPO really is measured in Moles as there seems to be no easy way to define what the dimensional package calls a base dimension. A typical level for a person is 15 milli-IU / milli-litre but it can reach much higher levels after loss of blood.

> muPerMl :: (Fractional a, Num a) => Unit 'NonMetric DConcentration a
> muPerMl = (milli mole) / (milli litre)

> bigE'0 :: Concentration Double
> bigE'0 = 15.0 *~ muPerMl


Let’s set up our grid. We take these from Ackleh et al. (2006) but note that J. J. Thibodeaux and Schlittenhardt (2011) seem to have $T = 20$.

> deltaT, deltaMu, deltaNu :: Time Double

> deltaT = 0.05 *~ day
> deltaMu = 0.01 *~ day
> deltaNu = 0.05 *~ day

> bigT :: Time Double
> bigT = 100.0 *~ day

> muF, nuF :: Time Double
> muF = 5.9 *~ day
> nuF = 120.0 *~ day

> bigK :: Int
> bigK = floor (bigT / deltaT /~ one)

> n1 :: Int
> n1 = floor (muF / deltaMu /~ one)

> n2 :: Int
> n2 = floor (nuF / deltaNu /~ one)

> ts :: [Time Double]
> ts = take bigK $0.0 *~ day : (map (+ deltaT) ts)  The birth rate for precursors > betaThibodeaux :: Time Double -> > Frequency Double > betaThibodeaux mu > | mu < (0 *~ day) = error "betaThibodeaux: negative age" > | mu < (3 *~ day) = (2.773 *~ (one / day)) > | otherwise = (0.0 *~ (one /day))  > alphaThibodeaux :: Concentration Double -> > Frequency Double > alphaThibodeaux e = (0.5 *~ (muPerMl / day)) / ((1 *~ muPerMl) + e)  > sigmaThibodeaux :: Time Double -> > Time Double -> > Concentration Double -> > Frequency Double > sigmaThibodeaux mu _t e = gThibodeaux e * (betaThibodeaux mu - alphaThibodeaux e)  and an alternative birth rate > betaAckleh :: Time Double -> Frequency Double > betaAckleh mu > | mu < (0 *~ day) = error "betaAckleh: negative age" > | mu < (3 *~ day) = 2.773 *~ (one / day) > | otherwise = 0.000 *~ (one / day)  > sigmaAckleh :: Time Double -> > Time Double -> > Concentration Double -> > Frequency Double > sigmaAckleh mu _t e = betaAckleh mu * gAckleh e  J. J. Thibodeaux and Schlittenhardt (2011) give the maturation rate of precursors $g$ as > gThibodeaux :: Concentration Double -> Dimensionless Double > gThibodeaux e = d / n > where > n = ((3.02 *~ one) * e + (0.31 *~ muPerMl)) > d = (30.61 *~ muPerMl) + e  and Ackleh et al. (2006) give this as > gAckleh :: Concentration Double -> Dimensionless Double > gAckleh _e = 1.0 *~ one  As in J. J. Thibodeaux and Schlittenhardt (2011) we give quantities in terms of cells per kilogram of body weight. Note that these really are moles on this occasion. > type CellDensity = Quantity (DAmountOfSubstance / DTime / DMass)  Let’s set the initial conditions. 
> p'0 :: Time Double -> CellDensity Double
> p'0 mu' = (1e11 *~ one) * pAux mu'
>   where
>     pAux mu
>       | mu < (0 *~ day) = error "p'0: negative age"
>       | mu < (3 *~ day) = 8.55e-6 *~ (mole / day / kilo gram) *
>                           exp ((2.7519 *~ (one / day)) * mu)
>       | otherwise = 8.55e-6 *~ (mole / day / kilo gram) *
>                     exp (8.319 *~ one - (0.0211 *~ (one / day)) * mu)

> m_0 :: Time Double -> CellDensity Double
> m_0 nu' = (1e11 *~ one) * mAux nu'
>   where
>     mAux nu
>       | nu < (0 *~ day) = error "m_0: age less than zero"
>       | otherwise = 0.039827 *~ (mole / day / kilo gram) *
>                     exp (((-0.0083) *~ (one / day)) * nu)

And check that these give plausible results.

> m_0Untyped :: Double -> Double
> m_0Untyped nu = m_0 (nu *~ day) /~ (mole / day / kilo gram)

> p'0Untyped :: Double -> Double
> p'0Untyped mu = p'0 (mu *~ day) /~ (mole / day / kilo gram)

    ghci> import Numeric.Integration.TanhSinh
    ghci> result $ relative 1e-6 $ parTrap m_0Untyped 0.001 (nuF /~ day)
    3.0260736659043414e11

    ghci> result $ relative 1e-6 $ parTrap p'0Untyped 0.001 (muF /~ day)
    1.0453999900927126e10

We can now create the components for the first matrix equation.

> g'0 :: Dimensionless Double
> g'0 = gThibodeaux bigE'0

> d_1'0 :: Int -> Dimensionless Double
> d_1'0 i = (1 *~ one) + (g'0 * deltaT / deltaMu)
>         - deltaT * sigmaThibodeaux ((fromIntegral i *~ one) * deltaMu) undefined bigE'0

> lowers :: [Dimensionless Double]
> lowers = replicate n1 (negate $ g'0 * deltaT / deltaMu)

> diags :: [Dimensionless Double]
> diags = g'0 : map d_1'0 [1..n1]

> uppers :: [Dimensionless Double]
> uppers = replicate n1 (0.0 *~ one)


J. J. Thibodeaux and Schlittenhardt (2011) does not give a definition for $\phi$ so we use the equivalent $s_0$ from Ackleh et al. (2006) which references Banks et al. (2003): “$\times 10^{11}$ erythrocytes/kg body weight $\times$ mL plasma/mU Epo/day”

> s_0 :: Time Double ->
>        Quantity (DAmountOfSubstance / DTime / DMass / DConcentration) Double
> s_0 = const ((1e11 *~ one) * (4.45e-7 *~ (mole / day / kilo gram / muPerMl)))

> b'0 :: [CellDensity Double]
> b'0 = (s_0 (0.0 *~ day) * bigE'0) : (take n1 $ map p'0 (iterate (+ deltaMu) deltaMu))

With these components in place we can now solve the implicit scheme and get the age distribution of precursors after one time step.

> p'1 :: Matrix Double
> p'1 = triDiagSolve (fromList (map (/~ one) lowers))
>                    (fromList (map (/~ one) diags))
>                    (fromList (map (/~ one) uppers))
>                    (((n1 P.+ 1)><1) (map (/~ (mole / second / kilo gram)) b'0))

In order to create the components for the second matrix equation, we need the death rates of mature erythrocytes

> gammaThibodeaux :: Time Double ->
>                    Time Double ->
>                    Quantity (DAmountOfSubstance / DMass) Double ->
>                    Frequency Double
> gammaThibodeaux _nu _t _bigM = 0.0083 *~ (one / day)

We note an alternative for the death rate

> gammaAckleh :: Time Double ->
>                Time Double ->
>                Quantity (DAmountOfSubstance / DMass) Double ->
>                Frequency Double
> gammaAckleh _nu _t bigM = (0.01 *~ (kilo gram / mole / day)) * bigM + 0.0001 *~ (one / day)

For the initial mature erythrocyte population we can either use the integral of the initial distribution

> bigM'0 :: Quantity (DAmountOfSubstance / DMass) Double
> bigM'0 = r *~ (mole / kilo gram)
>   where
>     r = result $ relative 1e-6 $ parTrap m_0Untyped 0.001 (nuF /~ day)

    ghci> bigM'0
    3.0260736659043414e11 kg^-1 mol

or we can use the sum of the values used in the finite difference approximation

> bigM'0' :: Quantity (DAmountOfSubstance / DMass) Double
> bigM'0' = (* deltaNu) $ sum $ map m_0 $ take n2 $ iterate (+ deltaNu) (0.0 *~ day)

    ghci> bigM'0'
    3.026741454719976e11 kg^-1 mol

Finally we can create the components

> d_2'0 :: Int -> Dimensionless Double
> d_2'0 i = (1 *~ one) + deltaT / deltaNu
>         + deltaT * gammaThibodeaux ((fromIntegral i *~ one) * deltaNu) undefined bigM'0

> lowers2 :: [Dimensionless Double]
> lowers2 = replicate n2 (negate $ deltaT / deltaNu)

> diags2 :: [Dimensionless Double]
> diags2 = (1.0 *~ one) : map d_2'0 [1..n2]

> uppers2 :: [Dimensionless Double]
> uppers2 = replicate n2 (0.0 *~ one)

> b_2'0 :: [CellDensity Double]
> b_2'0 = (g'0 * ((p'1 `atIndex` (n1,0)) *~ (mole / second / kilo gram))) :
>         (take n2 $ map m_0 (iterate (+ deltaNu) deltaNu))

and then solve the implicit scheme to get the distribution of mature erythrocytes one time step ahead

> m'1 :: Matrix Double
> m'1 = triDiagSolve (fromList (map (/~ one) lowers2))
>                    (fromList (map (/~ one) diags2))
>                    (fromList (map (/~ one) uppers2))
>                    (((n2 P.+ 1)><1) (map (/~ (mole / second / kilo gram)) b_2'0))

We need to complete the homeostatic loop by implementing the feedback from the kidneys to the bone marrow. Ackleh and Thibodeaux (2013) and Ackleh et al. (2006) give $f$ as

> fAckleh :: Time Double ->
>            Quantity (DAmountOfSubstance / DMass) Double ->
>            Quantity (DConcentration / DTime) Double
> fAckleh _t bigM = a / ((1.0 *~ one) + k * (bigM' ** r))
>   where
>     a = 15600 *~ (muPerMl / day)
>     k = 0.0382 *~ one
>     r = 6.96 *~ one
>     bigM' = ((bigM /~ (mole / kilo gram)) *~ one) * (1e-11 *~ one)

The much older Belair, Mackey, and Mahaffy (1995) gives $f$ as

> fBelair :: Time Double ->
>            Quantity (DAmountOfSubstance / DMass) Double ->
>            Quantity (DConcentration / DTime) Double
> fBelair _t bigM = a / ((1.0 *~ one) + k * (bigM' ** r))
>   where
>     a = 6570 *~ (muPerMl / day)
>     k = 0.0382 *~ one
>     r = 6.96 *~ one
>     bigM' = ((bigM /~ (mole / kilo gram)) *~ one) * (1e-11 *~ one)

For the initial precursor population we can either use the integral of the initial distribution

> bigP'0 :: Quantity (DAmountOfSubstance / DMass) Double
> bigP'0 = r *~ (mole / kilo gram)
>   where
>     r = result $ relative 1e-6 $ parTrap p'0Untyped 0.001 (muF /~ day)

    ghci> bigP'0
    1.0453999900927126e10 kg^-1 mol

or we can use the sum of the values used in the finite difference approximation

> bigP'0' :: Quantity (DAmountOfSubstance / DMass) Double
> bigP'0' = (* deltaMu) $ sum $ map p'0 $ take n1 $ iterate (+ deltaMu) (0.0 *~ day)

    ghci> bigP'0'
    1.0438999930030743e10 kg^-1 mol

J. J. Thibodeaux and Schlittenhardt (2011) give the following for $a_E$

> a_E :: Quantity (DAmountOfSubstance / DMass) Double -> Frequency Double
> a_E bigP = ((n / d) /~ one) *~ (one / day)
>   where
>     n :: Dimensionless Double
>     n = bigP * (13.8 *~ (kilo gram / mole)) + 0.04 *~ one
>     d :: Dimensionless Double
>     d = (bigP /~ (mole / kilo gram)) *~ one + 0.08 *~ one

but from Ackleh et al. (2006)

“The only biological basis for the latter is that the decay rate of erythropoietin should be an increasing function of the precursor population and this function remains in the range 0.50–6.65 $\mathrm{days}^{-1}$”

and, given this is at variance with their given function, it may be safer to use their alternative of

> a_E' :: Quantity (DAmountOfSubstance / DMass) Double -> Frequency Double
> a_E' _bigP = 6.65 *~ (one / day)

We now further calculate the concentration of EPO one time step ahead.

> f'0 :: Quantity (DConcentration / DTime) Double
> f'0 = fAckleh undefined bigM'0

> bigE'1 :: Concentration Double
> bigE'1 = (bigE'0 + deltaT * f'0) / (1.0 *~ one + deltaT * a_E' bigP'0)

Having done this for one time step starting at $t=0$, it’s easy to generalize this to an arbitrary time step.
> d_1 :: Dimensionless Double ->
>        Concentration Double ->
>        Int ->
>        Dimensionless Double
> d_1 g e i = (1 *~ one) + (g * deltaT / deltaMu)
>           - deltaT * sigmaThibodeaux ((fromIntegral i *~ one) * deltaMu) undefined e

> d_2 :: Quantity (DAmountOfSubstance / DMass) Double ->
>        Int ->
>        Dimensionless Double
> d_2 bigM i = (1 *~ one) + deltaT / deltaNu
>            + deltaT * gammaThibodeaux ((fromIntegral i *~ one) * deltaNu) undefined bigM

> oneStepM :: (Matrix Double, Matrix Double, Concentration Double, Time Double) ->
>             Writer [(Quantity (DAmountOfSubstance / DMass) Double,
>                      Quantity (DAmountOfSubstance / DMass) Double,
>                      Concentration Double)]
>                    (Matrix Double, Matrix Double, Concentration Double, Time Double)
> oneStepM (psPrev, msPrev, ePrev, tPrev) = do
>   let
>     g = gThibodeaux ePrev
>     ls = replicate n1 (negate $ g * deltaT / deltaMu)
>     ds = g : map (d_1 g ePrev)  [1..n1]
>     us = replicate n1 (0.0 *~ one)
>     b1'0 = (s_0 tPrev * ePrev) /~ (mole / second / kilo gram)
>     b1 = asColumn $ vjoin [scalar b1'0, subVector 1 n1 $ flatten psPrev]
>     psNew :: Matrix Double
>     psNew = triDiagSolve (fromList (map (/~ one) ls))
>                          (fromList (map (/~ one) ds))
>                          (fromList (map (/~ one) us))
>                          b1
>     ls2 = replicate n2 (negate $ deltaT / deltaNu)
>     bigM :: Quantity (DAmountOfSubstance / DMass) Double
>     bigM = (* deltaNu) $ ((sumElements msPrev) *~ (mole / kilo gram / second))
>     ds2 = (1.0 *~ one) : map (d_2 bigM) [1..n2]
>     us2 = replicate n2 (0.0 *~ one)
>     b2'0 = (g * (psNew `atIndex` (n1, 0) *~ (mole / second / kilo gram))) /~
>            (mole / second / kilo gram)
>     b2 = asColumn $ vjoin [scalar b2'0, subVector 1 n2 $ flatten msPrev]
>     msNew :: Matrix Double
>     msNew = triDiagSolve (fromList (map (/~ one) ls2))
>                          (fromList (map (/~ one) ds2))
>                          (fromList (map (/~ one) us2))
>                          b2
>     bigP :: Quantity (DAmountOfSubstance / DMass) Double
>     bigP = (* deltaMu) $ sumElements psPrev *~ (mole / kilo gram / second)
>     f :: Quantity (DConcentration / DTime) Double
>     f = fAckleh undefined bigM
>     eNew :: Concentration Double
>     eNew = (ePrev + deltaT * f) / (1.0 *~ one + deltaT * a_E' bigP)
>     tNew = tPrev + deltaT
>   tell [(bigP, bigM, ePrev)]
>   return (psNew, msNew, eNew, tNew)

We can now run the model for 100 days.

> ys :: [(Quantity (DAmountOfSubstance / DMass) Double,
>         Quantity (DAmountOfSubstance / DMass) Double,
>         Concentration Double)]
> ys = take 2000 $
>      snd $
>      runWriter $
>      iterateM_ oneStepM ((((n1 P.+ 1)><1) (map (/~ (mole / second / kilo gram)) b'0)),
>                          (((n2 P.+ 1)><1) $ (map (/~ (mole / second / kilo gram)) b_2'0)),
>                          bigE'0,
>                          (0.0 *~ day))

And now we can plot what happens for a period of 100 days.

# References

Ackleh, Azmy S., and Jeremy J. Thibodeaux. 2013. “A second-order finite difference approximation for a mathematical model of erythropoiesis.” Numerical Methods for Partial Differential Equations, no. November: n/a–n/a. doi:10.1002/num.21778.

Ackleh, Azmy S., Keng Deng, Kazufumi Ito, and Jeremy Thibodeaux. 2006. “A Structured Erythropoiesis Model with Nonlinear Cell Maturation Velocity and Hormone Decay Rate.” Mathematical Biosciences 204 (1): 21–48. doi:10.1016/j.mbs.2006.08.004.

Banks, H T, Cammey E Cole, Paul M Schlosser, and Hien T Tran. 2003. “Modeling and Optimal Regulation of Erythropoiesis Subject to Benzene Intoxication.” https://www.ncsu.edu/crsc/reports/ftp/pdf/crsc-tr03-49.pdf.

Belair, Jacques, Michael C. Mackey, and Joseph M. Mahaffy. 1995. “Age-Structured and Two-Delay Models for Erythropoiesis.” Mathematical Biosciences 128 (1): 317–46. doi:10.1016/0025-5564(94)00078-E.

Gurtin, Morton E, and Richard C MacCamy. 1974. “Non-Linear Age-Dependent Population Dynamics.” Archive for Rational Mechanics and Analysis 54 (3). Springer: 281–300.

Hall, Adam J, Michael J Chappell, John AD Aston, and Stephen A Ward. 2013. “Pharmacokinetic Modelling of the Anti-Malarial Drug Artesunate and Its Active Metabolite Dihydroartemisinin.” Computer Methods and Programs in Biomedicine 112 (1). Elsevier: 1–15.

Jelkmann, Wolfgang. 2009. “Efficacy of Recombinant Erythropoietins: Is There Unity of International Units?” Nephrology Dialysis Transplantation 24 (5): 1366. doi:10.1093/ndt/gfp058.

Sawyer, Stephen T, SB Krantz, and E Goldwasser. 1987. “Binding and Receptor-Mediated Endocytosis of Erythropoietin in Friend Virus-Infected Erythroid Cells.” Journal of Biological Chemistry 262 (12). ASBMB: 5554–62.

Thibodeaux, Jeremy J., and Timothy P. Schlittenhardt. 2011. “Optimal Treatment Strategies for Malaria Infection.” Bulletin of Mathematical Biology 73 (11): 2791–2808. doi:10.1007/s11538-011-9650-8.

Torbett, Bruce E., and Jeffrey S. Friedman. 2009. “Erythropoiesis: An Overview.” In Erythropoietins, Erythropoietic Factors, and Erythropoiesis: Molecular, Cellular, Preclinical, and Clinical Biology, edited by Steven G. Elliott, Mary Ann Foote, and Graham Molineux, 3–18. Basel: Birkhäuser Basel. doi:10.1007/978-3-7643-8698-6_1.

# Warming up for NUTS (No U-Turn)

I have been thinking about writing a blog on why the no U-turn sampler (NUTS) works rather than describing the actual algorithm. This led me to look at Jared Tobin’s Haskell implementation. His example tries to explore the Himmelblau function but only finds one local minimum. This is not unexpected; as the excellent Stan manual notes

“Being able to carry out such invariant inferences in practice is an altogether different matter. It is almost always intractable to find even a single posterior mode, much less balance the exploration of the neighborhoods of multiple local maxima according to the probability masses.”

and

“For HMC and NUTS, the problem is that the sampler gets stuck in one of the two ‘bowls’ around the modes and cannot gather enough energy from random momentum assignment to move from one mode to another.”

    rm(list = ls(all.names=TRUE))
    unlink(".RData")

    rstan::stan_version()

    ## [1] "2.12.0"

    rstan_options(auto_write = TRUE)

On the Rosenbrock function it fares much better.

    knitr::include_graphics("RosenbrockA.png")

We can’t (at least I don’t know how to) try Stan out on Rosenbrock as it’s not a distribution but we can try it out on another nasty problem: the funnel. Some of this is taken directly from the Stan manual.
Here’s the Stan:

    parameters {
      real y;
      vector[9] x;
    }
    model {
      y ~ normal(0,3);
      x ~ normal(0,exp(y/2));
    }

which we can run with the following R:

    funnel_fit <- stan(file='funnel.stan', cores=4, iter=10000)

    ## Warning: There were 92 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
    ## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

    ## Warning: Examine the pairs() plot to diagnose sampling problems

    funnel_samples <- extract(funnel_fit,permuted=TRUE,inc_warmup=FALSE);
    funnel_df <- data.frame(x1=funnel_samples$x[,1],y=funnel_samples$y[])

Plotting the data requires some unpleasantness but shows the neck of the funnel does not get explored. So even HMC and NUTS do not perform well.

    midpoints <- function(x, dp=2){
      lower <- as.numeric(gsub(",.*","",gsub("\\(|\\[|\\)|\\]","", x)))
      upper <- as.numeric(gsub(".*,","",gsub("\\(|\\[|\\)|\\]","", x)))
      return(round(lower+(upper-lower)/2, dp))
    }

    df <- funnel_df[funnel_df$x1 < 20 & funnel_df$x1 > -20 & funnel_df$y < 9 & funnel_df$y > -9,]
    x_c <- cut(df$x1, 20)
    y_c <- cut(df$y, 20)
    z <- table(x_c, y_c)
    z_df <- as.data.frame(z)
    a_df <- data.frame(x=midpoints(z_df$x_c),y=midpoints(z_df$y_c),f=z_df$Freq)

    m = as.matrix(dcast(a_df,x ~ y))

    ## Using f as value column: use value.var to override.

    hist3D(x=m[,"x"],y=as.double(colnames(m)[2:21]),z=(as.matrix(dcast(a_df,x ~ y)))[,2:21],
           border="black",ticktype = "detailed",theta=5,phi=20)

    ## Using f as value column: use value.var to override.

Since the analytic form of the distribution is known, one can apply a trick to correct this problem and then one is sampling from unit normals!

    parameters {
      real y_raw;
      vector[9] x_raw;
    }
    transformed parameters {
      real y;
      vector[9] x;

      y <- 3.0 * y_raw;
      x <- exp(y/2) * x_raw;
    }
    model {
      y_raw ~ normal(0,1);
      x_raw ~ normal(0,1);
    }
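Since the reparameterised model just transforms unit normals, the trick can be sanity-checked without Stan at all: draw unit normals and apply $y = 3 y_{\mathrm{raw}}$, $x = \exp(y/2)\,x_{\mathrm{raw}}$. The standalone Haskell sketch below generates the unit normals with a hand-rolled linear congruential generator and the Box–Muller transform, chosen purely to avoid dependencies; it is an illustration only, not a serious random number generator.

```haskell
-- Sample Neal's funnel via the non-centred parameterisation:
-- y = 3 * y_raw, x = exp (y/2) * x_raw with y_raw, x_raw ~ N(0,1).
-- The LCG constants are Knuth's MMIX ones; this generator is for
-- illustration only.

lcg :: Integer -> Integer
lcg s = (6364136223846793005 * s + 1442695040888963407) `mod` (2 ^ 64)

uniforms :: Integer -> [Double]
uniforms seed = map toUnit (tail (iterate lcg seed))
  where toUnit s = (fromIntegral s + 1) / (2 ^ 64 + 1)  -- stays in (0,1)

-- Box-Muller: turn pairs of uniforms into standard normals
normals :: Integer -> [Double]
normals seed = boxMuller (uniforms seed)
  where
    boxMuller (u : v : rest) =
      let r = sqrt (negate 2 * log u)
          t = 2 * pi * v
      in r * cos t : r * sin t : boxMuller rest
    boxMuller _ = []

-- stream of (y, x1) pairs drawn from the funnel
funnel :: Integer -> [(Double, Double)]
funnel seed = go (normals seed)
  where
    go (yRaw : xRaw : rest) = let y = 3 * yRaw
                              in (y, exp (y / 2) * xRaw) : go rest
    go _ = []

main :: IO ()
main = do
  let ys   = map fst (take 20000 (funnel 42))
      n    = fromIntegral (length ys)
      mean = sum ys / n
      sd   = sqrt (sum [(y - mean) ^ 2 | y <- ys] / n)
  -- marginally y ~ N(0, 3), whatever the funnel does to x
  print (mean, sd)
```

Running `main` should print a sample mean close to $0$ and a sample standard deviation close to $3$, matching the marginal distribution of $y$ in the Stan model.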

And now the neck of the funnel is explored.

    funnel_fit <- stan(file='funnel_reparam.stan', cores=4, iter=10000)
    funnel_samples <- extract(funnel_fit,permuted=TRUE,inc_warmup=FALSE);
    funnel_df <- data.frame(x1=funnel_samples$x[,1],y=funnel_samples$y[])

    df <- funnel_df[funnel_df$x1 < 20 & funnel_df$x1 > -20 & funnel_df$y < 9 & funnel_df$y > -9,]
    x_c <- cut(df$x1, 20)
    y_c <- cut(df$y, 20)
    z <- table(x_c, y_c)
    z_df <- as.data.frame(z)
    a_df <- data.frame(x=midpoints(z_df$x_c),y=midpoints(z_df$y_c),f=z_df$Freq)

    m = as.matrix(dcast(a_df,x ~ y))

    ## Using f as value column: use value.var to override.

    hist3D(x=m[,"x"],y=as.double(colnames(m)[2:21]),z=(as.matrix(dcast(a_df,x ~ y)))[,2:21],
           border="black",ticktype = "detailed",theta=5,phi=20)

    ## Using f as value column: use value.var to override.

We’d expect the Haskell implementation to also fail to explore the neck. Maybe I will return to this after the article on why NUTS works.

# Calling Haskell from C

As part of improving the random number generation story for Haskell, I want to be able to use the testu01 library with the minimal amount of Haskell wrapping. testu01 assumes that there is a C function which returns the random number. The ghc manual gives an example but does not give all the specifics. These are my notes on how to get the example working under OS X (El Capitan 10.11.5 to be precise).

The Haskell:

    {-# OPTIONS_GHC -Wall #-}
    {-# LANGUAGE ForeignFunctionInterface #-}

    module Foo where

    foreign export ccall foo :: Int -> IO Int

    foo :: Int -> IO Int
    foo n = return (length (f n))

    f :: Int -> [Int]
    f 0 = []
    f n = n:(f (n-1))

The .cabal:

    name:                test-via-c
    version:             0.1.0.0
    homepage:            TBD
    license:             MIT
    author:              Dominic Steinitz
    maintainer:          idontgetoutmuch@gmail.com
    category:            System
    build-type:          Simple
    cabal-version:       >=1.10

    executable Foo.dylib
      main-is:             Foo.hs
      other-extensions:    ForeignFunctionInterface
      build-depends:       base >=4.7 && =0.6 && <0.7
      hs-source-dirs:      src
      default-language:    Haskell2010
      include-dirs:        src
      ghc-options:         -O2 -shared -fPIC -dynamic
      extra-libraries:     HSrts-ghc8.0.1

On my computer running cabal install places the library in

    ~/Library/Haskell/ghc-8.0.1/lib/test-via-c-0.1.0.0/bin

The C:

    #include <stdio.h>
    #include "HsFFI.h"
    #include "../dist/build/Foo.dylib/Foo.dylib-tmp/Foo_stub.h"

    int main(int argc, char *argv[])
    {
      int i;

      hs_init(&argc, &argv);

      for (i = 0; i < 5; i++) {
        printf("%d\n", foo(2500));
      }

      hs_exit();
      return 0;
    }

On my computer this can be
compiled with

    gcc-6 Bar.c ~/Library/Haskell/ghc-8.0.1/lib/test-via-c-0.1.0.0/bin/Foo.dylib -I/Library/Frameworks/GHC.framework/Versions/8.0.1-x86_64/usr/lib/ghc-8.0.1/include -L/Library/Frameworks/GHC.framework/Versions/8.0.1-x86_64/usr/lib/ghc-8.0.1/rts -lHSrts-ghc8.0.1

and can be run with

    DYLD_LIBRARY_PATH=~/Library/Haskell/ghc-8.0.1/lib/test-via-c-0.1.0.0/bin:/Library/Frameworks/GHC.framework/Versions/8.0.1-x86_64/usr/lib/ghc-8.0.1/rts

N.B. setting DYLD_LIBRARY_PATH like this is not recommended as it is a good way of breaking things. I have tried setting DYLD_FALLBACK_LIBRARY_PATH but only to get an error message. Hopefully, at some point I will be able to post a robust way of getting the executable to pick up the required dynamic libraries.

# UK / South Korea Trade: A Bayesian Analysis

# Introduction

I was intrigued by a tweet by the UK Chancellor of the Exchequer stating “exports [to South Korea] have doubled over the last year. Now worth nearly £11bn” and a tweet by a Member of the UK Parliament stating South Korea is “our second fastest growing trading partner”. Although I have never paid much attention to trade statistics, both these statements seemed surprising. But these days it’s easy enough to verify such statements. It’s also an opportunity to use the techniques I believe data scientists in (computer) game companies use to determine how much impact a new feature has on the game’s consumers.

One has to be slightly careful with trade statistics as they come in many different forms, e.g., just goods or goods and services etc. When I provide software and analyses to US organisations, I am included in the services exports from the UK to the US. Let’s analyse goods first before moving on to goods and services.

# Getting the Data

First let’s get hold of the quarterly data from the UK Office of National Statistics.
ukstats <- "https://www.ons.gov.uk"
bop <- "economy/nationalaccounts/balanceofpayments"
ds <- "datasets/tradeingoodsmretsallbopeu2013timeseriesspreadsheet/current/mret.csv"

mycsv <- read.csv(paste(ukstats,"file?uri=",bop,ds,sep="/"),stringsAsFactors=FALSE)

Now we can find the columns that refer to Korea.

ns <- which(grepl("Korea", names(mycsv)))
length(ns)

## [1] 3

names(mycsv[ns[1]])

## [1] "BoP.consistent..South.Korea..Exports..SA................................"

names(mycsv[ns[2]])

## [1] "BoP.consistent..South.Korea..Imports..SA................................"

names(mycsv[ns[3]])

## [1] "BoP.consistent..South.Korea..Balance..SA................................"

Now we can pull out the relevant information and create a data frame from it.

korean <- mycsv[grepl("Korea", names(mycsv))]
imports <- korean[grepl("Imports", names(korean))]
exports <- korean[grepl("Exports", names(korean))]
balance <- korean[grepl("Balance", names(korean))]

df <- data.frame(mycsv[grepl("Title", names(mycsv))], imports, exports, balance)
colnames(df) <- c("Title", "Imports", "Exports", "Balance")

startQ <- which(grepl("1998 Q1",df$Title))
endQ <- which(grepl("2016 Q3",df$Title))
dfQ <- df[startQ:endQ,]

We can now plot the data.

tab <- data.frame(kr=as.numeric(dfQ$Exports),
                  krLabs=as.numeric(as.Date(as.yearqtr(dfQ$Title,format='%Y Q%q'))))

ggplot(tab, aes(x=as.Date(tab$krLabs), y=tab$kr)) +
  geom_line() +
  theme(legend.position="bottom") +
  ggtitle("Goods Exports UK / South Korea (Quarterly)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Date") +
  ylab("Value (£m)")

For good measure let’s plot the annual data.

startY <- grep("^1998$",df$Title)
endY <- grep("^2015$",df$Title)
dfYear <- df[startY:endY,]

tabY <- data.frame(kr=as.numeric(dfYear$Exports),
                   krLabs=as.numeric(dfYear$Title))

ggplot(tabY, aes(x=tabY$krLabs, y=tabY$kr)) +
  geom_line() +
  theme(legend.position="bottom") +
  ggtitle("Goods Exports UK / South Korea (Annual)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Date") +
  ylab("Value (£m)")

And the monthly data.

startM <- grep("1998 JAN",df$Title)
endM <- grep("2016 OCT",df$Title)
dfMonth <- df[startM:endM,]

tabM <- data.frame(kr=as.numeric(dfMonth$Exports),
                   krLabs=as.numeric(as.Date(as.yearmon(dfMonth$Title,format='%Y %B'))))

ggplot(tabM, aes(x=as.Date(tabM$krLabs), y=tabM$kr)) +
  geom_line() +
  theme(legend.position="bottom") +
  ggtitle("Goods Exports UK / South Korea (Monthly)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Date") +
  ylab("Value (£m)")

It looks like some change took place in 2011 but there is nothing to suggest either that "exports have doubled over the last year" or that South Korea is "our second fastest growing trading partner". That some sort of change did happen is further supported by the fact that a Free Trade Agreement between the EU and Korea was put in place in 2011.

But was there really a change? And what sort of change was it? Sometimes it’s easy to imagine patterns where there are none. With this warning in mind, let us see if we can get a better feel from the numbers as to what happened.

# The Model

Let us assume that the data for exports are approximated by a linear function of time but that there is a change in the slope and the offset at some point during observation.

\begin{aligned} \tau &\sim {\mathrm{Uniform}}(1, N) \\ \mu_1 &\sim \mathcal{N}(\mu_{\mu_1}, \sigma_{\mu_1}) \\ \gamma_1 &\sim \mathcal{N}(\mu_{\gamma_1}, \sigma_{\gamma_1}) \\ \sigma_1 &\sim \mathcal{N}(\mu_{\sigma_1}, \sigma_{\sigma_1}) \\ \mu_2 &\sim \mathcal{N}(\mu_{\mu_2}, \sigma_{\mu_2}) \\ \gamma_2 &\sim \mathcal{N}(\mu_{\gamma_2}, \sigma_{\gamma_2}) \\ \sigma_2 &\sim \mathcal{N}(\mu_{\sigma_2}, \sigma_{\sigma_2}) \\ y_i &\sim \begin{cases} \mathcal{N}(\mu_1 x_i + \gamma_1, \sigma_1) & \mbox{if } i < \tau \\ \mathcal{N}(\mu_2 x_i + \gamma_2, \sigma_2), & \mbox{if } i \geq \tau \end{cases} \end{aligned}

Since we are going to use stan to infer the parameters for this model and stan cannot handle discrete parameters, we need to marginalize out this (discrete) parameter. I hope to do the same analysis with LibBi, which seems more suited to time series analysis and which I believe will not require such a step.
Setting $D = \{y_i\}_{i=1}^N$ we can calculate the likelihood

\begin{aligned} p(D \,|\, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) &= \sum_{\tau=1}^N p(\tau, D \,|\, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) \\ &= \sum_{\tau=1}^N p(\tau) p(D \,|\, \tau, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) \\ &=\sum_{\tau=1}^N p(\tau) \prod_{i=1}^N p(y_i \,|\, \tau, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) \end{aligned}

stan operates on the log scale and thus requires the log likelihood

$\log p(D \,|\, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) = \mathrm{log\_sum\_exp}_{\tau=1}^N \big( \log \mathcal{U}(\tau \, | \, 1, N) + \sum_{i=1}^N \log \mathcal{N}(y_i \, | \, \nu_i, \rho_i) \big)$

where

\begin{aligned} \nu_i &= \begin{cases} \mu_1 x_i + \gamma_1 & \mbox{if } i < \tau \\ \mu_2 x_i + \gamma_2 & \mbox{if } i \geq \tau \end{cases} \\ \rho_i &= \begin{cases} \sigma_1 & \mbox{if } i < \tau \\ \sigma_2 & \mbox{if } i \geq \tau \end{cases} \end{aligned}

and where the log sum of exponents function is defined by

$\mathrm{\log\_sum\_exp}_{n=1}^N \, \alpha_n = \log \sum_{n=1}^N \exp(\alpha_n).$

The log sum of exponents function allows the model to be coded directly in Stan using the built-in function log_sum_exp, which provides both arithmetic stability and efficiency for mixture model calculations.

## Stan

Here’s the model in stan. Sadly I haven’t found a good way of divvying up .stan files in a .Rmd file so that it still compiles.

data {
  int<lower=1> N;
  real x[N];
  real y[N];
}

parameters {
  real mu1;
  real mu2;
  real gamma1;
  real gamma2;
  real<lower=0> sigma1;
  real<lower=0> sigma2;
}

transformed parameters {
  vector[N] log_p;
  real mu;
  real sigma;
  log_p = rep_vector(-log(N), N);
  for (tau in 1:N)
    for (i in 1:N) {
      mu = i < tau ? (mu1 * x[i] + gamma1) : (mu2 * x[i] + gamma2);
      sigma = i < tau ? sigma1 : sigma2;
      log_p[tau] = log_p[tau] + normal_lpdf(y[i] | mu, sigma);
    }
}

model {
  mu1 ~ normal(0, 10);
  mu2 ~ normal(0, 10);
  gamma1 ~ normal(0, 10);
  gamma2 ~ normal(0, 10);
  sigma1 ~ normal(0, 10);
  sigma2 ~ normal(0, 10);
  target += log_sum_exp(log_p);
}

generated quantities {
  int<lower=1,upper=N> tau;
  tau = categorical_rng(softmax(log_p));
}

The above, although mimicking our mathematical model, has quadratic complexity and we can use the trick in the stan manual to make it linear albeit with less clarity.

data {
  int<lower=1> N;
  real x[N];
  real y[N];
}

parameters {
  real mu1;
  real mu2;
  real gamma1;
  real gamma2;
  real<lower=0> sigma1;
  real<lower=0> sigma2;
}

transformed parameters {
  vector[N] log_p;
  {
    vector[N+1] log_p_e;
    vector[N+1] log_p_l;
    log_p_e[1] = 0;
    log_p_l[1] = 0;
    for (i in 1:N) {
      log_p_e[i + 1] = log_p_e[i] + normal_lpdf(y[i] | mu1 * x[i] + gamma1, sigma1);
      log_p_l[i + 1] = log_p_l[i] + normal_lpdf(y[i] | mu2 * x[i] + gamma2, sigma2);
    }
    log_p = rep_vector(-log(N) + log_p_l[N + 1], N) + head(log_p_e, N) - head(log_p_l, N);
  }
}

model {
  mu1 ~ normal(0, 10);
  mu2 ~ normal(0, 10);
  gamma1 ~ normal(0, 10);
  gamma2 ~ normal(0, 10);
  sigma1 ~ normal(0, 10);
  sigma2 ~ normal(0, 10);
  target += log_sum_exp(log_p);
}

generated quantities {
  int<lower=1,upper=N> tau;
  tau = categorical_rng(softmax(log_p));
}

Let’s run this model with the monthly data.

NM <- nrow(tabM)
KM <- ncol(tabM)

yM <- tabM$kr
XM <- data.frame(tabM,rep(1,NM))[,2:3]

fitM <- stan(
file = "lr-changepoint-ng.stan",
  data = list(x = XM$krLabs, y = yM, N = length(yM)),
  chains = 4,
  warmup = 1000,
  iter = 10000,
  cores = 4,
  refresh = 500,
  seed = 42
)

## Warning: There were 662 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

## Warning: Examine the pairs() plot to diagnose sampling problems

Looking at the results below we see a multi-modal distribution, so a mean is not of much use.

histData <- hist(extract(fitM)$tau,plot=FALSE,breaks=c(seq(1,length(yM),1)))
histData$counts

##   [1] 18000 0 0 0 0 0 0 0 0 0 0
##  [12] 0 0 0 0 0 0 0 0 0 0 0
##  [23] 0 0 0 0 0 0 0 0 0 0 0
##  [34] 0 0 0 0 0 0 0 0 0 0 0
##  [45] 0 0 0 0 0 0 0 0 0 0 0
##  [56] 0 0 0 0 0 0 0 0 0 0 0
##  [67] 0 0 0 0 0 0 0 0 0 0 0
##  [78] 0 0 0 0 0 0 0 0 0 0 0
##  [89] 0 0 0 0 0 0 0 0 0 0 0
## [100] 0 0 0 0 0 0 0 0 0 0 0
## [111] 0 0 0 0 0 0 0 1 4 12 16
## [122] 16 107 712 8132 0 0 0 0 0 0 0
## [133] 0 0 0 0 0 0 0 0 0 0 0
## [144] 0 0 0 0 0 0 0 0 0 0 0
## [155] 0 0 0 0 0 0 0 0 0 0 25
## [166] 171 2812 0 0 0 0 0 0 0 0 0
## [177] 0 0 0 0 0 0 0 0 0 0 0
## [188] 0 0 0 0 0 0 0 0 0 0 0
## [199] 0 0 0 0 0 0 0 0 0 0 0
## [210] 0 0 0 0 0 0 0 0 0 0 0
## [221] 0 0 0 0 5992

We can get a pictorial representation of the maxima so that the multi-modality is even clearer.

min_indexes = which(diff( sign(diff( c(0,histData$counts,0)))) == 2)
max_indexes = which(diff( sign(diff( c(0,histData$counts,0)))) == -2)
modeData = data.frame(x=1:length(histData$counts),y=histData$counts)
min_locs = modeData[min_indexes,]
max_locs = modeData[max_indexes,]
plot(modeData$y, type="l")
points( min_locs, col="red", pch=19, cex=1  )
points( max_locs, col="green", pch=19, cex=1  )

My interpretation is that the evidence (data) says there is probably no changepoint (a change at the beginning or end is no change) but there might be a change at intermediate data points.
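The marginalisation used above is easy to experiment with outside Stan. Here is a minimal Python sketch (the synthetic data and parameter values are my own, not the post’s): enumerate each candidate changepoint, accumulate the Gaussian log likelihood of the two regimes, and normalise the log weights with log-sum-exp.

```python
import math
import random

def norm_logpdf(y, mu, sigma):
    # Log density of a normal distribution, evaluated at y.
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def changepoint_posterior(xs, ys, m1, g1, s1, m2, g2, s2):
    n = len(ys)
    log_p = []
    for tau in range(1, n + 1):          # uniform prior over changepoints
        lp = -math.log(n)
        for i, (x, y) in enumerate(zip(xs, ys), start=1):
            if i < tau:                  # regime 1 before the changepoint
                lp += norm_logpdf(y, m1 * x + g1, s1)
            else:                        # regime 2 from the changepoint on
                lp += norm_logpdf(y, m2 * x + g2, s2)
        log_p.append(lp)
    mx = max(log_p)                      # log-sum-exp for numerical stability
    z = mx + math.log(sum(math.exp(lp - mx) for lp in log_p))
    return [math.exp(lp - z) for lp in log_p]

random.seed(42)
xs = [i / 100 for i in range(100)]
ys = [x + random.gauss(0, 0.05) if i < 60 else 3 * x - 0.5 + random.gauss(0, 0.05)
      for i, x in enumerate(xs)]
post = changepoint_posterior(xs, ys, 1.0, 0.0, 0.05, 3.0, -0.5, 0.05)
print(max(range(len(post)), key=lambda t: post[t]) + 1)  # posterior mode at tau = 61
```

With a clean jump in the synthetic data the posterior over the changepoint is sharply peaked; with real data like the exports series it can be multi-modal, exactly as above.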

We can see something strange (maybe a large single export?) happened at index 125 which translates to 2008MAY.

The mode at index 167 which translates to 2011NOV corresponds roughly to the EU / Korea trade agreement.

Let us assume that there really was a material difference in trade at this latter point. We can fit a linear regression before this point and one after this point.

Here’s the stan:

data {
int<lower=1> N;
int<lower=1> K;
matrix[N,K]  X;
vector[N]    y;
}
parameters {
vector[K] beta;
real<lower=0> sigma;
}
model {
y ~ normal(X * beta, sigma);
}

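With no priors declared for beta, the posterior mean of beta in LR.stan is essentially the least-squares estimate. This Python sketch (mine, not the post’s code) solves the 2x2 normal equations for a design matrix with columns [x, 1], the same shape as XM in the post.

```python
import random

def ols2(xs, ys):
    # Least squares for y ~ slope * x + intercept via the 2x2 normal equations.
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return slope, intercept

random.seed(0)
xs = [i / 10 for i in range(50)]
ys = [2.5 * x + 1.0 + random.gauss(0, 0.1) for x in xs]
slope, intercept = ols2(xs, ys)
print(slope, intercept)  # close to the true values 2.5 and 1.0
```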
And here’s the R to fit the data before and after the putative changepoint. We fit each model, then pull out the posterior means of the regression parameters.

N <- length(yM)
M <- max_locs$x[3]

fite <- stan(file = 'LR.stan',
             data = list(N = M, K = ncol(XM), y = yM[1:M], X = XM[1:M,]),
             pars=c("beta", "sigma"),
             chains=3,
             cores=3,
             iter=3000,
             warmup=1000,
             refresh=-1)

se <- extract(fite, pars = c("beta", "sigma"), permuted=TRUE)
estCovParamsE <- colMeans(se$beta)

fitl <- stan(file = 'LR.stan',
data = list(N = N-M, K = ncol(XM), y = yM[(M+1):N], X = XM[(M+1):N,]),
pars=c("beta", "sigma"),
chains=3,
cores=3,
iter=3000,
warmup=1000,
refresh=-1)

sl <- extract(fitl, pars = c("beta", "sigma"), permuted=TRUE)
estCovParamsL <- colMeans(sl$beta)

Make predictions.

linRegPredsE <- data.matrix(XM) %*% estCovParamsE
linRegPredsL <- data.matrix(XM) %*% estCovParamsL

ggplot(tabM, aes(x=as.Date(tabM$krLabs), y=tabM$kr)) +
  geom_line(aes(x = as.Date(tabM$krLabs), y = tabM$kr, col = "Actual")) +
  geom_line(data=tabM[1:M,], aes(x = as.Date(tabM$krLabs[1:M]), y = linRegPredsE[(1:M),1], col = "Fit (Before FTA)")) +
  geom_line(data=tabM[(M+1):N,], aes(x = as.Date(tabM$krLabs[(M+1):N]), y = linRegPredsL[((M+1):N),1], col = "Fit (After FTA)")) +
  theme(legend.position="bottom") +
  ggtitle("Goods Exports UK / South Korea (Monthly)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Date") +
  ylab("Value (£m)")

# An Intermediate Conclusion and Goods and Services (Pink Book)

So we didn’t manage to substantiate either the Chancellor’s claim or the Member of Parliament’s claim. But it may be that if we look at Goods and Services we can see the numbers behind the claims.

pb <- "datasets/pinkbook/current/pb.csv"

pbcsv <- read.csv(paste(ukstats,"file?uri=",bop,pb,sep="/"),stringsAsFactors=FALSE)

This has a lot more information albeit only annually.

pbns <- grep("Korea", names(pbcsv))
length(pbns)

## [1] 21

lapply(pbns,function(x) names(pbcsv[x]))

## [[1]]
## [1] "BoP..Current.Account..Goods...Services..Imports..South.Korea............"
##
## [[2]]
## [1] "BoP..Current.Account..Current.Transfer..Balance..South.Korea............"
##
## [[3]]
## [1] "BoP..Current.Account..Goods...Services..Balance..South.Korea............"
##
## [[4]]
## [1] "IIP..Assets..Total.South.Korea.........................................."
##
## [[5]]
## [1] "Trade.in.Services.replaces.1.A.B....Exports.Credits...South.Korea...nsa."
##
## [[6]]
## [1] "IIP...Liabilities...Total...South.Korea................................."
##
## [[7]]
## [1] "BoP..Total.income..Balance..South.Korea................................."
##
## [[8]]
## [1] "BoP..Total.income..Debits..South.Korea.................................."
##
## [[9]]
## [1] "BoP..Total.income..Credits..South.Korea................................."
##
## [[10]]
## [1] "BoP..Current.account..Balance..South.Korea.............................."
##
## [[11]]
## [1] "BoP..Current.account..Debits..South.Korea..............................."
##
## [[12]]
## [1] "BoP..Current.account..Credits..South.Korea.............................."
##
## [[13]]
## [1] "IIP...Net...Total....South.Korea........................................"
##
## [[14]]
## [1] "Trade.in.Services.replaces.1.A.B....Imports.Debits...South.Korea...nsa.."
##
## [[15]]
## [1] "BoP..Current.Account..Services..Total.Balance..South.Korea.............."
##
## [[16]]
## [1] "Bop.consistent..Balance..NSA..South.Korea..............................."
##
## [[17]]
## [1] "Bop.consistent..Im..NSA..South.Korea...................................."
##
## [[18]]
## [1] "Bop.consistent..Ex..NSA..South.Korea...................................."
##
## [[19]]
## [1] "Current.Transfers...Exports.Credits...South.Korea...nsa................."
##
## [[20]]
## [1] "Current.Transfers...Imports.Debits...South.Korea...nsa.................."
##
## [[21]]
## [1] "BoP..Current.Account..Goods...Services..Exports..South.Korea............"

Let’s just look at exports.

koreanpb <- pbcsv[grepl("Korea", names(pbcsv))]
exportspb <- koreanpb[grepl("Exports", names(koreanpb))]
names(exportspb)

## [1] "Trade.in.Services.replaces.1.A.B....Exports.Credits...South.Korea...nsa."
## [2] "Current.Transfers...Exports.Credits...South.Korea...nsa................."
## [3] "BoP..Current.Account..Goods...Services..Exports..South.Korea............"

The last column gives exports of Goods and Services so let’s draw a chart of it.

pb <- data.frame(pbcsv[grepl("Title", names(pbcsv))], exportspb[3])
colnames(pb) <- c("Title", "Exports")

startpbY <- which(grepl("1999",pb$Title))
endpbY <- which(grepl("2015",pb$Title))
pbY <- pb[startpbY:endpbY,]

tabpbY <- data.frame(kr=as.numeric(pbY$Exports),
                     krLabs=as.numeric(pbY$Title))

ggplot(tabpbY, aes(x=tabpbY$krLabs, y=tabpbY$kr)) +
  geom_line() +
  theme(legend.position="bottom") +
  ggtitle("Goods and Services Exports UK / South Korea (Annual)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Date") +
  ylab("Value (£m)")

No joy here either for any of the claims. Still, it’s been an interesting exercise.

# Girsanov’s Theorem

# Introduction

We previously used importance sampling in the case where we did not have a sampler available for the distribution from which we wished to sample. There is an even more compelling case for using importance sampling. Suppose we wish to estimate the probability of a rare event. For example, suppose we wish to estimate $\mathbb{P}(X > 5)$ where $X \sim {\mathcal{N}}(0,1)$. In this case, we can look up the answer: $\mathbb{P}(X > 5) \approx 2.867 \times 10^{-7}$. But suppose we couldn’t look up the answer. One strategy that might occur to us is to sample and then estimate the probability by counting the number of times out of the total that the sample was bigger than 5. The flaw in this is obvious but let’s try it anyway.

> module Girsanov where

> import qualified Data.Vector as V
> import Data.Random.Source.PureMT
> import Data.Random
> import Control.Monad.State
> import Data.Histogram.Fill
> import Data.Histogram.Generic ( Histogram )
> import Data.Number.Erf
> import Data.List ( transpose )

> samples :: (Foldable f, MonadRandom m) =>
>            (Int -> RVar Double -> RVar (f Double)) ->
>            Int ->
>            m (f Double)
> samples repM n = sample $ repM n $ stdNormal

> biggerThan5 :: Int
> biggerThan5 = length (evalState xs (pureMT 42))
>   where
>     xs :: MonadRandom m => m [Double]
>     xs = liftM (filter (>= 5.0)) $ samples replicateM 100000


As we might have expected, even if we draw 100,000 samples, we estimate this probability quite poorly.

ghci> biggerThan5
0

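A quick Python analogue of the naive approach (not the post’s Haskell) makes the problem plain: with 100,000 draws the expected number of samples above 5 is only about 0.03, so the estimate is almost always exactly zero.

```python
import random

# Naive Monte Carlo for P(X > 5), X ~ N(0,1): draw standard normals
# and count how many land above 5.
random.seed(42)
n = 100_000
count = sum(1 for _ in range(n) if random.gauss(0.0, 1.0) > 5.0)
print(count / n)  # almost certainly 0.0
```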

Using importance sampling we can do a lot better.

Let $\xi$ be a normally distributed random variable with zero mean and unit variance under the Lebesgue measure $\mathbb{P}$. As usual we can then define a new probability measure, the law of $\xi$, by

\displaystyle \begin{aligned} \mathbb{P}_\xi((-\infty, b]) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^b e^{-x^2/2}\,\mathrm{d}x \end{aligned}

Thus

\displaystyle \begin{aligned} \mathbb{E}_\xi(f) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty f(x) e^{-x^2/2}\,\mathrm{d}x \\ &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty f(x) e^{a^2/2}e^{-a x}e^{-(x-a)^2/2}\,\mathrm{d}x \\ &= \mathbb{E}_{\xi + a}(fg) \\ &= \mathbb{\tilde{E}}_{\xi + a}(f) \end{aligned}

where we have defined

$\displaystyle g(x) \triangleq e^{a^2/2}e^{-a x} \quad \mathrm{and} \quad \mathbb{\tilde{P}}((-\infty, b]) \triangleq \frac{1}{\sqrt{2\pi}}\int_{-\infty}^b g(x)\,e^{-(x - a)^2/2}\,\mathrm{d}x$

Thus we can estimate $\mathbb{P}(X > 5)$ either by sampling from a normal distribution with mean 0 and counting the number of samples that are above 5, or by sampling from a normal distribution with mean 5 and calculating the appropriately weighted mean

$\displaystyle \frac{1}{n}\sum_{i=1}^n \mathbb{I}_{\{y_i > 5\}}\,g(y_i)$

where the $y_i$ are drawn from the normal distribution with mean 5.

Let’s try this out.

> biggerThan5' :: Double
> biggerThan5' = sum (evalState xs (pureMT 42)) / (fromIntegral n)
>   where
>     xs :: MonadRandom m => m [Double]
>     xs = liftM (map g) $
>          liftM (filter (>= 5.0)) $
>          liftM (map (+5)) $
>          samples replicateM n
>     g x = exp $ (5^2 / 2) - 5 * x
>     n = 100000


And now we get quite a good estimate.

ghci> biggerThan5'
2.85776225450217e-7


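The same calculation is easy to cross-check outside Haskell. Here is a Python sketch (mine, not the post’s code): sample from $\mathcal{N}(5,1)$ and reweight each sample $y$ that lands above 5 by the likelihood ratio $g(y) = e^{a^2/2}e^{-ay}$ with $a = 5$.

```python
import math
import random

random.seed(42)
a, n = 5.0, 100_000
ys = [random.gauss(a, 1.0) for _ in range(n)]
# Importance-sampling estimate of P(X > 5) for X ~ N(0,1).
est = sum(math.exp(a * a / 2 - a * y) for y in ys if y > 5.0) / n

# Exact tail probability for comparison: P(X > 5) = erfc(5 / sqrt(2)) / 2.
exact = 0.5 * math.erfc(5.0 / math.sqrt(2.0))
print(est, exact)  # both around 2.9e-7
```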
## Random Paths

The probability of another rare event we might wish to estimate is that of Brownian Motion crossing a boundary. For example, what is the probability of Brownian Motion crossing the line $y = 3.5$? Let’s try sampling 100 paths (we restrict the number so that the chart is still readable).

> epsilons :: (Foldable f, MonadRandom m) =>
>                     (Int -> RVar Double -> RVar (f Double)) ->
>                     Double ->
>                     Int ->
>                     m (f Double)
> epsilons repM deltaT n = sample $ repM n $ rvar (Normal 0.0 (sqrt deltaT))

> bM0to1 :: Foldable f =>
>           ((Double -> Double -> Double) -> Double -> f Double -> f Double)
>           -> (Int -> RVar Double -> RVar (f Double))
>           -> Int
>           -> Int
>           -> f Double
> bM0to1 scan repM seed n =
>   scan (+) 0.0 $
>   evalState (epsilons repM (recip $ fromIntegral n) n) (pureMT (fromIntegral seed))


We can see by eye in the chart below that again we do quite poorly.

We know that $\mathbb{P}(T_a \leq t) = 2(1 - \Phi (a / \sqrt{t}))$ where $T_a = \inf \{t : W_t = a\}$.

> p :: Double -> Double -> Double
> p a t = 2 * (1 - normcdf (a / sqrt t))

ghci> p 1.0 1.0
0.31731050786291415

ghci> p 2.0 1.0
4.550026389635842e-2

ghci> p 3.0 1.0
2.699796063260207e-3


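The same closed form is easy to check in any language with an error function. A Python equivalent of `p`, using only the standard library and the identity $1 - \Phi(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$:

```python
import math

def p(a, t):
    # P(T_a <= t) = 2 * (1 - Phi(a / sqrt(t))) = erfc(a / sqrt(2 * t))
    return math.erfc(a / math.sqrt(2.0 * t))

print(p(1.0, 1.0), p(2.0, 1.0), p(3.0, 1.0))  # matches the ghci values above
```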
But what if we didn’t know this formula? Define

$\displaystyle N(\omega) \triangleq \begin{cases} 1 & \text{if } \sup_{0 \leq t \leq 1}\tilde W_t \geq a \\ 0 & \text{if } \sup_{0 \leq t \leq 1}\tilde W_t < a \\ \end{cases}$

where $\mathbb{Q}$ is the measure which makes $\tilde W_t$ Brownian Motion.

We can estimate the expectation of $N$

$\displaystyle \hat p_{\mathbb{Q}} = \frac{1}{M}\sum_{i=1}^M n_i$

where $n_i$ is 1 if Brownian Motion hits the barrier and 0 otherwise and M is the total number of simulations. We know from visual inspection that this gives poor results but let us try some calculations anyway.

> n = 500
> m = 10000

> supAbove :: Double -> Double
> supAbove a = fromIntegral count / fromIntegral n
>   where
>     count = length $> filter (>= a)$
>             map (\seed -> maximum $ bM0to1 scanl replicateM seed m) [0..n - 1]

> bM0to1WithDrift seed mu n =
>   zipWith (\m x -> x + mu * m * deltaT) [0..] $
>   bM0to1 scanl replicateM seed n
>     where
>       deltaT = recip $ fromIntegral n

ghci> supAbove 1.0
0.326

ghci> supAbove 2.0
7.0e-2

ghci> supAbove 3.0
0.0

As expected, for a rare event we get an estimate of 0.

Fortunately we can use importance sampling for paths. If we take $\mu(\omega, t) = a$ where $a$ is a constant in Girsanov’s Theorem below, then we know that $\tilde W_t = W_t + \int_0^t a \,\mathrm{d}s = W_t + at$ is $\mathbb{Q}$-Brownian Motion.

We observe that

\displaystyle \begin{aligned} \mathbb{Q}N &= \mathbb{P}\bigg(N\frac{\mathrm{d} \mathbb{Q}}{\mathrm{d} \mathbb{P}}\bigg) \\ &= \mathbb{P}\Bigg[N \exp \Bigg(-\int_0^1 \mu(\omega,t) \,\mathrm{d}W_t - \frac{1}{2} \int_0^1 \mu^2(\omega, t) \,\mathrm{d} t\Bigg) \Bigg] \\ &= \mathbb{P}\Bigg[N \exp \Bigg(-aW_1 - \frac{1}{2} a^2\Bigg) \Bigg] \end{aligned}

So we can also estimate the expectation of $N$ under $\mathbb{P}$ as

$\displaystyle \hat p_{\mathbb{P}} = \frac{1}{M}\sum_{i=1}^M n_i\exp{\bigg(-aw^{(1)}_i - \frac{a^2}{2}\bigg)}$

where $n_i$ is now 1 if Brownian Motion with the specified drift hits the barrier and 0 otherwise, and $w^{(1)}_i$ is the driftless Brownian Motion sampled at $t=1$.

We can see from the chart below that this is going to be better at hitting the required barrier.

Let’s do some calculations.

> supAbove' a = (sum $ zipWith (*) ns ws) / fromIntegral n
>   where
>     deltaT = recip $ fromIntegral m
>
>     uss = map (\seed -> bM0to1 scanl replicateM seed m) [0..n - 1]
>     ys = map last uss
>     ws = map (\x -> exp (-a * x - 0.5 * a^2)) ys
>
>     vss = map (zipWith (\m x -> x + a * m * deltaT) [0..]) uss
>     sups = map maximum vss
>     ns = map fromIntegral $ map fromEnum $ map (>= a) sups

ghci> supAbove' 1.0
0.31592655955519156

ghci> supAbove' 2.0
4.999395029856741e-2

ghci> supAbove' 3.0
2.3859203473651654e-3

The reader is invited to try the above estimates with 1,000 samples per path to see that even with this respectable number, the calculation goes awry.

## In General

If we have a probability space $(\Omega, {\mathcal{F}}, \mathbb{P})$ and a non-negative random variable $Z$ with $\mathbb{E}Z = 1$ then we can define a new probability measure $\mathbb{Q}$ on the same $\sigma$-algebra by

$\displaystyle \mathbb{Q} A \triangleq \int_A Z \,\mathrm{d} \mathbb{P}$

For any two probability measures, when such a $Z$ exists it is called the Radon-Nikodym derivative of $\mathbb{Q}$ with respect to $\mathbb{P}$ and denoted $\frac{\mathrm{d} \mathbb{Q}}{\mathrm{d} \mathbb{P}}$.

Given that we managed to shift a Normal Distribution with non-zero mean in one measure to a Normal Distribution with another mean in another measure by producing the Radon-Nikodym derivative, might it be possible to shift Brownian Motion with a drift under one probability measure to pure Brownian Motion under another probability measure by producing the Radon-Nikodym derivative? The answer is yes, as Girsanov’s theorem below shows.
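The path-sampling recipe behind supAbove' translates directly into Python (my sketch, not the post’s code): simulate driftless Brownian paths on $[0,1]$, tilt each path by the drift $a$, record whether the tilted path hits the barrier $a$, and weight each hit by the Radon-Nikodym factor $\exp(-aW_1 - a^2/2)$, where $W_1$ is the driftless endpoint.

```python
import math
import random

random.seed(42)

def hit_probability(a, n_paths=4000, n_steps=400):
    dt = 1.0 / n_steps
    total = 0.0
    for _ in range(n_paths):
        w, sup = 0.0, 0.0
        for k in range(1, n_steps + 1):
            w += random.gauss(0.0, math.sqrt(dt))  # driftless Brownian increment
            sup = max(sup, w + a * k * dt)         # running max of the tilted path
        if sup >= a:                               # n_i: did the tilted path hit?
            total += math.exp(-a * w - 0.5 * a * a)
    return total / n_paths

print(hit_probability(2.0))  # close to the exact value 2 * (1 - Phi(2)), about 0.0455
```

The discrete-time maximum slightly underestimates the continuous supremum, so the estimate sits a little below the exact value; refining the grid reduces this bias.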
# Girsanov’s Theorem

Let $W_t$ be Brownian Motion on a probability space $(\Omega, {\mathcal{F}}, \mathbb{P})$, let $\{{\mathcal{F}}_t\}_{t \in [0,T]}$ be a filtration for this Brownian Motion and let $\mu(\omega, t)$ be an adapted process such that the Novikov Sufficiency Condition holds

$\displaystyle \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^T \mu^2(s, \omega) \,\mathrm{d}s\bigg)}\bigg] = K < \infty$

then there exists a probability measure $\mathbb{Q}$ such that

• $\mathbb{Q}$ is equivalent to $\mathbb{P}$, that is, $\mathbb{Q}(A) = 0 \iff \mathbb{P}(A) = 0$.

• $\displaystyle {\frac{\mathrm{d}\mathbb{Q}}{\mathrm{d}\mathbb{P}} = \exp \Bigg(-\int_0^T \mu(\omega,t) \,\mathrm{d}W_t - \frac{1}{2} \int_0^T \mu^2(\omega, t) \,\mathrm{d} t\Bigg)}$.

• $\tilde W_t = W_t + \int_0^t \mu(\omega, s) \,\mathrm{d}s$ is Brownian Motion on the probability space $(\Omega, {\mathcal{F}}, \mathbb{Q})$ also with the filtration $\{\mathcal{F}_t\}_{t \in [0,T]}$.

In order to prove Girsanov’s Theorem, we need a condition which allows us to infer that $M_t(\mu)$ is a strict martingale. One such useful condition to which we have already alluded is the Novikov Sufficiency Condition.
## Proof

Define $\mathbb{Q}$ by

$\displaystyle \mathbb{Q}(A) = \mathbb{P}(1_A M_T) \quad \mathrm{where} \quad M_t(\mu) = \exp{\bigg(-\int_0^t \mu(\omega, s) \,\mathrm{d}W_s - \frac{1}{2}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}$

Then, temporarily overloading the notation and writing $\mathbb{P}$ for expectation under $\mathbb{P}$, writing $X_t = W_t + \int_0^t \mu(\omega, s)\,\mathrm{d}s$, and applying the Novikov Sufficiency Condition to $f(s) - \mu(\omega, s)$, we have

\displaystyle \begin{aligned} \mathbb{Q}\bigg[\exp{\int_0^T f(s) \,\mathrm{d}X_s}\bigg] &= \mathbb{Q}\bigg[\exp{\bigg(\int_0^T f(s) \,\mathrm{d}W_s + \int_0^T f(s)\mu(\omega, s) \,\mathrm{d}s\bigg)}\bigg] \\ &= \mathbb{P}\bigg[\exp{\bigg( \int_0^T \big(f(s) - \mu(\omega, s)\big)\,\mathrm{d}W_s + \int_0^T f(s)\mu(\omega, s)\,\mathrm{d}s - \frac{1}{2}\int_0^T \mu^2(\omega ,s) \,\mathrm{d}s \bigg)}\bigg] \\ &= \mathbb{P}\bigg[\exp{\bigg( \int_0^T \big(f(s) - \mu(\omega, s)\big)\,\mathrm{d}W_s - \frac{1}{2}\int_0^T \big(f(s) - \mu(\omega ,s)\big)^2 \,\mathrm{d}s + \frac{1}{2}\int_0^T f^2(s) \,\mathrm{d}s \bigg)}\bigg] \\ &= \exp{\bigg(\frac{1}{2}\int_0^T f^2(s) \,\mathrm{d}s\bigg)} \, \mathbb{P}\bigg[\exp{\bigg( \int_0^T \big(f(s) - \mu(\omega, s)\big)\,\mathrm{d}W_s - \frac{1}{2}\int_0^T \big(f(s) - \mu(\omega ,s)\big)^2 \,\mathrm{d}s \bigg)}\bigg] \\ &= \exp{\bigg(\frac{1}{2}\int_0^T f^2(s) \,\mathrm{d}s\bigg)} \end{aligned}

Whence we see that

$\displaystyle \mathbb{Q}\big(e^{i \zeta (X_t - X_s)}\big) = e^{-\frac{1}{2} \zeta^2 (t - s)}$

And since this characterizes Brownian Motion, we are done. $\blacksquare$

# The Novikov Sufficiency Condition

## The Novikov Sufficiency Condition Statement

Let $\mu \in {\cal{L}}^2_{\mathrm{LOC}}[0,T]$ and further let it satisfy the Novikov condition

$\displaystyle \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^T \mu^2(s, \omega) \,\mathrm{d}s\bigg)}\bigg] = K < \infty$

then the process defined by

$\displaystyle M_t(\mu) = \exp{\bigg(\int_0^t \mu(\omega, s) \,\mathrm{d}W_s - \frac{1}{2}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}$

is a strict martingale.
Before we prove this, we need two lemmas.

## Lemma 1

Let $M_t$ for $t \in [0,T]$ be a non-negative local martingale; then $M_t$ is a super-martingale, and if further $\mathbb{E}M_T = \mathbb{E}M_0$ then $M_t$ is a strict martingale.

Proof

Let $\{\tau_n\}_{n \in \mathbb{N}}$ be a localizing sequence for $M_t$. Then for $0 < s < t < T$, using Fatou’s lemma and the fact that the stopped process is a strict martingale,

$\displaystyle \mathbb{E}(M_t \,|\, {\mathcal{F}_s}) = \mathbb{E}(\liminf_{n \rightarrow \infty} M_{t \land \tau_n} \,|\, {\mathcal{F}_s}) \leq \liminf_{n \rightarrow \infty} \mathbb{E}(M_{t \land \tau_n} \,|\, {\mathcal{F}_s}) = \liminf_{n \rightarrow \infty} M_{s \land \tau_n} = M_s$

Thus $M_t$ is a super-martingale and therefore

$\displaystyle \mathbb{E}M_T \leq \mathbb{E}M_t \leq \mathbb{E}M_s \leq \mathbb{E}M_0$

By assumption we have $\mathbb{E}M_T = \mathbb{E}M_0$, thus all the inequalities are equalities and $M_t$ is a strict martingale. $\blacksquare$

## Lemma 2

Let $M_t$ be a non-negative local martingale. If $\{\tau_n\}_{n \in \mathbb{N}}$ is a localizing sequence such that $\sup_n \|M_{T \land \tau_n}\|_p < \infty$ for some $p > 1$ then $M_t$ is a strict martingale.
Proof

$\displaystyle \mathbb{E}(|M_T - M_{T \land \tau_n}|) \leq \mathbb{E}(|M_T - r \land M_T|) + \mathbb{E}(|r \land M_T - r \land M_{T \land \tau_n}|) + \mathbb{E}(M_{T \land \tau_n} - r \land M_{T \land \tau_n})$

By the super-martingale property $\mathbb{E}(M_T) \leq \mathbb{E}(M_0) < \infty$ and thus by dominated convergence we have that

$\displaystyle \lim_{r \rightarrow \infty} \mathbb{E}(r \land M_T) = \mathbb{E}(M_T) \quad \mathrm{and} \quad \lim_{r \rightarrow \infty}\lim_{n \rightarrow \infty}\mathbb{E}(|r \land M_T - r \land M_{T \land \tau_n}|) = 0$

We also have that

\displaystyle \begin{aligned} \mathbb{E}(M_{T \land \tau_n} - r \land M_{T \land \tau_n}) &= \mathbb{E}((M_{T \land \tau_n} - r \land M_{T \land \tau_n}){I}_{(M_{T \land \tau_n} > r)}) + \mathbb{E}((M_{T \land \tau_n} - r \land M_{T \land \tau_n}){I}_{(M_{T \land \tau_n} \leq r)}) \\ &= \mathbb{E}((M_{T \land \tau_n} - r \land M_{T \land \tau_n}){I}_{(M_{T \land \tau_n} > r)}) \\ &= \mathbb{E}(M_{T \land \tau_n}{I}_{(M_{T \land \tau_n} > r)}) - r\mathbb{P}({M_{T \land \tau_n} > r}) \end{aligned}

By Chebyshev’s inequality (see note (2) below), we have

$\displaystyle r\mathbb{P}({M_{T \land \tau_n} > r}) \leq \frac{r\,\mathbb{E}|M_{T \land \tau_n}|^p}{r^p} \leq \frac{\sup_n{\mathbb{E}(M_{T \land \tau_n})^p}}{r^{p-1}}$

Taking limits first over $n \rightarrow \infty$ and then over $r \rightarrow \infty$ we see that

$\displaystyle \lim_{r \rightarrow \infty}\lim_{n \rightarrow \infty} r\mathbb{P}({M_{T \land \tau_n} > r}) = 0$

For $0 \leq r \leq x$ and $p > 1$ we have $x \leq r^{1-p}x^p$.
Thus

$\displaystyle \mathbb{E}(M_{T \land \tau_n}{I}_{(M_{T \land \tau_n} > r)}) \leq r^{1-p}\mathbb{E}(M_{T \land \tau_n}^p{I}_{(M_{T \land \tau_n} > r)}) \leq r^{1-p}\sup_n\mathbb{E}(M_{T \land \tau_n}^p)$

Again taking limits over $n \rightarrow \infty$ and then over $r \rightarrow \infty$ we have

$\displaystyle \lim_{r \rightarrow \infty}\lim_{n \rightarrow \infty} \mathbb{E}(M_{T \land \tau_n}{I}_{(M_{T \land \tau_n} > r)}) = 0$

These two conclusions imply

$\displaystyle \lim_{r \rightarrow \infty}\lim_{n \rightarrow \infty} \mathbb{E}(M_{T \land \tau_n} - r \land M_{T \land \tau_n}) = 0$

We can therefore conclude (since $M_{T \land \tau_n}$ is a martingale)

$\displaystyle \mathbb{E}(M_T) = \lim_{n \rightarrow \infty}\mathbb{E}(M_{T \land \tau_n}) = \mathbb{E}(M_0)$

Thus by the preceding lemma $M_t$ is a strict as well as a local martingale. $\blacksquare$

## The Novikov Sufficiency Condition Proof

### Step 1

First we note that $M_t(\lambda\mu)$ is a local martingale for $0 < \lambda < 1$. Let us show that it is a strict martingale. We can do this if for any localizing sequence $\{\tau_n\}_{n \in \mathbb{N}}$ we can show

$\displaystyle \sup_n\mathbb{E}(M_{T \land \tau_n}(\lambda\mu))^p < \infty$

for some $p > 1$, using the preceding lemma.

We note that

\displaystyle \begin{aligned} M_t(\lambda\mu) &= \exp{\bigg(\int^t_0 \lambda\mu(\omega, s)\,\mathrm{d}W_s - \frac{1}{2}\int^t_0 \lambda^2\mu^2(\omega, s)\,\mathrm{d}s\bigg)} \\ &= {(M_t(\mu))}^{\lambda^2}\exp{\bigg((\lambda - \lambda^2)\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)} \end{aligned}

Now apply Hölder’s inequality with conjugates $({p\lambda^2})^{-1}$ and $({1 - p\lambda^2})^{-1}$ where $p$ is chosen to ensure that the conjugates are both strictly greater than 1 (otherwise we cannot apply the inequality).
\displaystyle \begin{aligned} \mathbb{E}((M_t(\lambda\mu))^p) &= \mathbb{E}\bigg[{(M_t(\mu))}^{p\lambda^2}\exp{\bigg(p(\lambda - \lambda^2)\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg] \\ &\le \bigg|\bigg|{M_t(\mu)}^{p\lambda^2}\bigg|\bigg|_{p\lambda^2} \bigg|\bigg|\exp{\bigg(p(\lambda - \lambda^2)\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg|\bigg|_{1 - p\lambda^2} \\ &= \mathbb{E}{\bigg[M_t(\mu)}\bigg]^{p\lambda^2} \mathbb{E}\bigg[\exp{\bigg(p\frac{\lambda - \lambda^2}{1 - p\lambda^2}\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg]^{1 - p\lambda^2} \end{aligned}

Now let us choose

$\displaystyle p\frac{\lambda - \lambda^2}{1 - p\lambda^2} = \frac{1}{2}$

then

\displaystyle \begin{aligned} 2p(\lambda - \lambda^2) &= 1 - p\lambda^2 \\ p &= \frac{1}{2(\lambda - \lambda^2) + \lambda^2} \\ p &= \frac{1}{(2 - \lambda)\lambda} \end{aligned}

In order to apply Hölder’s inequality we need to check that $(p\lambda^2)^{-1} > 1$ and that $(1 - p\lambda^2)^{-1} > 1$, but this amounts to checking that $p\lambda^2 > 0$ and that $1 > \lambda$. We also need to check that $p > 1$, but this amounts to checking that $(2 - \lambda)\lambda < 1$ for $0 < \lambda < 1$ and this is easily checked to be true.

Re-writing the above inequality with this value of $p$ we have

\displaystyle \begin{aligned} \mathbb{E}((M_t(\lambda\mu))^p) &\le \mathbb{E}{\bigg[M_t(\mu)}\bigg]^{p\lambda^2} \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg]^{1 - p\lambda^2} \end{aligned}

By the first lemma, since $M_t(\mu)$ is a non-negative local martingale, it is also a supermartingale. Furthermore $\mathbb{E}(M_0(\mu)) = 1$.
Thus $\displaystyle \mathbb{E}{\bigg[M_t(\mu)}\bigg]^{p\lambda^2} \leq 1$ and therefore \displaystyle \begin{aligned} \mathbb{E}((M_t(\lambda\mu))^p) &\le \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg]^{1 - p\lambda^2} \end{aligned} ### Step 2 Recall we have $\displaystyle {M_t} = \exp\bigg( \int_0^t \mu(\omega ,s)\,\mathrm{d}W_s - \frac{1}{2}\int_0^t \mu^2(\omega ,s)\,\mathrm{d}s \bigg)$ Taking logs gives $\displaystyle \log{M_t} = \int_0^t \mu(\omega ,s)\,\mathrm{d}W_s - \frac{1}{2}\int_0^t \mu^2(\omega ,s)\,\mathrm{d}s$ or in differential form $\displaystyle \mathrm{d}(\log{M_t}) = \mu(\omega ,t)\,\mathrm{d}W_t - \frac{1}{2}\mu^2(\omega ,t)\,\mathrm{d}t$ We can also apply Itô’s rule to $\log{M_t}$ \displaystyle \begin{aligned} \mathrm{d}(\log{M_t}) &= \frac{1}{M_t}\,\mathrm{d}M_t - \frac{1}{2}\frac{1}{M_t^2}\,\mathrm{d}[M]_t \\ \end{aligned} where $[\ldots]$ denotes the quadratic variation of a stochastic process. Comparing terms gives the stochastic differential equation $\displaystyle \mathrm{d}M_t = M_t\mu(\omega,t)\,\mathrm{d}W_t$ In integral form this can also be written as $\displaystyle M_t = 1 + \int_0^t M_s\mu(\omega, s)\,\mathrm{d}W_s$ Thus $M_t$ is a local martingale (it is a stochastic integral with respect to Brownian motion) and by the first lemma it is a supermartingale. Hence $\mathbb{E}M_t \leq 1$. 
Next we note that $\displaystyle \exp{\bigg(\frac{1}{2}\int_0^t \mu(\omega, s)\,\mathrm{d}W_s\bigg)} = \exp{\bigg(\frac{1}{2}\int_0^t \mu(\omega, s)\,\mathrm{d}W_s - \frac{1}{4}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)} \exp{\bigg(\frac{1}{4}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}$ to which we can apply Hölder’s inequality with conjugates $p = q = 2$ to obtain \displaystyle \begin{aligned} \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^t \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg] &= \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^t \mu(\omega, s)\,\mathrm{d}W_s - \frac{1}{4}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s \bigg)} \exp{\bigg(\frac{1}{4}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s \bigg)}\bigg] \\ & \leq \sqrt{\mathbb{E}\bigg[\exp{\bigg(\int_0^t \mu(\omega, s)\,\mathrm{d}W_s - \frac{1}{2}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s \bigg)}\bigg]} \sqrt{\mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s \bigg)}\bigg]} \end{aligned} Applying the supermartingale inequality then gives \displaystyle \begin{aligned} \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^t \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg] & \leq \sqrt{\mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s \bigg)}\bigg]} \end{aligned} ### Step 3 Now we can apply the result in Step 2 to the result in Step 1. \displaystyle \begin{aligned} \mathbb{E}((M_t(\lambda\mu))^p) &\le \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg]^{1 - p\lambda^2} \\ &\le {\mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s \bigg)}\bigg]}^{(1 - p\lambda^2)/2} \\ &\le K^{(1 - p\lambda^2)/2} \end{aligned} where $K$ is the bound supplied by the Novikov condition. We can replace $M_t$ by $M_t {\mathcal{I}}_{t < \tau}$ for any stopping time $\tau$. 
Thus for a localizing sequence we have \displaystyle \begin{aligned} \mathbb{E}((M_{t \land \tau_n}(\lambda\mu))^p) &\le K^{(1 - p\lambda^2)/2} \end{aligned} From which we can conclude $\displaystyle \sup_n \|M_{T \land \tau_n}(\lambda\mu)\|_p < \infty$ Now we can apply the second lemma to conclude that $M_t(\lambda\mu)$ is a strict martingale. ### Final Step We have already calculated that \displaystyle \begin{aligned} M_t(\lambda\mu) &= \exp{\bigg(\int^t_0 \lambda\mu(\omega, s)\,\mathrm{d}W_s - \frac{1}{2}\int^t_0 \lambda^2\mu^2(\omega, s)\,\mathrm{d}s\bigg)} \\ &= {(M_t(\mu))}^{\lambda^2}\exp{\bigg((\lambda - \lambda^2)\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)} \end{aligned} Now apply Hölder’s inequality with conjugates $p = \lambda^{-2}$ and $q = (1 - \lambda^2)^{-1}$. $\displaystyle 1 = \mathbb{E}(M_t(\lambda\mu)) \le \mathbb{E}(M_t(\mu))^{\lambda^2}\mathbb{E}{\bigg(}\exp{\bigg(\frac{\lambda}{1 + \lambda}\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg)^{1 - \lambda^2}$ And then we can apply Jensen’s inequality to the last term on the right hand side with the convex function $x^{(1 + \lambda)/2\lambda}$. $\displaystyle 1 \le \mathbb{E}(M_t(\mu))^{\lambda^2} \mathbb{E}{\bigg(}\exp{\bigg(\frac{1}{2}\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg)^{2\lambda(1- \lambda)}$ Using the inequality we established in Step 2 and the Novikov condition then gives $\displaystyle 1 \le \mathbb{E}(M_t(\mu))^{\lambda^2} K^{\lambda(1 - \lambda)}$ If we now let $\lambda \nearrow 1$ we see that we must have $1 \le \mathbb{E}(M_t(\mu))$. We already know that $1 \ge \mathbb{E}(M_t(\mu))$ by the first lemma and so we have finally proved that $M_t(\mu)$ is a martingale. # Notes 1. We have already used importance sampling and also touched on changes of measure. 2. 
Chebyshev’s inequality is usually stated for the second moment but the proof is easily adapted: $\displaystyle \mathbb P( |X| > u ) = \int 1_{|X| > u} ~d\mathbb P = \frac 1 {u^p} \int u^p 1_{|X| > u} ~d\mathbb P < \frac 1 {u^p} \int |X|^p 1_{|X| > u} ~ d\mathbb P \le \frac 1 {u^p} \mathbb E|X|^p.$ 3. We follow Handel (2007); a similar approach is given in Steele (2001). # Bibliography Handel, Ramon von. 2007. “Stochastic Calculus, Filtering, and Stochastic Control (Lecture Notes).” Steele, J.M. 2001. Stochastic Calculus and Financial Applications. Applications of Mathematics. Springer New York. https://books.google.co.uk/books?id=fsgkBAAAQBAJ. # Conditional Expectation under Change of Measure ## Theorem Let $\mathbb{P}$ and $\mathbb{Q}$ be measures on $(\Omega, {\mathcal{F}})$ with $\mathbb{P} \ll \mathbb{Q}$, ${\mathcal{G}} \subset {\mathcal{F}}$ a sub $\sigma$-algebra and $X$ an integrable random variable ($\mathbb{P}\lvert{X}\rvert < \infty$) then $\displaystyle \mathbb{P}(X\vert {\mathcal{G}}) = \frac {\mathbb{Q}\bigg(X\frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\bigg\vert {\mathcal{G}}\bigg)} {\mathbb{Q}\bigg(\frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\bigg\vert {\mathcal{G}}\bigg)}$ ## Proof \displaystyle \begin{aligned} \mathbb{Q}\bigg(\mathbb{I}_A \mathbb{Q}\bigg(X \frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\bigg\vert {\mathcal{G}}\bigg)\bigg) &= \mathbb{Q}\bigg(\mathbb{I}_A X \frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\bigg) \\ &= \mathbb{P}\big(\mathbb{I}_A X \big) \\ &= \mathbb{P}\big(\mathbb{I}_A \mathbb{P}(X \vert {\mathcal{G}})\big) \\ &= \mathbb{Q}\bigg(\mathbb{I}_A \frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\mathbb{P}(X \vert {\mathcal{G}})\bigg) \\ &= \mathbb{Q}\bigg(\mathbb{I}_A \mathbb{Q}\bigg(\frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\bigg\vert {\mathcal{G}}\bigg)\mathbb{P}(X \vert {\mathcal{G}})\bigg) \\ \end{aligned} Thus $\displaystyle \mathbb{Q}\bigg(\mathbb{I}_A\mathbb{P}(X\vert 
{\mathcal{G}}){\mathbb{Q}\bigg(\frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\bigg\vert {\mathcal{G}}\bigg)}\bigg) = \mathbb{Q}\bigg(\mathbb{I}_A \mathbb{Q}\bigg(X\frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\bigg\vert {\mathcal{G}}\bigg)\bigg)\quad \mathrm{for\,all}\, A \in {\mathcal{G}}$ Hence $\displaystyle \mathbb{P}(X\vert {\mathcal{G}}){\mathbb{Q}\bigg(\frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\bigg\vert {\mathcal{G}}\bigg)} = \mathbb{Q}\bigg(X\frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\bigg\vert {\mathcal{G}}\bigg)\quad {\mathbb{Q}-\mathrm{a.s.}}$ We note that $\displaystyle A = \bigg\{\omega \,\bigg\vert\, \mathbb{Q}\bigg(\frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\bigg\vert\, {\mathcal{G}}\bigg)(\omega) = 0\bigg\}$ is ${\mathcal{G}}$-measurable (it is the result of a projection) and that $\displaystyle 0 = \mathbb{Q}\bigg(\mathbb{I}_A\mathbb{Q}\bigg( \frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}} \bigg\vert {\mathcal{G}}\bigg)\bigg) = \mathbb{Q}\bigg(\mathbb{I}_A \frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}} \bigg) = \mathbb{P}(\mathbb{I}_A)$ so the denominator is non-zero $\mathbb{P}$-almost surely. Hence $\displaystyle \mathbb{P}(X\vert {\mathcal{G}}) = \frac {\mathbb{Q}\bigg(X\frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\bigg\vert {\mathcal{G}}\bigg)} {\mathbb{Q}\bigg(\frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}}\bigg\vert {\mathcal{G}}\bigg)}\quad {\mathbb{P}-\mathrm{a.s.}}$ as required. # Some Background on Hidden Markov Models # Introduction If you look at the Wikipedia article on Hidden Markov Models (HMMs) then you might be forgiven for concluding that these deal only with discrete time and finite state spaces. In fact, HMMs are much more general. Furthermore, a better understanding of such models can be helped by putting them into context. Before actually specifying what an HMM is, let us review something of Markov processes. A subsequent blog post will cover HMMs themselves. 
# Markov Process and Chains Recall that a transition kernel is a mapping $\mu : X \times {\cal{Y}} \rightarrow \overline{\mathbb{R}}_{+}$ where $(X, {\cal{X}})$ and $(Y, {\cal{Y}})$ are two measurable spaces such that $\mu(s, \cdot)$ is a probability measure on ${\cal{Y}}$ for all $s \in X$ and such that $\mu(\cdot, A)$ is a measurable function on $X$ for all $A \in {\cal{Y}}$. For example, we could have $X = Y = \{a, b\}$ and ${\cal{X}} = {\cal{Y}} = \{\emptyset, \{a\}, \{b\}, \{a,b\}\}$ and $\mu(a,\{a\}) = 0.4, \mu(a,\{b\}) = 0.6, \mu(b,\{a\}) = 0.6, \mu(b,\{b\}) = 0.4$. Hopefully this should remind you of the transition matrix of a Markov chain. Recall further that a family of such transitions $\{\mu_t\}_{t \in T}$ where $T$ is some index set satisfying $\displaystyle \mu_{t+s}(x, A) = \int_{Y} \mu_s(x, {\mathrm{d}y})\mu_t(y, A)$ gives rise to a Markov process (under some mild conditions — see Rogers and Williams (2000) and Kallenberg (2002) for much more detail), that is, a process in which what happens next only depends on where the process is now and not how it got there. Let us carry on with our example and take $T = \mathbb{N}$. With a slight abuse of notation and since $Y$ is finite we can re-write the integral as a sum $\displaystyle \mu_{n+m}(x, z) = \sum_{y \in Y} \mu_m(x, y)\mu_n(y, z)$ which we recognise as a restatement of how Markov transition matrices combine. # Some Examples ## A Fully Deterministic System A deterministic system can be formulated as a Markov process with a particularly simple transition kernel given by $\displaystyle \mu_t(x_s, A) = \delta(f_t(x_s), A) \triangleq \begin{cases} 1 & \text{if } f_t(x_s) \in A \\ 0 & \text{if } f_t(x_s) \notin A \end{cases}$ where $f_t$ is the deterministic state update function (the flow) and $\delta$ is the Dirac measure. ## Parameters Let us suppose that the deterministic system depends on some time-varying values for which we are unable or unwilling to specify a deterministic model. 
For example, we may be considering a predator-prey model in which fixed parameters cannot explain every aspect of the dynamics. We could augment the deterministic kernel in the previous example with $\displaystyle \mu_t(\theta_s, {\mathrm{d}\phi}) = {{\cal{N}} \left( {\mathrm{d}\phi} \,\vert\, \theta_s, \tau^2(t-s) \right)}$ where we use Greek letters for the parameters (and Roman letters for state) and we use e.g. ${\mathrm{d}\phi}$ to indicate probability densities. In other words, the parameters tend to wiggle around like Brown’s pollen particles rather than remaining absolutely fixed. ## Ornstein-Uhlenbeck Of course Brownian motion or diffusion may not be a good model for our parameters; with Brownian motion, the parameters could drift off to $\pm\infty$. We might believe that our parameters tend to stay close to some given value (mean-reverting) and use the Ornstein-Uhlenbeck kernel. $\displaystyle \mu_t(\theta_s, {\mathrm{d}\phi}) = {{\cal{N}} \left( {\mathrm{d}\phi} \,\vert\, \alpha + (\theta_s - \alpha)e^{-\beta t},\frac{\sigma^2}{2\beta}\big(1 - e^{-2\beta t}\big) \right)}$ where $\beta$ expresses how strongly we expect the parameter to respond to perturbations, $\alpha$ is the mean to which the process wants to revert (aka the asymptotic mean) and $\sigma^2$ expresses how noisy the process is. It is sometimes easier to view these transition kernels in terms of stochastic differential equations. Brownian motion can be expressed as $\displaystyle \mathrm{d}X_t = \sigma\mathrm{d}W_t$ and Ornstein-Uhlenbeck can be expressed as $\displaystyle \mathrm{d}X_t = -\beta(X_t - \alpha)\mathrm{d}t + \sigma\mathrm{d}W_t$ where $W_t$ is the Wiener process. Let us check that the latter stochastic differential equation gives the stated kernel. 
Re-writing it in integral form and without loss of generality taking $s= 0$ $\displaystyle X_t = \alpha + (x_0 - \alpha)e^{-\beta t} + \sigma\int_0^t e^{-\beta(t - s)}\mathrm{d}W_s$ Since the integral is of a deterministic function, the distribution of $X_t$ is normal. Thus we need only calculate the mean and variance. The mean is straightforward. $\displaystyle \mathbb{E}[X_t \,\vert\, X_0 = x_0] = \mathbb{E}\Bigg[\alpha + (x_0 - \alpha)e^{-\beta t} + \sigma\int_0^t e^{-\beta(t - s)}\mathrm{d}W_s \,\vert\, X_0 = x_0\Bigg] = \alpha + (x_0 - \alpha)e^{-\beta t}$ Without loss of generality assume $t \leq u$ and writing $\mathbb{C}$ for covariance \displaystyle \begin{aligned} \mathbb{C}[X_u, X_t \,\vert\, X_0 = x_0] &= \mathbb{E}\Bigg[\Bigg( \sigma\int_0^u e^{-\beta(u - s)}\mathrm{d}W_s \Bigg) \Bigg( \sigma\int_0^t e^{-\beta(t - s)}\mathrm{d}W_s \Bigg)\Bigg] \\ &= \sigma^2e^{-\beta(u + t)} \mathbb{E}\Bigg[\Bigg(\int_0^u e^{\beta s}\mathrm{d}W_s\Bigg) \Bigg(\int_0^t e^{\beta s}\mathrm{d}W_s\Bigg)\Bigg] \\ &= \sigma^2e^{-\beta(u + t)} \mathbb{E}\Bigg[\Bigg(\int_0^t e^{\beta s}\mathrm{d}W_s + \int_t^u e^{\beta s}\mathrm{d}W_s\Bigg) \Bigg(\int_0^t e^{\beta s}\mathrm{d}W_s\Bigg)\Bigg] \end{aligned} And now we can use the Itô isometry and the independence of Brownian increments \displaystyle \begin{aligned} \mathbb{C}[X_u, X_t \,\vert\, X_0 = x_0] &= \sigma^2e^{-\beta(u + t)}\mathbb{E}\Bigg[ \int_0^t e^{2\beta s}\mathrm{d}s \Bigg] \\ &= \frac{\sigma^2e^{-\beta(u + t)}}{2\beta}\big(e^{2\beta t} - 1\big) \end{aligned} Substituting in $t = u$ gives the desired result. # Bibliography Kallenberg, O. 2002. Foundations of Modern Probability. Probability and Its Applications. Springer New York. http://books.google.co.uk/books?id=TBgFslMy8V4C. Rogers, L. C. G., and David Williams. 2000. Diffusions, Markov Processes, and Martingales. Vol. 1. Cambridge Mathematical Library. Cambridge: Cambridge University Press. 
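The Ornstein-Uhlenbeck variance derived above is easy to sanity-check numerically. The sketch below (my addition, not part of the original derivation; the parameter values and step count are arbitrary illustrative choices) discretises the Itô-isometry integral $\sigma^2 e^{-2\beta t}\int_0^t e^{2\beta s}\,\mathrm{d}s$ with the trapezoidal rule and compares it against the closed form $\frac{\sigma^2}{2\beta}\big(1 - e^{-2\beta t}\big)$:

```haskell
-- Sanity check of the Ornstein-Uhlenbeck variance (illustrative values only).
-- By the Ito isometry, Var[X_t | X_0] = sigma^2 e^{-2 beta t} * int_0^t e^{2 beta s} ds,
-- which should agree with the closed form sigma^2 / (2 beta) * (1 - e^{-2 beta t}).

-- Trapezoidal rule on [a, b] with n subintervals.
trapezoid :: (Double -> Double) -> Double -> Double -> Int -> Double
trapezoid f a b n = h * (0.5 * f a + 0.5 * f b + sum [f (a + h * fromIntegral i) | i <- [1 .. n - 1]])
  where h = (b - a) / fromIntegral n

-- The variance via a discretisation of the Ito-isometry integral.
ouVarNumeric :: Double -> Double -> Double -> Double
ouVarNumeric beta sigma t =
  sigma ^ 2 * exp (-2 * beta * t) * trapezoid (\s -> exp (2 * beta * s)) 0 t 10000

-- The closed form from the kernel stated above.
ouVarClosed :: Double -> Double -> Double -> Double
ouVarClosed beta sigma t = sigma ^ 2 / (2 * beta) * (1 - exp (-2 * beta * t))

main :: IO ()
main = print (abs (ouVarNumeric 0.5 1.0 2.0 - ouVarClosed 0.5 1.0 2.0) < 1.0e-6)
```

With 10000 trapezoids the two agree to well within the stated tolerance, so `main` prints `True`.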
# A Type Safe Reverse or Some Hasochism Conor McBride was not joking when he and his co-author entitled their paper about dependent typing in Haskell “Hasochism”: Lindley and McBride (2013). In trying to resurrect the Haskell package yarr, it seemed that a dependently typed reverse function needed to be written. Writing such a function turns out to be far from straightforward. How GHC determines that a proof (program) discharges a proposition (type signature) is rather opaque and, perhaps not surprisingly, the error messages one gets if the proof is incorrect are far from easy to interpret. I’d like to thank all the folk on StackOverflow whose answers and comments I have used freely below. Needless to say, any errors are entirely mine. Here are two implementations, each starting from different axioms (NB: I have never seen type families referred to as axioms but it seems helpful to think about them in this way). > {-# OPTIONS_GHC -Wall #-} > {-# OPTIONS_GHC -fno-warn-name-shadowing #-} > {-# OPTIONS_GHC -fno-warn-type-defaults #-} > {-# OPTIONS_GHC -fno-warn-unused-do-bind #-} > {-# OPTIONS_GHC -fno-warn-missing-methods #-} > {-# OPTIONS_GHC -fno-warn-orphans #-}  > {-# LANGUAGE GADTs #-} > {-# LANGUAGE KindSignatures #-} > {-# LANGUAGE DataKinds #-} > {-# LANGUAGE TypeFamilies #-} > {-# LANGUAGE UndecidableInstances #-} > {-# LANGUAGE ExplicitForAll #-} > {-# LANGUAGE TypeOperators #-} > {-# LANGUAGE ScopedTypeVariables #-}  For both implementations, we need propositional equality: if a :~: b is inhabited by some terminating value, then the type a is the same as the type b. Further we need the normal form of an equality proof: Refl :: a :~: a and a function, gcastWith, which allows us to use internal equality (:~:) to discharge a required proof of external equality (~). Readers familiar with topos theory, for example see Lambek and Scott (1988), will note that the notation is reversed. 
> import Data.Type.Equality ( (:~:) (Refl), gcastWith )  For the second of the two approaches adumbrated we will need > import Data.Proxy  The usual natural numbers: > data Nat = Z | S Nat  We need some axioms: 1. A left unit and 2. Restricted commutativity. > type family (n :: Nat) :+ (m :: Nat) :: Nat where > Z :+ m = m > S n :+ m = n :+ S m  We need the usual singleton for Nat to tie types and terms together. > data SNat :: Nat -> * where > SZero :: SNat Z > SSucc :: SNat n -> SNat (S n)  Now we can prove some lemmas. First a lemma showing we can push :+ inside a successor, S. > succ_plus_id :: SNat n1 -> SNat n2 -> (((S n1) :+ n2) :~: (S (n1 :+ n2))) > succ_plus_id SZero _ = Refl > succ_plus_id (SSucc n) m = gcastWith (succ_plus_id n (SSucc m)) Refl  This looks nothing like a standard mathematical proof and it’s hard to know what GHC is doing under the covers but presumably something like this: • For SZero 1. S Z :+ n2 = Z :+ S n2 (by axiom 2) = S n2 (by axiom 1) and 2. S (Z :+ n2) = S n2 (by axiom 1) 3. So S Z :+ n2 = S (Z :+ n2) • For SSucc 1. SSucc n :: SNat (S k) so n :: SNat k and m :: SNat i so SSucc m :: SNat (S i) 2. succ_plus_id n (SSucc m) :: k ~ S p => S p :+ S i :~: S (p :+ S i) (by hypothesis) 3. k ~ S p => S p :+ S i :~: S (S p :+ i) (by axiom 2) 4. k :+ S i :~: S (k :+ i) (by substitution) 5. S k :+ i :~: S (k :+ i) (by axiom 2) Second a lemma showing that Z is also the right unit. > plus_id_r :: SNat n -> ((n :+ Z) :~: n) > plus_id_r SZero = Refl > plus_id_r (SSucc n) = gcastWith (plus_id_r n) (succ_plus_id n SZero)  • For SZero 1. Z :+ Z = Z (by axiom 1) • For SSucc 1. SSucc n :: SNat (S k) so n :: SNat k 2. plus_id_r n :: k :+ Z :~: k (by hypothesis) 3. succ_plus_id n SZero :: S k :+ Z :~: S (k :+ Z) (by the succ_plus_id lemma) 4. succ_plus_id n SZero :: k :+ Z ~ k => S k :+ Z :~: S k (by substitution) 5. plus_id_r n :: k :+ Z :~: k (by equation 2) Now we can define vectors which have their lengths encoded in their type. 
> infixr 4 :::  > data Vec a n where > Nil :: Vec a Z > (:::) :: a -> Vec a n -> Vec a (S n)  We can prove a simple result using the lemma that Z is a right unit. > elim0 :: SNat n -> (Vec a (n :+ Z) -> Vec a n) > elim0 n x = gcastWith (plus_id_r n) x  Armed with this we can write an ${\mathcal{O}}(n)$ reverse function in which the length of the result is guaranteed to be the same as the length of the argument. > size :: Vec a n -> SNat n > size Nil = SZero > size (_ ::: xs) = SSucc (size xs)

> accrev :: Vec a n -> Vec a n
> accrev x = elim0 (size x) $ go Nil x where > go :: Vec a m -> Vec a n -> Vec a (n :+ m) > go acc Nil = acc > go acc (x ::: xs) = go (x ::: acc) xs  > toList :: Vec a n -> [a] > toList Nil = [] > toList (x ::: xs) = x : toList xs  > test0 :: [String] > test0 = toList $ accrev $ "a" ::: "b" ::: "c" ::: Nil  ghci> test0 ["c","b","a"]  For an alternative approach, let us change the axioms slightly. > type family (n1 :: Nat) + (n2 :: Nat) :: Nat where > Z + n2 = n2 > (S n1) + n2 = S (n1 + n2)  Now the proof that Z is a right unit is more straightforward. > plus_id_r1 :: SNat n -> ((n + Z) :~: n) > plus_id_r1 SZero = Refl > plus_id_r1 (SSucc n) = gcastWith (plus_id_r1 n) Refl  For the lemma showing we can push + inside a successor, S, we can use a Proxy. > plus_succ_r1 :: SNat n1 -> Proxy n2 -> ((n1 + (S n2)) :~: (S (n1 + n2))) > plus_succ_r1 SZero _ = Refl > plus_succ_r1 (SSucc n1) proxy_n2 = gcastWith (plus_succ_r1 n1 proxy_n2) Refl  Now we can write our reverse function without having to calculate sizes. > accrev1 :: Vec a n -> Vec a n > accrev1 Nil = Nil > accrev1 list = go SZero Nil list > where > go :: SNat n1 -> Vec a n1 -> Vec a n2 -> Vec a (n1 + n2) > go snat acc Nil = gcastWith (plus_id_r1 snat) acc > go snat acc (h ::: (t :: Vec a n3)) = > gcastWith (plus_succ_r1 snat (Proxy :: Proxy n3)) > (go (SSucc snat) (h ::: acc) t)  > test1 :: [String] > test1 = toList $ accrev1 $ "a" ::: "b" ::: "c" ::: Nil

ghci> test1
["c","b","a"]
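As a further illustration (my addition, not from the original post): with this second set of axioms, a length-indexed append needs no lemmas at all, because the recursion in vappend matches the clauses of the + type family case for case. The sketch below repeats the relevant definitions so that it is self-contained.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

-- Self-contained: these repeat the definitions from the post.
data Nat = Z | S Nat

type family (n1 :: Nat) + (n2 :: Nat) :: Nat where
  Z + n2 = n2
  (S n1) + n2 = S (n1 + n2)

infixr 4 :::

data Vec a n where
  Nil   :: Vec a Z
  (:::) :: a -> Vec a n -> Vec a (S n)

toList :: Vec a n -> [a]
toList Nil = []
toList (x ::: xs) = x : toList xs

-- The result length is the type-level sum of the argument lengths.
-- Each equation mirrors a clause of the (+) family, so no gcastWith is needed.
vappend :: Vec a n -> Vec a m -> Vec a (n + m)
vappend Nil ys = ys
vappend (x ::: xs) ys = x ::: vappend xs ys

main :: IO ()
main = print (toList (vappend ("a" ::: "b" ::: Nil) ("c" ::: Nil)))
```

Running main prints ["a","b","c"]. Contrast this with accrev, where the accumulator flips the order of the recursion relative to the :+ axioms and proofs become unavoidable.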


## Bibliography

Lambek, J., and P.J. Scott. 1988. Introduction to Higher-Order Categorical Logic. Cambridge Studies in Advanced Mathematics. Cambridge University Press. http://books.google.co.uk/books?id=6PY_emBeGjUC.

Lindley, Sam, and Conor McBride. 2013. “Hasochism: The Pleasure and Pain of Dependently Typed Haskell Programming.” In Proceedings of the 2013 ACM SIGPLAN Symposium on Haskell, 81–92. Haskell ’13. New York, NY, USA: ACM. doi:10.1145/2503778.2503786.