This is a short example of how to use SUNDIALS to solve a simple partial differential equation in Haskell via the hmatrixsundials library. The example is taken from the C examples that come with the SUNDIALS source. Here’s the full blog. I’ll give a better URL soonish.
Uncategorized
Estimating Parameters in Chaotic Systems
Trouble with Tribbles
Introduction
Tribbles originate from the planet Iota Geminorum IV and, according to Dr. McCoy, are born pregnant. No further details are given but we can follow Gurtin and MacCamy (1974) and perhaps recover some of what happens on the Enterprise.
Of course, agedependent population models are of more than fictional use and can be applied, for example, to modelling the progression of Malaria in infected hosts. We roughly follow some of J. J. Thibodeaux and Schlittenhardt (2011) who themselves reference Belair, Mackey, and Mahaffy (1995).
Of interest to Haskellers are:

The use of the hmatrix package which now contains functions to solve tridiagonal systems used in this post. You will need to use HEAD until a new hackage / stackage release is made. My future plan is to use CUDA via accelerate and compare.

The use of dimensions in a mediumsized example. It would have been nice to have tried the units package but it seemed harder work to use and, as ever, “Time’s wingèd chariot” was the enemy.
The source for this post can be downloaded from github.
AgeDependent Populations
McKendrick / von Foerster
McKendrick and von Foerster independently derived a model of agedependent population growth.
Let be the density of females of age at time . The number of females between ages and are thus . Assuming individuals are born at age , we have
where is the death rate density and denotes the rate of entry to the cohort of age . Dividing by we obtain
which in the limit becomes
We can further assume that the rate of entry to a cohort is proportional to the density of individuals times a velocity of aging .
Occasionally there is some reason to assume that aging one year is different to experiencing one year but we further assume .
We thus obtain
Gurtin / MacCamy
To solve any PDE we need boundary and initial conditions. The number of births at time is
where is the natality aka birthmodulus and
and we further assume that the initial condition
for some given .
Gurtin and MacCamy (1974) focus on the situation where
and we can also assume that the birth rate of Tribbles decreases exponentially with age and further that Tribbles can live forever. Gurtin and MacCamy (1974) then transform the PDE to obtain a pair of linked ODEs which can then be solved numerically.
Of course, we know what happens in the Enterprise and rather than continue with this example, let us turn our attention to the more serious subject of Malaria.
Malaria
I realise now that I went a bit overboard with references. Hopefully they don’t interrupt the flow too much.
The World Health Organisation (WHO) estimated that in 2015 there were 214 million new cases of malaria resulting in 438,000 deaths (source: Wikipedia).
The lifecycle of the plasmodium parasite that causes malaria is extremely ingenious. J. J. Thibodeaux and Schlittenhardt (2011) model the human segment of the plasmodium lifecycle and further propose a way of determing an optimal treatment for an infected individual. Hall et al. (2013) also model the effect of an antimalarial. Let us content ourselves with reproducing part of the paper by J. J. Thibodeaux and Schlittenhardt (2011).
At one part of its sojourn in humans, plasmodium infects erythrocytes aka red blood cells. These latter contain haemoglobin (aka hemoglobin). The process by which red blood cells are produced, Erythropoiesis, is modulated in a feedback loop by erythropoietin. The plasmodium parasite severely disrupts this process. Presumably the resulting loss of haemoglobin is one reason that an infected individual feels ill.
As can be seen in the overview by Torbett and Friedman (2009), the full feedback loop is complex. So as not to lose ourselves in the details and following J. J. Thibodeaux and Schlittenhardt (2011) and Belair, Mackey, and Mahaffy (1995), we consider a model with two compartments.

Precursors: prototype erythrocytes developing in the bone marrow with being the density of such cells of age at time .

Erythrocytes: mature red blood cells circulating in the blood with being the density of such cells of age at time .
where is the birth rate of precursors and is the death rate of erythrocytes, is the maturation rate of precursors and where
As boundary conditions, we have that the number of precursors maturing must equal the production of number of erythrocytes
and the production of the of the number of precursors depends on the level of erythropoietin
where is some proportionality function.
As initial conditions, we have
We can further model the erythropoietin dynamics as
where is the feedback function from the kidneys and the decay rate, depends on the total precursor population (Sawyer, Krantz, and Goldwasser (1987)) although this often is taken to be a constant and I would feel more comfortable with a more recent citation and where
As initial condition we have
A Finite Difference Attempt
Let us try solving the above model using a finite difference scheme observing that we currently have no basis for whether it has a solution and whether the finite difference scheme approximates such a solution! We follow J. J. Thibodeaux and Schlittenhardt (2011) who give a proof of convergence presumably with some conditions; any failure of the scheme is entirely mine.
Divide up the age and time ranges, , and into equal subintervals, , and where
where , and .
Denoting and similarly we obtain
and
Rearranging we get
Writing
We can express the above in matrix form
Finally we can write
A Haskell Implementation
> {# OPTIONS_GHC Wall #}
> {# LANGUAGE TypeFamilies #}
> {# LANGUAGE NoImplicitPrelude #}
> {# LANGUAGE FlexibleContexts #}
> {# LANGUAGE DataKinds #}
> {# LANGUAGE TypeOperators #}
> module Tribbles where
> import qualified Prelude as P
> import Numeric.Units.Dimensional.Prelude hiding (Unit)
> import Numeric.Units.Dimensional
> import Numeric.LinearAlgebra
> import Numeric.Integration.TanhSinh
> import Control.Monad.Writer
> import Control.Monad.Loops
Substances like erythropoietin (EPO) are measured in International Units and these cannot be converted to Moles (see Jelkmann (2009) for much more detail) so we have to pretend it really is measured in Moles as there seems to be no easy way to define what the dimensional package calls a base dimension. A typical amount for a person is 15 milliIU / milllitre but can reach much higher levels after loss of blood.
> muPerMl :: (Fractional a, Num a) => Unit 'NonMetric DConcentration a
> muPerMl = (milli mole) / (milli litre)
> bigE'0 :: Concentration Double
> bigE'0 = 15.0 *~ muPerMl
Let’s set up our grid. We take these from Ackleh et al. (2006) but note that J. J. Thibodeaux and Schlittenhardt (2011) seem to have .
> deltaT, deltaMu, deltaNu :: Time Double
> deltaT = 0.05 *~ day
> deltaMu = 0.01 *~ day
> deltaNu = 0.05 *~ day
> bigT :: Time Double
> bigT = 100.0 *~ day
> muF, nuF :: Time Double
> muF = 5.9 *~ day
> nuF = 120.0 *~ day
> bigK :: Int
> bigK = floor (bigT / deltaT /~ one)
> n1 :: Int
> n1 = floor (muF / deltaMu /~ one)
> n2 :: Int
> n2 = floor (nuF / deltaNu /~ one)
> ts :: [Time Double]
> ts = take bigK $ 0.0 *~ day : (map (+ deltaT) ts)
The birth rate for precursors
> betaThibodeaux :: Time Double >
> Frequency Double
> betaThibodeaux mu
>  mu < (0 *~ day) = error "betaThibodeaux: negative age"
>  mu < (3 *~ day) = (2.773 *~ (one / day))
>  otherwise = (0.0 *~ (one /day))
> alphaThibodeaux :: Concentration Double >
> Frequency Double
> alphaThibodeaux e = (0.5 *~ (muPerMl / day)) / ((1 *~ muPerMl) + e)
> sigmaThibodeaux :: Time Double >
> Time Double >
> Concentration Double >
> Frequency Double
> sigmaThibodeaux mu _t e = gThibodeaux e * (betaThibodeaux mu  alphaThibodeaux e)
and an alternative birth rate
> betaAckleh :: Time Double > Frequency Double
> betaAckleh mu
>  mu < (0 *~ day) = error "betaAckleh: negative age"
>  mu < (3 *~ day) = 2.773 *~ (one / day)
>  otherwise = 0.000 *~ (one / day)
> sigmaAckleh :: Time Double >
> Time Double >
> Concentration Double >
> Frequency Double
> sigmaAckleh mu _t e = betaAckleh mu * gAckleh e
J. J. Thibodeaux and Schlittenhardt (2011) give the maturation rate of precursors as
> gThibodeaux :: Concentration Double > Dimensionless Double
> gThibodeaux e = d / n
> where
> n = ((3.02 *~ one) * e + (0.31 *~ muPerMl))
> d = (30.61 *~ muPerMl) + e
and Ackleh et al. (2006) give this as
> gAckleh :: Concentration Double > Dimensionless Double
> gAckleh _e = 1.0 *~ one
As in J. J. Thibodeaux and Schlittenhardt (2011) we give quantities in terms of cells per kilogram of body weight. Note that these really are moles on this occasion.
> type CellDensity = Quantity (DAmountOfSubstance / DTime / DMass)
Let’s set the initial conditions.
> p'0 :: Time Double > CellDensity Double
> p'0 mu' = (1e11 *~ one) * pAux mu'
> where
> pAux mu
>  mu < (0 *~ day) = error "p'0: negative age"
>  mu < (3 *~ day) = 8.55e6 *~ (mole / day / kilo gram) *
> exp ((2.7519 *~ (one / day)) * mu)
>  otherwise = 8.55e6 *~ (mole / day / kilo gram) *
> exp (8.319 *~ one  (0.0211 *~ (one / day)) * mu)
> m_0 :: Time Double > CellDensity Double
> m_0 nu' = (1e11 *~ one) * mAux nu'
> where
> mAux nu
>  nu < (0 *~ day) = error "m_0: age less than zero"
>  otherwise = 0.039827 *~ (mole / day / kilo gram) *
> exp (((0.0083) *~ (one / day)) * nu)
And check that these give plausible results.
> m_0Untyped :: Double > Double
> m_0Untyped nu = m_0 (nu *~ day) /~ (mole / day / kilo gram)
> p'0Untyped :: Double > Double
> p'0Untyped mu = p'0 (mu *~ day) /~ (mole / day / kilo gram)
ghci> import Numeric.Integration.TanhSinh
ghci> result $ relative 1e6 $ parTrap m_0Untyped 0.001 (nuF /~ day)
3.0260736659043414e11
ghci> result $ relative 1e6 $ parTrap p'0Untyped 0.001 (muF /~ day)
1.0453999900927126e10
We can now create the components for the first matrix equation.
> g'0 :: Dimensionless Double
> g'0 = gThibodeaux bigE'0
> d_1'0 :: Int > Dimensionless Double
> d_1'0 i = (1 *~ one) + (g'0 * deltaT / deltaMu)
>  deltaT * sigmaThibodeaux ((fromIntegral i *~ one) * deltaMu) undefined bigE'0
> lowers :: [Dimensionless Double]
> lowers = replicate n1 (negate $ g'0 * deltaT / deltaMu)
> diags :: [Dimensionless Double]
> diags = g'0 : map d_1'0 [1..n1]
> uppers :: [Dimensionless Double]
> uppers = replicate n1 (0.0 *~ one)
J. J. Thibodeaux and Schlittenhardt (2011) does not give a definition for so we use the equivalent from Ackleh et al. (2006) which references Banks et al. (2003): “ erythrocytes/kg body weight mL plasma/mU Epo/day”
> s_0 :: Time Double >
> Quantity (DAmountOfSubstance / DTime / DMass / DConcentration) Double
> s_0 = const ((1e11 *~ one) * (4.45e7 *~ (mole / day / kilo gram / muPerMl)))
> b'0 :: [CellDensity Double]
> b'0 = (s_0 (0.0 *~ day) * bigE'0) : (take n1 $ map p'0 (iterate (+ deltaMu) deltaMu))
With these components in place we can now solve the implicit scheme and get the age distribution of precursors after one time step.
> p'1 :: Matrix Double
> p'1 = triDiagSolve (fromList (map (/~ one) lowers))
> (fromList (map (/~ one) diags))
> (fromList (map (/~ one) uppers))
> (((n1 P.+1 )><1) (map (/~ (mole / second / kilo gram)) b'0))
In order to create the components for the second matrix equation, we need the death rates of mature erythrocytes
> gammaThibodeaux :: Time Double >
> Time Double >
> Quantity (DAmountOfSubstance / DMass) Double >
> Frequency Double
> gammaThibodeaux _nu _t _bigM = 0.0083 *~ (one / day)
We note an alternative for the death rate
> gammaAckleh :: Time Double >
> Time Double >
> Quantity (DAmountOfSubstance / DMass) Double >
> Frequency Double
> gammaAckleh _nu _t bigM = (0.01 *~ (kilo gram / mole / day)) * bigM + 0.0001 *~ (one / day)
For the intial mature erythrocyte population we can either use the integral of the initial distribution
> bigM'0 :: Quantity (DAmountOfSubstance / DMass) Double
> bigM'0 = r *~ (mole / kilo gram)
> where
> r = result $ relative 1e6 $ parTrap m_0Untyped 0.001 (nuF /~ day)
ghci> bigM'0
3.0260736659043414e11 kg^1 mol
or we can use the sum of the values used in the finite difference approximation
> bigM'0' :: Quantity (DAmountOfSubstance / DMass) Double
> bigM'0' = (* deltaNu) $ sum $ map m_0 $ take n2 $ iterate (+ deltaNu) (0.0 *~ day)
ghci> bigM'0'
3.026741454719976e11 kg^1 mol
Finally we can create the components
> d_2'0 :: Int > Dimensionless Double
> d_2'0 i = (1 *~ one) + (g'0 * deltaT / deltaNu)
> + deltaT * gammaThibodeaux ((fromIntegral i *~ one) * deltaNu) undefined bigM'0
> lowers2 :: [Dimensionless Double]
> lowers2 = replicate n2 (negate $ deltaT / deltaNu)
> diags2 :: [Dimensionless Double]
> diags2 = (1.0 *~ one) : map d_2'0 [1..n2]
> uppers2 :: [Dimensionless Double]
> uppers2 = replicate n2 (0.0 *~ one)
> b_2'0 :: [CellDensity Double]
> b_2'0 = (g'0 * ((p'1 `atIndex` (n1,0)) *~ (mole / second / kilo gram))) :
> (take n2 $ map m_0 (iterate (+ deltaNu) deltaNu))
and then solve the implicit scheme to get the distribution of mature erythrocytes one time step ahead
> m'1 :: Matrix Double
> m'1 = triDiagSolve (fromList (map (/~ one) lowers2))
> (fromList (map (/~ one) diags2))
> (fromList (map (/~ one) uppers2))
> (((n2 P.+ 1)><1) (map (/~ (mole / second / kilo gram)) b_2'0))
We need to complete the homeostatic loop by implmenting the feedback from the kidneys to the bone marrow. Ackleh and Thibodeaux (2013) and Ackleh et al. (2006) give as
> fAckleh :: Time Double >
> Quantity (DAmountOfSubstance / DMass) Double >
> Quantity (DConcentration / DTime) Double
> fAckleh _t bigM = a / ((1.0 *~ one) + k * (bigM' ** r))
> where
> a = 15600 *~ (muPerMl / day)
> k = 0.0382 *~ one
> r = 6.96 *~ one
> bigM' = ((bigM /~ (mole / kilo gram)) *~ one) * (1e11 *~ one)
The much older Belair, Mackey, and Mahaffy (1995) gives as
> fBelair :: Time Double >
> Quantity (DAmountOfSubstance / DMass) Double >
> Quantity (DConcentration / DTime) Double
> fBelair _t bigM = a / ((1.0 *~ one) + k * (bigM' ** r))
> where
> a = 6570 *~ (muPerMl / day)
> k = 0.0382 *~ one
> r = 6.96 *~ one
> bigM' = ((bigM /~ (mole / kilo gram)) *~ one) * (1e11 *~ one)
For the intial precursor population we can either use the integral of the initial distribution
result $ relative 1e6 $ parTrap p'0Untyped 0.001 (muF /~ day)
> bigP'0 :: Quantity (DAmountOfSubstance / DMass) Double
> bigP'0 = r *~ (mole / kilo gram)
> where
> r = result $ relative 1e6 $ parTrap p'0Untyped 0.001 (muF /~ day)
ghci> bigP'0
1.0453999900927126e10 kg^1 mol
or we can use the sum of the values used in the finite difference approximation
> bigP'0' :: Quantity (DAmountOfSubstance / DMass) Double
> bigP'0' = (* deltaMu) $ sum $ map p'0 $ take n1 $ iterate (+ deltaMu) (0.0 *~ day)
ghci> bigP'0'
1.0438999930030743e10 kg^1 mol
J. J. Thibodeaux and Schlittenhardt (2011) give the following for
> a_E :: Quantity (DAmountOfSubstance / DMass) Double > Frequency Double
> a_E bigP = ((n / d) /~ one) *~ (one / day)
> where
> n :: Dimensionless Double
> n = bigP * (13.8 *~ (kilo gram / mole)) + 0.04 *~ one
> d :: Dimensionless Double
> d = (bigP /~ (mole / kilo gram)) *~ one + 0.08 *~ one
but from Ackleh et al. (2006)
The only biological basis for the latter is that the decay rate of erythropoietin should be an increasing function of the precursor population and this function remains in the range 0.50–6.65
and, given this is at variance with their given function, it may be safer to use their alternative of
> a_E' :: Quantity (DAmountOfSubstance / DMass) Double > Frequency Double
> a_E' _bigP = 6.65 *~ (one / day)
We now further calculate the concentration of EPO one time step ahead.
> f'0 :: Quantity (DConcentration / DTime) Double
> f'0 = fAckleh undefined bigM'0
> bigE'1 :: Concentration Double
> bigE'1 = (bigE'0 + deltaT * f'0) / (1.0 *~ one + deltaT * a_E' bigP'0)
Having done this for one time step starting at , it’s easy to generalize this to an arbitrary time step.
> d_1 :: Dimensionless Double >
> Concentration Double >
> Int >
> Dimensionless Double
> d_1 g e i = (1 *~ one) + (g * deltaT / deltaMu)
>  deltaT * sigmaThibodeaux ((fromIntegral i *~ one) * deltaMu) undefined e
> d_2 :: Quantity (DAmountOfSubstance / DMass) Double >
> Int >
> Dimensionless Double
> d_2 bigM i = (1 *~ one) + deltaT / deltaNu
> + deltaT * gammaThibodeaux ((fromIntegral i *~ one) * deltaNu) undefined bigM
> oneStepM :: (Matrix Double, Matrix Double, Concentration Double, Time Double) >
> Writer [(Quantity (DAmountOfSubstance / DMass) Double,
> Quantity (DAmountOfSubstance / DMass) Double,
> Concentration Double)]
> (Matrix Double, Matrix Double, Concentration Double, Time Double)
> oneStepM (psPrev, msPrev, ePrev, tPrev) = do
> let
> g = gThibodeaux ePrev
> ls = replicate n1 (negate $ g * deltaT / deltaMu)
> ds = g : map (d_1 g ePrev) [1..n1]
> us = replicate n1 (0.0 *~ one)
> b1'0 = (s_0 tPrev * ePrev) /~ (mole / second / kilo gram)
> b1 = asColumn $ vjoin [scalar b1'0, subVector 1 n1 $ flatten psPrev]
> psNew :: Matrix Double
> psNew = triDiagSolve (fromList (map (/~ one) ls))
> (fromList (map (/~ one) ds))
> (fromList (map (/~ one) us))
> b1
> ls2 = replicate n2 (negate $ deltaT / deltaNu)
> bigM :: Quantity (DAmountOfSubstance / DMass) Double
> bigM = (* deltaNu) $ ((sumElements msPrev) *~ (mole / kilo gram / second))
> ds2 = (1.0 *~ one) : map (d_2 bigM) [1..n2]
> us2 = replicate n2 (0.0 *~ one)
> b2'0 = (g * (psNew `atIndex` (n1, 0) *~ (mole / second / kilo gram))) /~
> (mole / second / kilo gram)
> b2 = asColumn $ vjoin [scalar b2'0, subVector 1 n2 $ flatten msPrev]
> msNew :: Matrix Double
> msNew = triDiagSolve (fromList (map (/~ one) ls2))
> (fromList (map (/~ one) ds2))
> (fromList (map (/~ one) us2))
> b2
> bigP :: Quantity (DAmountOfSubstance / DMass) Double
> bigP = (* deltaMu) $ sumElements psPrev *~ (mole / kilo gram / second)
> f :: Quantity (DConcentration / DTime) Double
> f = fAckleh undefined bigM
> eNew :: Concentration Double
> eNew = (ePrev + deltaT * f) / (1.0 *~ one + deltaT * a_E' bigP)
> tNew = tPrev + deltaT
> tell [(bigP, bigM, ePrev)]
> return (psNew, msNew, eNew, tNew)
We can now run the model for 100 days.
> ys :: [(Quantity (DAmountOfSubstance / DMass) Double,
> Quantity (DAmountOfSubstance / DMass) Double,
> Concentration Double)]
> ys = take 2000 $
> snd $
> runWriter $
> iterateM_ oneStepM ((((n1 P.+1 )><1) (map (/~ (mole / second / kilo gram)) b'0)),
> (((n2 P.+ 1)><1) $ (map (/~ (mole / second / kilo gram)) b_2'0)),
> bigE'0,
> (0.0 *~ day))
And now we can plot what happens for a period of 100 days.
References
Ackleh, Azmy S., and Jeremy J. Thibodeaux. 2013. “A secondorder finite difference approximation for a mathematical model of erythropoiesis.” Numerical Methods for Partial Differential Equations, no. November: n/a–n/a. doi:10.1002/num.21778.
Ackleh, Azmy S., Keng Deng, Kazufumi Ito, and Jeremy Thibodeaux. 2006. “A Structured Erythropoiesis Model with Nonlinear Cell Maturation Velocity and Hormone Decay Rate.” Mathematical Biosciences 204 (1): 21–48. doi:http://dx.doi.org/10.1016/j.mbs.2006.08.004.
Banks, H T, Cammey E Cole, Paul M Schlosser, and Hien T Tran. 2003. “Modeling and Optimal Regulation of Erythropoiesis Subject to Benzene Intoxication.” https://www.ncsu.edu/crsc/reports/ftp/pdf/crsctr0349.pdf.
Belair, Jacques, Michael C. Mackey, and Joseph M. Mahaffy. 1995. “AgeStructured and TwoDelay Models for Erythropoiesis.” Mathematical Biosciences 128 (1): 317–46. doi:http://dx.doi.org/10.1016/00255564(94)00078E.
Gurtin, Morton E, and Richard C MacCamy. 1974. “NonLinear AgeDependent Population Dynamics.” Archive for Rational Mechanics and Analysis 54 (3). Springer: 281–300.
Hall, Adam J, Michael J Chappell, John AD Aston, and Stephen A Ward. 2013. “Pharmacokinetic Modelling of the AntiMalarial Drug Artesunate and Its Active Metabolite Dihydroartemisinin.” Computer Methods and Programs in Biomedicine 112 (1). Elsevier: 1–15.
Jelkmann, Wolfgang. 2009. “Efficacy of Recombinant Erythropoietins: Is There Unity of International Units?” Nephrology Dialysis Transplantation 24 (5): 1366. doi:10.1093/ndt/gfp058.
Sawyer, Stephen T, SB Krantz, and E Goldwasser. 1987. “Binding and ReceptorMediated Endocytosis of Erythropoietin in Friend VirusInfected Erythroid Cells.” Journal of Biological Chemistry 262 (12). ASBMB: 5554–62.
Thibodeaux, Jeremy J., and Timothy P. Schlittenhardt. 2011. “Optimal Treatment Strategies for Malaria Infection.” Bulletin of Mathematical Biology 73 (11): 2791–2808. doi:10.1007/s1153801196508.
Torbett, Bruce E., and Jeffrey S. Friedman. 2009. “Erythropoiesis: An Overview.” In Erythropoietins, Erythropoietic Factors, and Erythropoiesis: Molecular, Cellular, Preclinical, and Clinical Biology, edited by Steven G. Elliott, Mary Ann Foote, and Graham Molineux, 3–18. Basel: Birkhäuser Basel. doi:10.1007/9783764386986_1.
Warming up for NUTS (No UTurn)
I have been thinking about writing a blog on why the no uturn sampler (NUTS) works rather than describing the actual algorithm. This led me to look at Jared Tobin’s Haskell implementation. His example tries to explore the Himmelblau function but only finds one local minima. This is not unexpected; as the excellent Stan manual notes
Being able to carry out such invariant inferences in practice is an altogether different matter. It is almost always intractable to find even a single posterior mode, much less balance the exploration of the neighborhoods of multiple local maxima according to the probability masses.
and
For HMC and NUTS, the problem is that the sampler gets stuck in one of the two "bowls" around the modes and cannot gather enough energy from random momentum assignment to move from one mode to another.
rm(list = ls(all.names=TRUE))
unlink(".RData")
rstan::stan_version()
## [1] "2.12.0"
rstan_options(auto_write = TRUE)
On the Rosenbrock function it fares much better.
knitr::include_graphics("RosenbrockA.png")
We can’t (at least I don’t know how to) try Stan out on Rosenbrock as its not a distribution but we can try it out on another nasty problem: the funnel. Some of this is taken directly from the Stan manual.
Here’s the Stan:
parameters {
real y;
vector[9] x;
}
model {
y ~ normal(0,3);
x ~ normal(0,exp(y/2));
}
which we can run with the following R:
funnel_fit < stan(file='funnel.stan', cores=4, iter=10000)
## Warning: There were 92 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mcstan.org/misc/warnings.html#divergenttransitionsafterwarmup
## Warning: Examine the pairs() plot to diagnose sampling problems
funnel_samples < extract(funnel_fit,permuted=TRUE,inc_warmup=FALSE);
funnel_df < data.frame(x1=funnel_samples$x[,1],y=funnel_samples$y[])
Plotting the data requires some unpleasantness but shows the neck of the funnel does not get explored. So even HMC and NUTS do not perform well.
midpoints < function(x, dp=2){
lower < as.numeric(gsub(",.*","",gsub("\\(\\[\\)\\]","", x)))
upper < as.numeric(gsub(".*,","",gsub("\\(\\[\\)\\]","", x)))
return(round(lower+(upperlower)/2, dp))
}
df < funnel_df[funnel_df$x1 < 20 & funnel_df$x1 > 20 & funnel_df$y < 9 & funnel_df$y > 9,]
x_c < cut(df$x1, 20)
y_c < cut(df$y, 20)
z < table(x_c, y_c)
z_df < as.data.frame(z)
a_df < data.frame(x=midpoints(z_df$x_c),y=midpoints(z_df$y_c),f=z_df$Freq)
m = as.matrix(dcast(a_df,x ~ y))
## Using f as value column: use value.var to override.
hist3D(x=m[,"x"],y=as.double(colnames(m)[2:21]),z=(as.matrix(dcast(a_df,x ~ y)))[,2:21], border="black",ticktype = "detailed",theta=5,phi=20)
## Using f as value column: use value.var to override.
Since the analytic form of the distribution is known, one can apply a trick to correct this problem and then one is sampling from unit normals!
parameters {
real y_raw;
vector[9] x_raw;
}
transformed parameters {
real y;
vector[9] x;
y < 3.0 * y_raw;
x < exp(y/2) * x_raw;
}
model {
y_raw ~ normal(0,1);
x_raw ~ normal(0,1);
}
And now the neck of the funnel is explored.
funnel_fit < stan(file='funnel_reparam.stan', cores=4, iter=10000)
funnel_samples < extract(funnel_fit,permuted=TRUE,inc_warmup=FALSE);
funnel_df < data.frame(x1=funnel_samples$x[,1],y=funnel_samples$y[])
df < funnel_df[funnel_df$x1 < 20 & funnel_df$x1 > 20 & funnel_df$y < 9 & funnel_df$y > 9,]
x_c < cut(df$x1, 20)
y_c < cut(df$y, 20)
z < table(x_c, y_c)
z_df < as.data.frame(z)
a_df < data.frame(x=midpoints(z_df$x_c),y=midpoints(z_df$y_c),f=z_df$Freq)
m = as.matrix(dcast(a_df,x ~ y))
## Using f as value column: use value.var to override.
hist3D(x=m[,"x"],y=as.double(colnames(m)[2:21]),z=(as.matrix(dcast(a_df,x ~ y)))[,2:21], border="black",ticktype = "detailed",theta=5,phi=20)
## Using f as value column: use value.var to override.
We’d expect the Haskell implementation to also fail to explore the neck. Maybe I will return to this after the article on why NUTS works.
Calling Haskell from C
As part of improving the random number generation story for Haskell, I want to be able to use the testu01 library with the minimal amount of Haskell wrapping. testu01 assumes that there is a C function which returns the random number. The ghc manual gives an example but does not give all the specifics. These are my notes on how to get the example working under OS X (El Capitain 10.11.5 to be precise).
The Haskell:
{# OPTIONS_GHC Wall #}
{# LANGUAGE ForeignFunctionInterface #}
module Foo where
foreign export ccall foo :: Int > IO Int
foo :: Int > IO Int
foo n = return (length (f n))
f :: Int > [Int]
f 0 = []
f n = n:(f (n1))
The .cabal:
name: testviac
version: 0.1.0.0
homepage: TBD
license: MIT
author: Dominic Steinitz
maintainer: idontgetoutmuch@gmail.com
category: System
buildtype: Simple
cabalversion: >=1.10
executable Foo.dylib
mainis: Foo.hs
otherextensions: ForeignFunctionInterface
builddepends: base >=4.7 && =0.6 && <0.7
hssourcedirs: src
defaultlanguage: Haskell2010
includedirs: src
ghcoptions: O2 shared fPIC dynamic
extralibraries: HSrtsghc8.0.1
On my computer running
cabal install
places the library in
~/Library/Haskell/ghc8.0.1/lib/testviac0.1.0.0/bin
The C:
#include
#include "HsFFI.h"
#include "../dist/build/Foo.dylib/Foo.dylibtmp/Foo_stub.h"
int main(int argc, char *argv[])
{
int i;
hs_init(&argc, &argv);
for (i = 0; i < 5; i++) {
printf("%d\n", foo(2500));
}
hs_exit();
return 0;
}
On my computer this can be compiled with
gcc6 Bar.c
~/Library/Haskell/ghc8.0.1/lib/testviac0.1.0.0/bin/Foo.dylib
I/Library/Frameworks/GHC.framework/Versions/8.0.1x86_64/usr/lib/ghc8.0.1/include
L/Library/Frameworks/GHC.framework/Versions/8.0.1x86_64/usr/lib/ghc8.0.1/rts
lHSrtsghc8.0.1
and can be run with
DYLD_LIBRARY_PATH=
~/Library/Haskell/ghc8.0.1/lib/testviac0.1.0.0/bin:
/Library/Frameworks/GHC.framework/Versions/8.0.1x86_64/usr/lib/ghc8.0.1/rts
N.B. setting DYLD_LIBRARY_PATH
like this is not recommended as it is a good way of breaking things. I have tried setting DYLD_FALLBACK_LIBRARY_PATH
but only to get an error message. Hopefully, at some point I will be able to post a robust way of getting the executable to pick up the required dynamic libraries.
UK / South Korea Trade: A Bayesian Analysis
Introduction
I was intrigued by a tweet by the UK Chancellor of the Exchequer stating "exports [to South Korea] have doubled over the last year. Now worth nearly £11bn” and a tweet by a Member of the UK Parliament stating South Korea "our second fastest growing trading partner". Although I have never paid much attention to trade statistics, both these statements seemed surprising. But these days it’s easy enough to verify such statements. It’s also an opportunity to use the techniques I believe data scientists in (computer) game companies use to determine how much impact a new feature has on the game’s consumers.
One has to be slightly careful with trade statistics as they come in many different forms, e.g., just goods or goods and services etc. When I provide software and analyses to US organisations, I am included in the services exports from the UK to the US.
Let’s analyse goods first before moving on to goods and services.
Getting the Data
First let’s get hold of the quarterly data from the UK Office of National Statistics.
ukstats < "https://www.ons.gov.uk"
bop < "economy/nationalaccounts/balanceofpayments"
ds < "datasets/tradeingoodsmretsallbopeu2013timeseriesspreadsheet/current/mret.csv"
mycsv < read.csv(paste(ukstats,"file?uri=",bop,ds,sep="/"),stringsAsFactors=FALSE)
Now we can find the columns that refer to Korea.
ns < which(grepl("Korea", names(mycsv)))
length(ns)
## [1] 3
names(mycsv[ns[1]])
## [1] "BoP.consistent..South.Korea..Exports..SA................................"
names(mycsv[ns[2]])
## [1] "BoP.consistent..South.Korea..Imports..SA................................"
names(mycsv[ns[3]])
## [1] "BoP.consistent..South.Korea..Balance..SA................................"
Now we can pull out the relevant information and create a data frame of it.
korean < mycsv[grepl("Korea", names(mycsv))]
imports < korean[grepl("Imports", names(korean))]
exports < korean[grepl("Exports", names(korean))]
balance < korean[grepl("Balance", names(korean))]
df < data.frame(mycsv[grepl("Title", names(mycsv))],
imports,
exports,
balance)
colnames(df) < c("Title", "Imports", "Exports", "Balance")
startQ < which(grepl("1998 Q1",df$Title))
endQ < which(grepl("2016 Q3",df$Title))
dfQ < df[startQ:endQ,]
We can now plot the data.
tab < data.frame(kr=as.numeric(dfQ$Exports),
krLabs=as.numeric(as.Date(as.yearqtr(dfQ$Title,format='%Y Q%q'))))
ggplot(tab, aes(x=as.Date(tab$krLabs), y=tab$kr)) + geom_line() +
theme(legend.position="bottom") +
ggtitle("Goods Exports UK / South Korea (Quarterly)") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Date") +
ylab("Value (£m)")
For good measure let’s plot the annual data.
startY < grep("^1998$",df$Title)
endY < grep("^2015$",df$Title)
dfYear < df[startY:endY,]
tabY < data.frame(kr=as.numeric(dfYear$Exports),
krLabs=as.numeric(dfYear$Title))
ggplot(tabY, aes(x=tabY$krLabs, y=tabY$kr)) + geom_line() +
theme(legend.position="bottom") +
ggtitle("Goods Exports UK / South Korea (Annual)") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Date") +
ylab("Value (£m)")
And the monthly data.
startM < grep("1998 JAN",df$Title)
endM < grep("2016 OCT",df$Title)
dfMonth < df[startM:endM,]
tabM < data.frame(kr=as.numeric(dfMonth$Exports),
krLabs=as.numeric(as.Date(as.yearmon(dfMonth$Title,format='%Y %B'))))
ggplot(tabM, aes(x=as.Date(tabM$krLabs), y=tabM$kr)) + geom_line() +
theme(legend.position="bottom") +
ggtitle("Goods Exports UK / South Korea (Monthly)") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Date") +
ylab("Value (£m)")
It looks like some change took place in 2011 but nothing to suggest either that "export have doubled over the last year" or that South Korea is "our second fastest growing partner". That some sort of change did happen is further supported by the fact a Free Trade Agreement between the EU and Korea was put in place in 2011.
But was there really a change? And what sort of change was it? Sometimes it’s easy to imagine patterns where there are none.
With this warning in mind let us see if we can get a better feel from the numbers as to what happened.
The Model
Let us assume that the data for exports are approximated by a linear function of time but that there is a change in the slope and the offset at some point during observation.
Since we are going to use stan to infer the parameters for this model and stan cannot handle discrete parameters, we need to marginalize out this (discrete) parameter. I hope to do the same analysis with LibBi which seems more suited to time series analysis and which I believe will not require such a step.
Setting D = {y_{i}}_{i = 1}^{N} we can calculate the likelihood
stan operates on the log scale and thus requires the log likelihood
where
and where the log sum of exponents function is defined by
The log sum of exponents function allows the model to be coded directly in Stan using the builtin function , which provides both arithmetic stability and efficiency for mixture model calculations.
Stan
Here’s the model in stan. Sadly I haven’t found a good way of divvying up .stan
files in a .Rmd
file so that it still compiles.
data {
int<lower=1> N;
real x[N];
real y[N];
}
parameters {
real mu1;
real mu2;
real gamma1;
real gamma2;
real<lower=0> sigma1;
real<lower=0> sigma2;
}
transformed parameters {
vector[N] log_p;
real mu;
real sigma;
log_p = rep_vector(log(N), N);
for (tau in 1:N)
for (i in 1:N) {
mu = i < tau ? (mu1 * x[i] + gamma1) : (mu2 * x[i] + gamma2);
sigma = i < tau ? sigma1 : sigma2;
log_p[tau] = log_p[tau] + normal_lpdf(y[i]  mu, sigma);
}
}
model {
mu1 ~ normal(0, 10);
mu2 ~ normal(0, 10);
gamma1 ~ normal(0, 10);
gamma2 ~ normal(0, 10);
sigma1 ~ normal(0, 10);
sigma2 ~ normal(0, 10);
target += log_sum_exp(log_p);
}
generated quantities {
int<lower=1,upper=N> tau;
tau = categorical_rng(softmax(log_p));
}
The above, although mimicking our mathematical model, has quadratic complexity and we can use the trick in the stan manual to make it linear albeit with less clarity.
data {
int<lower=1> N;
real x[N];
real y[N];
}
parameters {
real mu1;
real mu2;
real gamma1;
real gamma2;
real<lower=0> sigma1;
real<lower=0> sigma2;
}
transformed parameters {
vector[N] log_p;
{
vector[N+1] log_p_e;
vector[N+1] log_p_l;
log_p_e[1] = 0;
log_p_l[1] = 0;
for (i in 1:N) {
log_p_e[i + 1] = log_p_e[i] + normal_lpdf(y[i]  mu1 * x[i] + gamma1, sigma1);
log_p_l[i + 1] = log_p_l[i] + normal_lpdf(y[i]  mu2 * x[i] + gamma2, sigma2);
}
log_p = rep_vector(log(N) + log_p_l[N + 1], N) + head(log_p_e, N)  head(log_p_l, N);
}
}
model {
mu1 ~ normal(0, 10);
mu2 ~ normal(0, 10);
gamma1 ~ normal(0, 10);
gamma2 ~ normal(0, 10);
sigma1 ~ normal(0, 10);
sigma2 ~ normal(0, 10);
target += log_sum_exp(log_p);
}
generated quantities {
int<lower=1,upper=N> tau;
tau = categorical_rng(softmax(log_p));
}
Let’s run this model with the monthly data.
NM < nrow(tabM)
KM < ncol(tabM)
yM < tabM$kr
XM < data.frame(tabM,rep(1,NM))[,2:3]
fitM < stan(
file = "lrchangepointng.stan",
data = list(x = XM$krLabs, y = yM, N = length(yM)),
chains = 4,
warmup = 1000,
iter = 10000,
cores = 4,
refresh = 500,
seed=42
)
## Warning: There were 662 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mcstan.org/misc/warnings.html#divergenttransitionsafterwarmup
## Warning: Examine the pairs() plot to diagnose sampling problems
Looking at the results below we see a multimodal distribution so a mean is not of much use.
histData < hist(extract(fitM)$tau,plot=FALSE,breaks=c(seq(1,length(yM),1)))
histData$counts
## [1] 18000 0 0 0 0 0 0 0 0 0 0
## [12] 0 0 0 0 0 0 0 0 0 0 0
## [23] 0 0 0 0 0 0 0 0 0 0 0
## [34] 0 0 0 0 0 0 0 0 0 0 0
## [45] 0 0 0 0 0 0 0 0 0 0 0
## [56] 0 0 0 0 0 0 0 0 0 0 0
## [67] 0 0 0 0 0 0 0 0 0 0 0
## [78] 0 0 0 0 0 0 0 0 0 0 0
## [89] 0 0 0 0 0 0 0 0 0 0 0
## [100] 0 0 0 0 0 0 0 0 0 0 0
## [111] 0 0 0 0 0 0 0 1 4 12 16
## [122] 16 107 712 8132 0 0 0 0 0 0 0
## [133] 0 0 0 0 0 0 0 0 0 0 0
## [144] 0 0 0 0 0 0 0 0 0 0 0
## [155] 0 0 0 0 0 0 0 0 0 0 25
## [166] 171 2812 0 0 0 0 0 0 0 0 0
## [177] 0 0 0 0 0 0 0 0 0 0 0
## [188] 0 0 0 0 0 0 0 0 0 0 0
## [199] 0 0 0 0 0 0 0 0 0 0 0
## [210] 0 0 0 0 0 0 0 0 0 0 0
## [221] 0 0 0 0 5992
We can get a pictorial representation of the maxima so that the multimodality is even clearer.
min_indexes = which(diff( sign(diff( c(0,histData$counts,0)))) == 2)
max_indexes = which(diff( sign(diff( c(0,histData$counts,0)))) == 2)
modeData = data.frame(x=1:length(histData$counts),y=histData$counts)
min_locs = modeData[min_indexes,]
max_locs = modeData[max_indexes,]
plot(modeData$y, type="l")
points( min_locs, col="red", pch=19, cex=1 )
points( max_locs, col="green", pch=19, cex=1 )
My interpretation is that the evidence (data) says there is probably no changepoint (a change at the beginning or end is no change) but there might be a change at intermediate data points.
We can see something strange (maybe a large single export?) happened at index 125 which translates to 2008MAY.
The mode at index 167 which translates to 2011NOV corresponds roughly to the EU / Korea trade agreement.
Let us assume that there really was a material difference in trade at this latter point. We can fit a linear regression before this point and one after this point.
Here’s the stan
data {
int<lower=1> N;
int<lower=1> K;
matrix[N,K] X;
vector[N] y;
}
parameters {
vector[K] beta;
real<lower=0> sigma;
}
model {
y ~ normal(X * beta, sigma);
}
And here’s the R to fit the before and after data. We fit the model, pull out the parameters for the regression and pull out the covariates
N < length(yM)
M < max_locs$x[3]
fite < stan(file = 'LR.stan',
data = list(N = M, K = ncol(XM), y = yM[1:M], X = XM[1:M,]),
pars=c("beta", "sigma"),
chains=3,
cores=3,
iter=3000,
warmup=1000,
refresh=1)
se < extract(fite, pars = c("beta", "sigma"), permuted=TRUE)
estCovParamsE < colMeans(se$beta)
fitl < stan(file = 'LR.stan',
data = list(N = NM, K = ncol(XM), y = yM[(M+1):N], X = XM[(M+1):N,]),
pars=c("beta", "sigma"),
chains=3,
cores=3,
iter=3000,
warmup=1000,
refresh=1)
sl < extract(fitl, pars = c("beta", "sigma"), permuted=TRUE)
estCovParamsL < colMeans(sl$beta)
Make predictions
linRegPredsE < data.matrix(XM) %*% estCovParamsE
linRegPredsL < data.matrix(XM) %*% estCovParamsL
ggplot(tabM, aes(x=as.Date(tabM$krLabs), y=tabM$kr)) +
geom_line(aes(x = as.Date(tabM$krLabs), y = tabM$kr, col = "Actual")) +
geom_line(data=tabM[1:M,], aes(x = as.Date(tabM$krLabs[1:M]), y = linRegPredsE[(1:M),1], col = "Fit (Before FTA)")) +
geom_line(data=tabM[(M+1):N,], aes(x = as.Date(tabM$krLabs[(M+1):N]), y = linRegPredsL[((M+1):N),1], col = "Fit (After FTA)")) +
theme(legend.position="bottom") +
ggtitle("Goods Exports UK / South Korea (Monthly)") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Date") +
ylab("Value (£m)")
An Intermediate Conclusion and Goods and Services (Pink Book)
So we didn’t manage to substantiate either the Chancellor’s claim or the Member of Parliament’s claim.
But it may be that we can if we look at Goods and Services then we might be able to see the numbers resulting in the claims.
pb < "datasets/pinkbook/current/pb.csv"
pbcsv < read.csv(paste(ukstats,"file?uri=",bop,pb,sep="/"),stringsAsFactors=FALSE)
This has a lot more information albeit only annually.
pbns < grep("Korea", names(pbcsv))
length(pbns)
## [1] 21
lapply(pbns,function(x) names(pbcsv[x]))
## [[1]]
## [1] "BoP..Current.Account..Goods...Services..Imports..South.Korea............"
##
## [[2]]
## [1] "BoP..Current.Account..Current.Transfer..Balance..South.Korea............"
##
## [[3]]
## [1] "BoP..Current.Account..Goods...Services..Balance..South.Korea............"
##
## [[4]]
## [1] "IIP..Assets..Total.South.Korea.........................................."
##
## [[5]]
## [1] "Trade.in.Services.replaces.1.A.B....Exports.Credits...South.Korea...nsa."
##
## [[6]]
## [1] "IIP...Liabilities...Total...South.Korea................................."
##
## [[7]]
## [1] "BoP..Total.income..Balance..South.Korea................................."
##
## [[8]]
## [1] "BoP..Total.income..Debits..South.Korea.................................."
##
## [[9]]
## [1] "BoP..Total.income..Credits..South.Korea................................."
##
## [[10]]
## [1] "BoP..Current.account..Balance..South.Korea.............................."
##
## [[11]]
## [1] "BoP..Current.account..Debits..South.Korea..............................."
##
## [[12]]
## [1] "BoP..Current.account..Credits..South.Korea.............................."
##
## [[13]]
## [1] "IIP...Net...Total....South.Korea........................................"
##
## [[14]]
## [1] "Trade.in.Services.replaces.1.A.B....Imports.Debits...South.Korea...nsa.."
##
## [[15]]
## [1] "BoP..Current.Account..Services..Total.Balance..South.Korea.............."
##
## [[16]]
## [1] "Bop.consistent..Balance..NSA..South.Korea..............................."
##
## [[17]]
## [1] "Bop.consistent..Im..NSA..South.Korea...................................."
##
## [[18]]
## [1] "Bop.consistent..Ex..NSA..South.Korea...................................."
##
## [[19]]
## [1] "Current.Transfers...Exports.Credits...South.Korea...nsa................."
##
## [[20]]
## [1] "Current.Transfers...Imports.Debits...South.Korea...nsa.................."
##
## [[21]]
## [1] "BoP..Current.Account..Goods...Services..Exports..South.Korea............"
Let’s just look at exports.
koreanpb < pbcsv[grepl("Korea", names(pbcsv))]
exportspb < koreanpb[grepl("Exports", names(koreanpb))]
names(exportspb)
## [1] "Trade.in.Services.replaces.1.A.B....Exports.Credits...South.Korea...nsa."
## [2] "Current.Transfers...Exports.Credits...South.Korea...nsa................."
## [3] "BoP..Current.Account..Goods...Services..Exports..South.Korea............"
The last column gives exports of Goods and Services so let’s draw a chart of it.
pb < data.frame(pbcsv[grepl("Title", names(pbcsv))],
exportspb[3])
colnames(pb) < c("Title", "Exports")
startpbY < which(grepl("1999",pb$Title))
endpbY < which(grepl("2015",pb$Title))
pbY < pb[startpbY:endpbY,]
tabpbY < data.frame(kr=as.numeric(pbY$Exports),
krLabs=as.numeric(pbY$Title))
ggplot(tabpbY, aes(x=tabpbY$krLabs, y=tabpbY$kr)) + geom_line() +
theme(legend.position="bottom") +
ggtitle("Goods and Services Exports UK / South Korea (Annual)") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Date") +
ylab("Value (£m)")
No joy here either to any of the claims. Still it’s been an interesting exercise.
Girsanov’s Theorem
Introduction
We previously used importance sampling in the case where we did not have a sampler available for the distribution from which we wished to sample. There is an even more compelling case for using importance sampling.
Suppose we wish to estimate the probability of a rare event. For example, suppose we wish to estimate where . In this case, we can look up the answer . But suppose we couldn’t look up the answer. One strategy that might occur to us is to sample and then estimate the probability by counting the number of times out of the total that the sample was bigger than 5. The flaw in this is obvious but let’s try it anyway.
> module Girsanov where
> import qualified Data.Vector as V
> import Data.Random.Source.PureMT
> import Data.Random
> import Control.Monad.State
> import Data.Histogram.Fill
> import Data.Histogram.Generic ( Histogram )
> import Data.Number.Erf
> import Data.List ( transpose )
> samples :: (Foldable f, MonadRandom m) =>
> (Int > RVar Double > RVar (f Double)) >
> Int >
> m (f Double)
> samples repM n = sample $ repM n $ stdNormal
> biggerThan5 :: Int
> biggerThan5 = length (evalState xs (pureMT 42))
> where
> xs :: MonadRandom m => m [Double]
> xs = liftM (filter (>= 5.0)) $ samples replicateM 100000
As we might have expected, even if we draw 100,000 samples, we estimate this probability quite poorly.
ghci> biggerThan5
0
Using importance sampling we can do a lot better.
Let be a normally distributed random variable with zero mean and unit variance under the Lebesgue measure . As usual we can then define a new probability measure, the law of , by
Thus
where we have defined
Thus we can estimate either by sampling from a normal distribution with mean 0 and counting the number of samples that are above 5 or we can sample from a normal distribution with mean 5 and calculating the appropriately weighted mean
Let’s try this out.
> biggerThan5' :: Double
> biggerThan5' = sum (evalState xs (pureMT 42)) / (fromIntegral n)
> where
> xs :: MonadRandom m => m [Double]
> xs = liftM (map g) $
> liftM (filter (>= 5.0)) $
> liftM (map (+5)) $
> samples replicateM n
> g x = exp $ (5^2 / 2)  5 * x
> n = 100000
And now we get quite a good estimate.
ghci> biggerThan5'
2.85776225450217e7
Random Paths
The probability of another rare event we might wish to estimate is that of Brownian Motion crossing a boundary. For example, what is the probability of Browian Motion crossing the line ? Let’s try sampling 100 paths (we restrict the number so the chart is still readable).
> epsilons :: (Foldable f, MonadRandom m) =>
> (Int > RVar Double > RVar (f Double)) >
> Double >
> Int >
> m (f Double)
> epsilons repM deltaT n = sample $ repM n $ rvar (Normal 0.0 (sqrt deltaT))
> bM0to1 :: Foldable f =>
> ((Double > Double > Double) > Double > f Double > f Double)
> > (Int > RVar Double > RVar (f Double))
> > Int
> > Int
> > f Double
> bM0to1 scan repM seed n =
> scan (+) 0.0 $
> evalState (epsilons repM (recip $ fromIntegral n) n) (pureMT (fromIntegral seed))
We can see by eye in the chart below that again we do quite poorly.
We know that where .
> p :: Double > Double > Double
> p a t = 2 * (1  normcdf (a / sqrt t))
ghci> p 1.0 1.0
0.31731050786291415
ghci> p 2.0 1.0
4.550026389635842e2
ghci> p 3.0 1.0
2.699796063260207e3
But what if we didn’t know this formula? Define
where is the measure which makes Brownian Motion.
We can estimate the expectation of
where is 1 if Brownian Motion hits the barrier and 0 otherwise and M is the total number of simulations. We know from visual inspection that this gives poor results but let us try some calculations anyway.
> n = 500
> m = 10000
> supAbove :: Double > Double
> supAbove a = fromIntegral count / fromIntegral n
> where
> count = length $
> filter (>= a) $
> map (\seed > maximum $ bM0to1 scanl replicateM seed m) [0..n  1]
> bM0to1WithDrift seed mu n =
> zipWith (\m x > x + mu * m * deltaT) [0..] $
> bM0to1 scanl replicateM seed n
> where
> deltaT = recip $ fromIntegral n
ghci> supAbove 1.0
0.326
ghci> supAbove 2.0
7.0e2
ghci> supAbove 3.0
0.0
As expected for a rare event we get an estimate of 0.
Fortunately we can use importance sampling for paths. If we take where is a constant in Girsanov’s Theorem below then we know that is Brownian Motion.
We observe that
So we can also estimate the expectation of under as
where is now 1 if Brownian Motion with the specified drift hits the barrier and 0 otherwise, and is Brownian Motion sampled at .
We can see from the chart below that this is going to be better at hitting the required barrier.
Let’s do some calculations.
> supAbove' a = (sum $ zipWith (*) ns ws) / fromIntegral n
> where
> deltaT = recip $ fromIntegral m
>
> uss = map (\seed > bM0to1 scanl replicateM seed m) [0..n  1]
> ys = map last uss
> ws = map (\x > exp (a * x  0.5 * a^2)) ys
>
> vss = map (zipWith (\m x > x + a * m * deltaT) [0..]) uss
> sups = map maximum vss
> ns = map fromIntegral $ map fromEnum $ map (>=a) sups
ghci> supAbove' 1.0
0.31592655955519156
ghci> supAbove' 2.0
4.999395029856741e2
ghci> supAbove' 3.0
2.3859203473651654e3
The reader is invited to try the above estimates with 1,000 samples per path to see that even with this respectable number, the calculation goes awry.
In General
If we have a probability space and a nonnegative random variable with then we can define a new probability measure on the same algebra by
For any two probability measures when such a exists, it is called the RadonNikodym derivative of with respect to and denoted
Given that we managed to shift a Normal Distribution with nonzero mean in one measure to a Normal Distribution with another mean in another measure by producing the RadonNikodym derivative, might it be possible to shift, Brownian Motion with a drift under a one probability measure to be pure Brownian Motion under another probability measure by producing the RadonNikodym derivative? The answer is yes as Girsanov’s theorem below shows.
Girsanov’s Theorem
Let be Brownian Motion on a probability space and let be a filtration for this Brownian Motion and let be an adapted process such that the Novikov Sufficiency Condition holds
then there exists a probability measure such that

is equivalent to , that is, .

.

is Brownian Motion on the probabiity space also with the filtration .
In order to prove Girsanov’s Theorem, we need a condition which allows to infer that is a strict martingale. One such useful condition to which we have already alluded is the Novikov Sufficiency Condition.
Proof
Define by
Then, temporarily overloading the notation and writing for expectation under , and applying the Novikov Sufficiency Condition to , we have
From whence we see that
And since this characterizes Brownian Motion, we are done.
The Novikov Sufficiency Condition
The Novikov Sufficiency Condition Statement
Let and further let it satisfy the Novikov condition
then the process defined by
is a strict martingale.
Before we prove this, we need two lemmas.
Lemma 1
Let for be a nonnegative local martingale then is a supermartingale and if further then is a strict martingale.
Proof
Let be a localizing sequence for then for and using Fatou’s lemma and the fact that the stopped process is a strict martingale
Thus is a supermartingale and therefore
By assumption we have thus is a strict martingale.
Lemma 2
Let be a nonnegative local martingale. If is a localizing sequence such that for some then is a strict martingale.
Proof
By the supermartingale property and thus by dominated convergence we have that
We also have that
By Chebyshev’s inequality (see note (2) below), we have
Taking limits first over and then over we see that
For and we have . Thus
Again taking limits over and then over we have
These two conclusions imply
We can therefore conclude (since is a martingale)
Thus by the preceeding lemma is a strict as well as a local martingale.
The Novikov Sufficiency Condition Proof
Step 1
First we note that is a local martingale for . Let us show that it is a strict martingale. We can do this if for any localizing sequence we can show
using the preceeding lemma where .
We note that
Now apply Hölder’s inequality with conjugates and where is chosen to ensure that the conjugates are both strictly greater than 1 (otherwise we cannot apply the inequality).
Now let us choose
then
In order to apply Hölder’s inequality we need to check that and that but this amounts to checking that and that . We also need to check that but this amounts to checking that for and this is easily checked to be true.
Rewriting the above inequality with this value of we have
By the first lemma, since is a nonnegative local martingale, it is also a supermartingale. Furthermore . Thus
and therefore
Step 2
Recall we have
Taking logs gives
or in diferential form
We can also apply Itô’s rule to
where denotes the quadratic variation of a stochastic process.
Comparing terms gives the stochastic differential equation
In integral form this can also be written as
Thus is a local martingale (it is defined by a stochastic differential equation) and by the first lemma it is a supermaringale. Hence .
Next we note that
to which we can apply Hölder’s inequality with conjugates to obtain
Applying the supermartingale inequality then gives
Step 3
Now we can apply the result in Step 2 to the result in Step 1.
We can replace by for any stopping time . Thus for a localizing sequence we have
From which we can conclude
Now we can apply the second lemma to conclude that is a strict martingale.
Final Step
We have already calculated that
Now apply Hölder’s inequality with conjugates and .
And then we can apply Jensen’s inequality to the last term on the right hand side with the convex function .
Using the inequality we established in Step 2 and the Novikov condition then gives
If we now let we see that we must have . We already now that by the first lemma and so we have finally proved that is a martingale.
Notes

We have already used importance sampling and also touched on changes of measure.

Chebyshev’s inequality is usually stated for the second moment but the proof is easily adapted:
 We follow Handel (2007); a similar approach is given in Steele (2001).
Bibliography
Handel, Ramon von. 2007. “Stochastic Calculus, Filtering, and Stochastic Control (Lecture Notes).”
Steele, J.M. 2001. Stochastic Calculus and Financial Applications. Applications of Mathematics. Springer New York. https://books.google.co.uk/books?id=fsgkBAAAQBAJ.
Conditional Expectation under Change of Measure
Theorem
Let and be measures on with , a sub algebra and an integrable random variable () then
Proof
Thus
Hence
We note that
is measurable (it is the result of a projection) and that
Hence
as required.
Some Background on Hidden Markov Models
Introduction
If you look at the wikipedia article on Hidden Markov Models (HMMs) then you might be forgiven for concluding that these deal only with discrete time and finite state spaces. In fact, HMMs are much more general. Furthermore, a better understanding of such models can be helped by putting them into context. Before actually specifying what an HMM is, let us review something of Markov processes. A subsequent blog post will cover HMMs themselves.
Markov Process and Chains
Recall that a transition kernel is a mapping where and are two measurable spaces such that is a probability measure on for all and such that is a measurable function on for all .
For example, we could have and and . Hopefully this should remind you of the transition matrix of a Markov chain.
Recall further that a family of such transitions where is some index set satisfying
gives rise to a Markov process (under some mild conditions — see Rogers and Williams (2000) and Kallenberg (2002) for much more detail), that is, a process in which what happens next only depends on where the process is now and not how it got there.
Let us carry on with our example and take . With a slight abuse of notation and since is finite we can rewrite the integral as a sum
which we recognise as a restatement of how Markov transition matrices combine.
Some Examples
A Fully Deterministic System
A deterministic system can be formulated as a Markov process with a particularly simple transition kernel given by
where is the deterministic state update function (the flow) and is the Dirac delta function.
Parameters
Let us suppose that the determinstic system is dependent on some timevarying values for which we we are unable or unwish to specify a deterministic model. For example, we may be considering predatorprey model where the parameters cannot explain every aspect. We could augment the deterministic kernel in the previous example with
where we use Greek letters for the parameters (and Roman letters for state) and we use e.g. to indicate probability densities. In other words that the parameters tend to wiggle around like Brown’s pollen particles rather than remaining absolutely fixed.
OrnsteinUhlenbeck
Of course Brownian motion or diffusion may not be a good model for our parameters; with Brownian motion, the parameters could drift off to . We might believe that our parameters tend to stay close to some given value (meanreverting) and use the OrnsteinUhlenbeck kernel.
where expresses how strongly we expect the parameter to respond to perturbations, is the mean to which the process wants to revert (aka the asymptotic mean) and expresses how noisy the process is.
It is sometimes easier to view these transition kernels in terms of stochastic differential equations. Brownian motion can be expressed as
and OrnsteinUhlenbeck can be expressed as
where is the Wiener process.
Let us check that the latter stochastic differential equation gives the stated kernel. Rewriting it in integral form and without loss of generality taking
Since the integral is of a deterministic function, the distribution of is normal. Thus we need only calculate the mean and variance.
The mean is straightforward.
Without loss of generality assume and writing for covariance
And now we can use Ito and independence
Substituting in gives the desired result.
Bibliography
Kallenberg, O. 2002. Foundations of Modern Probability. Probability and Its Applications. Springer New York. http://books.google.co.uk/books?id=TBgFslMy8V4C.
Rogers, L. C. G., and David Williams. 2000. Diffusions, Markov Processes, and Martingales. Vol. 1. Cambridge Mathematical Library. Cambridge: Cambridge University Press.
A Type Safe Reverse or Some Hasochism
Conor McBride was not joking when he and his coauthor entitled their paper about dependent typing in Haskell “Hasochism”: Lindley and McBride (2013).
In trying to resurrect the Haskell package yarr, it seemed that a dependently typed reverse function needed to be written. Writing such a function turns out to be far from straightforward. How GHC determines that a proof (program) discharges a proposition (type signature) is rather opaque and perhaps not surprisingly the error messages one gets if the proof is incorrect are far from easy to interpret.
I’d like to thank all the folk on StackOverflow whose answers and comments I have used freely below. Needless to say, any errors are entirely mine.
Here are two implementations, each starting from different axioms (NB: I have never seen type families referred to as axioms but it seems helpful to think about them in this way).
> {# OPTIONS_GHC Wall #}
> {# OPTIONS_GHC fnowarnnameshadowing #}
> {# OPTIONS_GHC fnowarntypedefaults #}
> {# OPTIONS_GHC fnowarnunuseddobind #}
> {# OPTIONS_GHC fnowarnmissingmethods #}
> {# OPTIONS_GHC fnowarnorphans #}
> {# LANGUAGE GADTs #}
> {# LANGUAGE KindSignatures #}
> {# LANGUAGE DataKinds #}
> {# LANGUAGE TypeFamilies #}
> {# LANGUAGE UndecidableInstances #}
> {# LANGUAGE ExplicitForAll #}
> {# LANGUAGE TypeOperators #}
> {# LANGUAGE ScopedTypeVariables #}
For both implementations, we need propositional equality: if a :~: b
is inhabited by some terminating value, then the type a
is the same as the type b
. Further we need the normal form of an equality proof: Refl :: a :~: a
and a function, gcastWith
which allows us to use internal equality (:~:)
to discharge a required proof of external equality (~)
. Readers familiar with topos theory, for example see Lambek and Scott (1988), will note that the notation is reversed.
> import Data.Type.Equality ( (:~:) (Refl), gcastWith )
For the second of the two approaches adumbrated we will need
> import Data.Proxy
The usual natural numbers:
> data Nat = Z  S Nat
We need some axioms:
 A left unit and
 Restricted commutativity.
> type family (n :: Nat) :+ (m :: Nat) :: Nat where
> Z :+ m = m
> S n :+ m = n :+ S m
We need the usual singleton for Nat
to tie types and terms together.
> data SNat :: Nat > * where
> SZero :: SNat Z
> SSucc :: SNat n > SNat (S n)
Now we can prove some lemmas.
First a lemma showing we can push :+
inside a successor, S
.
> succ_plus_id :: SNat n1 > SNat n2 > (((S n1) :+ n2) :~: (S (n1 :+ n2)))
> succ_plus_id SZero _ = Refl
> succ_plus_id (SSucc n) m = gcastWith (succ_plus_id n (SSucc m)) Refl
This looks nothing like a standard mathematical proof and it’s hard to know what ghc is doing under the covers but presumably something like this:
 For
SZero
S Z :+ n2 = Z :+ S n2
(by axiom 2) =S n2
(by axiom 1) andS (Z + n2) = S n2
(by axiom 1) So
S Z :+ n2 = S (Z + n2)
 For
SSucc
SSucc n :: SNat (S k)
son :: SNat k
andm :: SNat i
soSSucc m :: SNat (S i)
succ_plus id n (SSucc m) :: k ~ S p => S p :+ S i :~: S (p :+ S i)
(by hypothesis)k ~ S p => S p :+ S i :~: S (S p :+ i)
(by axiom 2)k :+ S i :~: S (k :+ i)
(by substitution)S k :+ i :~: S (k :+ i)
(by axiom 2)
Second a lemma showing that Z
is also the right unit.
> plus_id_r :: SNat n > ((n :+ Z) :~: n)
> plus_id_r SZero = Refl
> plus_id_r (SSucc n) = gcastWith (plus_id_r n) (succ_plus_id n SZero)
 For
SZero
Z :+ Z = Z
(by axiom 1)
 For
SSucc
SSucc n :: SNat (S k)
son :: SNat k
plus_id_r n :: k :+ Z :~: k
(by hypothesis)succ_plus_id n SZero :: S k :+ Z :~: S (k + Z)
(by thesucc_plus_id
lemma)succ_plus_id n SZero :: k :+ Z ~ k => S k :+ Z :~: S k
(by substitution)plus_id_r n :: k :+ Z :~: k
(by equation 2)
Now we can defined vectors which have their lengths encoded in their type.
> infixr 4 :::
> data Vec a n where
> Nil :: Vec a Z
> (:::) :: a > Vec a n > Vec a (S n)
We can prove a simple result using the lemma that Z
is a right unit.
> elim0 :: SNat n > (Vec a (n :+ Z) > Vec a n)
> elim0 n x = gcastWith (plus_id_r n) x
Armed with this we can write an reverse function in which the length of the result is guaranteed to be the same as the length of the argument.
> size :: Vec a n > SNat n
> size Nil = SZero
> size (_ ::: xs) = SSucc $ size xs
> accrev :: Vec a n > Vec a n
> accrev x = elim0 (size x) $ go Nil x where
> go :: Vec a m > Vec a n > Vec a (n :+ m)
> go acc Nil = acc
> go acc (x ::: xs) = go (x ::: acc) xs
> toList :: Vec a n > [a]
> toList Nil = []
> toList (x ::: xs) = x : toList xs
> test0 :: [String]
> test0 = toList $ accrev $ "a" ::: "b" ::: "c" ::: Nil
ghci> test0
["c","b","a"]
For an alternative approach, let us change the axioms slightly.
> type family (n1 :: Nat) + (n2 :: Nat) :: Nat where
> Z + n2 = n2
> (S n1) + n2 = S (n1 + n2)
Now the proof that Z
is a right unit is more straightforward.
> plus_id_r1 :: SNat n > ((n + Z) :~: n)
> plus_id_r1 SZero = Refl
> plus_id_r1 (SSucc n) = gcastWith (plus_id_r1 n) Refl
For the lemma showing we can push +
inside a successor, S
, we can use a Proxy
.
> plus_succ_r1 :: SNat n1 > Proxy n2 > ((n1 + (S n2)) :~: (S (n1 + n2)))
> plus_succ_r1 SZero _ = Refl
> plus_succ_r1 (SSucc n1) proxy_n2 = gcastWith (plus_succ_r1 n1 proxy_n2) Refl
Now we can write our reverse function without having to calculate sizes.
> accrev1 :: Vec a n > Vec a n
> accrev1 Nil = Nil
> accrev1 list = go SZero Nil list
> where
> go :: SNat n1 > Vec a n1 > Vec a n2 > Vec a (n1 + n2)
> go snat acc Nil = gcastWith (plus_id_r1 snat) acc
> go snat acc (h ::: (t :: Vec a n3)) =
> gcastWith (plus_succ_r1 snat (Proxy :: Proxy n3))
> (go (SSucc snat) (h ::: acc) t)
> test1 :: [String]
> test1 = toList $ accrev1 $ "a" ::: "b" ::: "c" ::: Nil
ghci> test0
["c","b","a"]
Bibliography
Lambek, J., and P.J. Scott. 1988. Introduction to HigherOrder Categorical Logic. Cambridge Studies in Advanced Mathematics. Cambridge University Press. http://books.google.co.uk/books?id=6PY_emBeGjUC.
Lindley, Sam, and Conor McBride. 2013. “Hasochism: The Pleasure and Pain of Dependently Typed Haskell Programming.” In Proceedings of the 2013 ACM SIGPLAN Symposium on Haskell, 81–92. Haskell ’13. New York, NY, USA: ACM. doi:10.1145/2503778.2503786.