# Introduction

Recall from the previous post that the Hare growth parameter undergoes Brownian motion so that the further into the future we go, the less certain we are about it. In order to ensure that this parameter remains positive, let’s model the log of it to be Brownian motion.

\displaystyle \begin{aligned} \frac{\mathrm{d}N_1}{\mathrm{d}t} & = & \rho_1 N_1 \bigg(1 - \frac{N_1}{K_1}\bigg) - c_1 N_1 N_2 \\ \frac{\mathrm{d}N_2}{\mathrm{d}t} & = & -\rho_2 N_2 \bigg(1 + \frac{N_2}{K_2}\bigg) + c_2 N_1 N_2 \\ \mathrm{d} \rho_1 & = & \rho_1 \sigma_{\rho_1} \mathrm{d}W_t \end{aligned}

where the final equation is a stochastic differential equation with $W_t$ being a Wiener process.

By Itô we have

$\displaystyle \mathrm{d} (\log{\rho_1}) = - \frac{\sigma_{\rho_1}^2}{2} \mathrm{d} t + \sigma_{\rho_1} \mathrm{d}W_t$

Again, we see that the populations become noisier the further into the future we go.

# Inference

Now let us infer the growth rate using Hamiltonian Monte Carlo and the domain specific probabilistic language Stan (Stan Development Team (2015b), Stan Development Team (2015a), Hoffman and Gelman (2014), Carpenter (2015)). Here’s the model expressed in Stan.

functions {
real f1 (real a, real k1, real b, real p, real z) {
real q;

q = a * p * (1 - p / k1) - b * p * z;
return q;
}

real f2 (real d, real k2, real c, real p, real z) {
real q;

q = -d * z * (1 + z / k2) + c * p * z;
return q;
}
}

data {
int<lower=1> T;   // Number of observations
real y[T];        // Observed hares
real k1;          // Hare carrying capacity
real b;           // Hare death rate per lynx
real d;           // Lynx death rate
real k2;          // Lynx carrying capacity
real c;           // Lynx birth rate per hare
real deltaT;      // Time step
}

parameters {
real<lower=0> mu;
real<lower=0> sigma;
real<lower=0> p0;
real<lower=0> z0;
real<lower=0> rho0;
real w[T];
}

transformed parameters {
real<lower=0> p[T];
real<lower=0> z[T];
real<lower=0> rho[T];

p[1] = p0;
z[1] = z0;
rho[1] = rho0;

for (t in 1:(T-1)){
p[t+1] = p[t] + deltaT * f1 (exp(rho[t]), k1, b, p[t], z[t]);
z[t+1] = z[t] + deltaT * f2 (d, k2, c, p[t], z[t]);

rho[t+1] = rho[t] * exp(sigma * sqrt(deltaT) * w[t] - 0.5 * sigma * sigma * deltaT);
}
}

model {
mu    ~ uniform(0.0,1.0);
sigma ~ uniform(0.0, 0.5);
p0    ~ lognormal(log(100.0), 0.2);
z0    ~ lognormal(log(50.0), 0.1);
rho0  ~ normal(log(mu), sigma);
w     ~ normal(0.0,1.0);

for (t in 1:T) {
y[t] ~ lognormal(log(p[t]),0.1);
}
}

Let’s look at the posteriors of the hyper-parameters for the Hare growth parameter.

Again, the estimate for $\mu$ is pretty decent. For our generated data, $\sigma =0$ and given our observations are quite noisy maybe the estimate for this is not too bad also.

# Appendix: The R Driving Code

install.packages("devtools")
library(devtools)
install_github("libbi/RBi",ref="master")
install_github("sbfnk/RBi.helpers",ref="master")

rm(list = ls(all.names=TRUE))

library('RBi')
try(detach(package:RBi, unload = TRUE), silent = TRUE)
library(RBi, quietly = TRUE)

library('RBi.helpers')

library('ggplot2', quietly = TRUE)
library('gridExtra', quietly = TRUE)

endTime <- 50

PP <- bi_model("PP.bi")
synthetic_dataset_PP <- bi_generate_dataset(endtime=endTime,
model=PP,
seed="42",
verbose=TRUE,
noutputs=500))

df <- data.frame(rdata_PP$P$nr,
rdata_PP$P$value,
rdata_PP$Z$value,
rdata_PP$P_obs$value)

ggplot(df, aes(rdata_PP$P$nr, y = Population, color = variable), size = 0.1) +
geom_line(aes(y = rdata_PP$P$value, col = "Hare"), size = 0.1) +
geom_line(aes(y = rdata_PP$Z$value, col = "Lynx"), size = 0.1) +
geom_point(aes(y = rdata_PP$P_obs$value, col = "Observations"), size = 0.1) +
theme(legend.position="none") +
ggtitle("Example Data") +
xlab("Days") +
theme(axis.text=element_text(size=4),
axis.title=element_text(size=6,face="bold")) +
theme(plot.title = element_text(size=10))
ggsave(filename="diagrams/LVdata.png",width=4,height=3)

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

lvStanModel <- stan_model(file = "SHO.stan",verbose=TRUE)

lvFit <- sampling(lvStanModel,
seed=42,
data=list(T = length(rdata_PP$P_obs$value),
y = rdata_PP$P_obs$value,
k1 = 2.0e2,
b  = 2.0e-2,
d  = 4.0e-1,
k2 = 2.0e1,
c  = 4.0e-3,
deltaT = rdata_PP$P_obs$time[2] - rdata_PP$P_obs$time[1]
),
chains=1)

samples <- extract(lvFit)

gs1 <- qplot(x = samples$mu, y = ..density.., geom = "histogram") + xlab(expression(\mu)) gs2 <- qplot(x = samples$sigma, y = ..density.., geom = "histogram") + xlab(expression(samples$sigma)) gs3 <- grid.arrange(gs1, gs2) ggsave(plot=gs3,filename="diagrams/LvPosteriorStan.png",width=4,height=3) synthetic_dataset_PP1 <- bi_generate_dataset(endtime=endTime, model=PP, init = list(P = 100, Z=50), seed="42", verbose=TRUE, add_options = list( noutputs=500)) rdata_PP1 <- bi_read(synthetic_dataset_PP1) synthetic_dataset_PP2 <- bi_generate_dataset(endtime=endTime, model=PP, init = list(P = 150, Z=25), seed="42", verbose=TRUE, add_options = list( noutputs=500)) rdata_PP2 <- bi_read(synthetic_dataset_PP2) df1 <- data.frame(hare = rdata_PP$P$value, lynx = rdata_PP$Z$value, hare1 = rdata_PP1$P$value, lynx1 = rdata_PP1$Z$value, hare2 = rdata_PP2$P$value, lynx2 = rdata_PP2$Z$value) ggplot(df1) + geom_path(aes(x=df1$hare,  y=df1$lynx, col = "0"), size = 0.1) + geom_path(aes(x=df1$hare1, y=df1$lynx1, col = "1"), size = 0.1) + geom_path(aes(x=df1$hare2, y=df1$lynx2, col = "2"), size = 0.1) + theme(legend.position="none") + ggtitle("Phase Space") + xlab("Hare") + ylab("Lynx") + theme(axis.text=element_text(size=4), axis.title=element_text(size=6,face="bold")) + theme(plot.title = element_text(size=10)) ggsave(filename="diagrams/PPviaLibBi.png",width=4,height=3) PPInfer <- bi_model("PPInfer.bi") bi_object_PP <- libbi(client="sample", model=PPInfer, obs = synthetic_dataset_PP) bi_object_PP$run(add_options = list(
"end-time" = endTime,
noutputs = endTime,
nsamples = 2000,
nparticles = 128,
seed=42,
verbose = TRUE,
stdoutput_file_name = tempfile(pattern="pmmhoutput", fileext=".txt"))

bi_file_summary(bi_object_PP$result$output_file_name)

mu <- bi_read(bi_object_PP, "mu")$value g1 <- qplot(x = mu[2001:4000], y = ..density.., geom = "histogram") + xlab(expression(mu)) sigma <- bi_read(bi_object_PP, "sigma")$value
g2 <- qplot(x = sigma[2001:4000], y = ..density.., geom = "histogram") + xlab(expression(sigma))
g3 <- grid.arrange(g1, g2)
ggsave(plot=g3,filename="diagrams/LvPosterior.png",width=4,height=3)

df2 <- data.frame(hareActs = rdata_PP$P$value,
hareObs  = rdata_PP$P_obs$value)

ggplot(df, aes(rdata_PP$P$nr, y = value, color = variable)) +
geom_line(aes(y = rdata_PP$P$value, col = "Phyto")) +
geom_line(aes(y = rdata_PP$Z$value, col = "Zoo")) +
geom_point(aes(y = rdata_PP$P_obs$value, col = "Phyto Obs"))

ln_alpha <- bi_read(bi_object_PP, "ln_alpha")$value P <- matrix(bi_read(bi_object_PP, "P")$value,nrow=51,byrow=TRUE)
Z <- matrix(bi_read(bi_object_PP, "Z")$value,nrow=51,byrow=TRUE) data50 <- bi_generate_dataset(endtime=endTime, model=PP, seed="42", verbose=TRUE, add_options = list( noutputs=50)) rdata50 <- bi_read(data50) df3 <- data.frame(days = c(1:51), hares = rowMeans(P), lynxes = rowMeans(Z), actHs = rdata50$P$value, actLs = rdata50$Zvalue) ggplot(df3) + geom_line(aes(x = days, y = hares, col = "Est Phyto")) + geom_line(aes(x = days, y = lynxes, col = "Est Zoo")) + geom_line(aes(x = days, y = actHs, col = "Act Phyto")) + geom_line(aes(x = days, y = actLs, col = "Act Zoo")) # Bibliography Carpenter, Bob. 2015. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software. Hoffman, Matthew D., and Andrew Gelman. 2014. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research. Stan Development Team. 2015a. Stan Modeling Language User’s Guide and Reference Manual, Version 2.10.0. http://mc-stan.org/. ———. 2015b. “Stan: A C++ Library for Probability and Sampling, Version 2.10.0.” http://mc-stan.org/. # Ecology, Dynamical Systems and Inference via PMMH # Introduction In the 1920s, Lotka (1909) and Volterra (1926) developed a model of a very simple predator-prey ecosystem. \displaystyle \begin{aligned} \frac{\mathrm{d}N_1}{\mathrm{d}t} & = & \rho_1 N_1 - c_1 N_1 N_2 \label{eq2a} \\ \frac{\mathrm{d}N_2}{\mathrm{d}t} & = & c_2 N_1 N_2 - \rho_2 N2 \label{eq2b} \end{aligned} Although simple, it turns out that the Canadian lynx and showshoe hare are well represented by such a model. Furthermore, the Hudson Bay Company kept records of how many pelts of each species were trapped for almost a century, giving a good proxy of the population of each species. We can capture the fact that we do not have a complete model by describing our state of ignorance about the parameters. In order to keep this as simple as possible let us assume that log parameters undergo Brownian motion. That is, we know the parameters will jiggle around and the further into the future we look the less certain we are about what values they will have taken. By making the log parameters undergo Brownian motion, we can also capture our modelling assumption that birth, death and predation rates are always positive. A similar approach is taken in Dureau, Kalogeropoulos, and Baguelin (2013) where the (log) parameters of an epidemiological model are taken to be Ornstein-Uhlenbeck processes (which is biologically more plausible although adds to the complexity of the model, something we wish to avoid in an example such as this). Andrieu, Doucet, and Holenstein (2010) propose a method to estimate the parameters of such models (Particle Marginal Metropolis Hastings aka PMMH) and the domain specific probabilistic language LibBi (Murray (n.d.)) can be used to apply this (and other inference methods). For the sake of simplicity, in this blog post, we only model one parameter as being unknown and undergoing Brownian motion. A future blog post will consider more sophisticated scenarios. # A Dynamical System Aside The above dynamical system is structurally unstable (more on this in a future post), a possible indication that it should not be considered as a good model of predator–prey interaction. Let us modify this to include carrying capacities for the populations of both species. \displaystyle \begin{aligned} \frac{\mathrm{d}N_1}{\mathrm{d}t} & = & \rho_1 N_1 \bigg(1 - \frac{N_1}{K_1}\bigg) - c_1 N_1 N_2 \\ \frac{\mathrm{d}N_2}{\mathrm{d}t} & = & -\rho_2 N_2 \bigg(1 + \frac{N_2}{K_2}\bigg) + c_2 N_1 N_2 \end{aligned} # Data Generation with LibBi Let’s generate some data using LibBi. // Generate data assuming a fixed growth rate for hares rather than // e.g. a growth rate that undergoes Brownian motion. model PP { const h = 0.1; // time step const delta_abs = 1.0e-3; // absolute error tolerance const delta_rel = 1.0e-6; // relative error tolerance const a = 5.0e-1 // Hare growth rate const k1 = 2.0e2 // Hare carrying capacity const b = 2.0e-2 // Hare death rate per lynx const d = 4.0e-1 // Lynx death rate const k2 = 2.0e1 // Lynx carrying capacity const c = 4.0e-3 // Lynx birth rate per hare state P, Z // Hares and lynxes state ln_alpha // Hare growth rate - we express it in log form for // consistency with the inference model obs P_obs // Observations of hares sub initial { P ~ log_normal(log(100.0), 0.2) Z ~ log_normal(log(50.0), 0.1) } sub transition(delta = h) { ode(h = h, atoler = delta_abs, rtoler = delta_rel, alg = 'RK4(3)') { dP/dt = a * P * (1 - P / k1) - b * P * Z dZ/dt = -d * Z * (1 + Z / k2) + c * P * Z } } sub observation { P_obs ~ log_normal(log(P), 0.1) } } We can look at phase space starting with different populations and see they all converge to the same fixed point. # Data Generation with Haskell Since at some point in the future, I plan to produce Haskell versions of the methods given in Andrieu, Doucet, and Holenstein (2010), let’s generate the data using Haskell. > {-# OPTIONS_GHC -Wall #-} > {-# OPTIONS_GHC -fno-warn-name-shadowing #-}  > module LotkaVolterra ( > solLv > , solPp > , h0 > , l0 > , baz > , logBM > , eulerEx > )where  > import Numeric.GSL.ODE > import Numeric.LinearAlgebra  > import Data.Random.Source.PureMT > import Data.Random hiding ( gamma ) > import Control.Monad.State  Here’s the unstable model. > lvOde :: Double -> > Double -> > Double -> > Double -> > Double -> > [Double] -> > [Double] > lvOde rho1 c1 rho2 c2 _t [h, l] = > [ > rho1 * h - c1 * h * l > , c2 * h * l - rho2 * l > ] > lvOde _rho1 _c1 _rho2 _c2 _t vars = > error "lvOde called with: " ++ show (length vars) ++ " variable"

> rho1, c1, rho2, c2 :: Double
> rho1 = 0.5
> c1 = 0.02
> rho2 = 0.4
> c2 = 0.004

> deltaT :: Double
> deltaT = 0.1

> solLv :: Matrix Double
> solLv = odeSolve (lvOde rho1 c1 rho2 c2)
>                  [50.0, 50.0]
>                  (fromList [0.0, deltaT .. 50])


And here’s the stable model.

> ppOde :: Double ->
>          Double ->
>          Double ->
>          Double ->
>          Double ->
>          Double ->
>          Double ->
>          [Double] ->
>          [Double]
> ppOde a k1 b d k2 c _t [p, z] =
>   [
>     a * p * (1 - p / k1) - b * p * z
>   , -d * z * (1 + z / k2) + c * p * z
>   ]
> ppOde _a _k1 _b _d _k2 _c _t vars =
>   error "ppOde called with: " ++ show (length vars) ++ " variable"  > a, k1, b, d, k2, c :: Double > a = 0.5 > k1 = 200.0 > b = 0.02 > d = 0.4 > k2 = 50.0 > c = 0.004  > solPp :: Double -> Double -> Matrix Double > solPp x y = odeSolve (ppOde a k1 b d k2 c) > [x, y] > (fromList [0.0, deltaT .. 50])  > gamma, alpha, beta :: Double > gamma = d / a > alpha = a / (c * k1) > beta = d / (a * k2)  > fp :: (Double, Double) > fp = ((gamma + beta) / (1 + alpha * beta), (1 - gamma * alpha) / (1 + alpha * beta))  > h0, l0 :: Double > h0 = a * fst fp / c > l0 = a * snd fp / b  > foo, bar :: Matrix R > foo = matrix 2 [a / k1, b, c, negate d / k2] > bar = matrix 1 [a, d]  > baz :: Maybe (Matrix R) > baz = linearSolve foo bar  This gives a stable fixed point of ghci> baz Just (2><1) [ 120.00000000000001 , 10.0 ]  Here’s an example of convergence to that fixed point in phase space. ## The Stochastic Model Let us now assume that the Hare growth parameter undergoes Brownian motion so that the further into the future we go, the less certain we are about it. In order to ensure that this parameter remains positive, let’s model the log of it to be Brownian motion. \displaystyle \begin{aligned} \frac{\mathrm{d}N_1}{\mathrm{d}t} & = & \rho_1 N_1 \bigg(1 - \frac{N_1}{K_1}\bigg) - c_1 N_1 N_2 \\ \frac{\mathrm{d}N_2}{\mathrm{d}t} & = & -\rho_2 N_2 \bigg(1 + \frac{N_2}{K_2}\bigg) + c_2 N_1 N_2 \\ \mathrm{d} \rho_1 & = & \rho_1 \sigma_{\rho_1} \mathrm{d}W_t \end{aligned} where the final equation is a stochastic differential equation with $W_t$ being a Wiener process. By Itô we have $\displaystyle \mathrm{d} (\log{\rho_1}) = - \frac{\sigma_{\rho_1}^2}{2} \mathrm{d} t + \sigma_{\rho_1} \mathrm{d}W_t$ We can use this to generate paths for $\rho_1$. $\displaystyle \rho_1(t + \Delta t) = \rho_1(t)\exp{\bigg(- \frac{\sigma_{\rho_1}^2}{2} \Delta t + \sigma_{\rho_1} \sqrt{\Delta t} Z\bigg)}$ where $Z \sim {\mathcal{N}}(0,1)$. > oneStepLogBM :: MonadRandom m => Double -> Double -> Double -> m Double > oneStepLogBM deltaT sigma rhoPrev = do > x <- sample rvar StdNormal
>   return $rhoPrev * exp(sigma * (sqrt deltaT) * x - 0.5 * sigma * sigma * deltaT)  > iterateM :: Monad m => (a -> m a) -> m a -> Int -> m [a] > iterateM f mx n = sequence . take n . iterate (>>= f)$ mx

> logBMM :: MonadRandom m => Double -> Double -> Int -> Int -> m [Double]
> logBMM initRho sigma n m =
>   iterateM (oneStepLogBM (recip $fromIntegral n) sigma) (return initRho) (n * m)  > logBM :: Double -> Double -> Int -> Int -> Int -> [Double] > logBM initRho sigma n m seed = > evalState (logBMM initRho sigma n m) (pureMT$ fromIntegral seed)


We can see the further we go into the future the less certain we are about the value of the parameter.

Using this we can simulate the whole dynamical system which is now a stochastic process.

> f1, f2 :: Double -> Double -> Double ->
>           Double -> Double ->
>           Double
> f1 a k1 b p z = a * p * (1 - p / k1) - b * p * z
> f2 d k2 c p z = -d * z * (1 + z / k2) + c * p * z

> oneStepEuler :: MonadRandom m =>
>                 Double ->
>                 Double ->
>                 Double -> Double ->
>                 Double -> Double -> Double ->
>                 (Double, Double, Double) ->
>                 m (Double, Double, Double)
> oneStepEuler deltaT sigma k1 b d k2 c (rho1Prev, pPrev, zPrev) = do
>     let pNew = pPrev + deltaT * f1 (exp rho1Prev) k1 b pPrev zPrev
>     let zNew = zPrev + deltaT * f2 d k2 c pPrev zPrev
>     rho1New <- oneStepLogBM deltaT sigma rho1Prev
>     return (rho1New, pNew, zNew)

> euler :: MonadRandom m =>
>          (Double, Double, Double) ->
>          Double ->
>          Double -> Double ->
>          Double -> Double -> Double ->
>          Int -> Int ->
>          m [(Double, Double, Double)]
> euler stateInit sigma k1 b d k2 c n m =
>   iterateM (oneStepEuler (recip $fromIntegral n) sigma k1 b d k2 c) > (return stateInit) > (n * m)  > eulerEx :: (Double, Double, Double) -> > Double -> Int -> Int -> Int -> > [(Double, Double, Double)] > eulerEx stateInit sigma n m seed = > evalState (euler stateInit sigma k1 b d k2 c n m) (pureMT$ fromIntegral seed)


We see that the populations become noisier the further into the future we go.

Notice that the second order effects of the system are now to some extent captured by the fact that the growth rate of Hares can drift. In our simulation, this is demonstrated by our decreasing lack of knowledge the further we look into the future.

# Inference

Now let us infer the growth rate using PMMH. Here’s the model expressed in LibBi.

// Infer growth rate for hares

model PP {
const h         = 0.1;    // time step
const delta_abs = 1.0e-3; // absolute error tolerance
const delta_rel = 1.0e-6; // relative error tolerance

const a  = 5.0e-1 // Hare growth rate - superfluous for inference
// but a reminder of what we should expect
const k1 = 2.0e2  // Hare carrying capacity
const b  = 2.0e-2 // Hare death rate per lynx
const d  = 4.0e-1 // Lynx death rate
const k2 = 2.0e1  // Lynx carrying capacity
const c  = 4.0e-3 // Lynx birth rate per hare

state P, Z       // Hares and lynxes
state ln_alpha   // Hare growth rate - we express it in log form for
// consistency with the inference model
obs P_obs        // Observations of hares
param mu, sigma  // Mean and standard deviation of hare growth rate
noise w          // Noise

sub parameter {
mu ~ uniform(0.0, 1.0)
sigma ~ uniform(0.0, 0.5)
}

sub proposal_parameter {
mu ~ truncated_gaussian(mu, 0.02, 0.0, 1.0);
sigma ~ truncated_gaussian(sigma, 0.01, 0.0, 0.5);
}

sub initial {
P ~ log_normal(log(100.0), 0.2)
Z ~ log_normal(log(50.0), 0.1)
ln_alpha ~ gaussian(log(mu), sigma)
}

sub transition(delta = h) {
w ~ normal(0.0, sqrt(h));
ode(h = h, atoler = delta_abs, rtoler = delta_rel, alg = 'RK4(3)') {
dP/dt =  exp(ln_alpha) * P * (1 - P / k1) - b * P * Z
dZ/dt = -d * Z * (1 + Z / k2) + c * P * Z
dln_alpha/dt = -sigma * sigma / 2 - sigma * w / h
}
}

sub observation {
P_obs ~ log_normal(log(P), 0.1)
}
}

Let’s look at the posteriors of the hyper-parameters for the Hare growth parameter.

The estimate for $\mu$ is pretty decent. For our generated data, $\sigma =0$ and given our observations are quite noisy maybe the estimate for this is not too bad also.

# Appendix: The R Driving Code

All code including the R below can be downloaded from github but make sure you use the straight-libbi branch and not master.

install.packages("devtools")
library(devtools)
install_github("sbfnk/RBi",ref="master")
install_github("sbfnk/RBi.helpers",ref="master")

rm(list = ls(all.names=TRUE))

library('RBi')
try(detach(package:RBi, unload = TRUE), silent = TRUE)
library(RBi, quietly = TRUE)

library('RBi.helpers')

library('ggplot2', quietly = TRUE)
library('gridExtra', quietly = TRUE)

endTime <- 50

PP <- bi_model("PP.bi")
synthetic_dataset_PP <- bi_generate_dataset(endtime=endTime,
model=PP,
seed="42",
verbose=TRUE,
noutputs=500))

df <- data.frame(rdata_PP$P$nr,
rdata_PP$P$value,
rdata_PP$Z$value,
rdata_PP$P_obs$value)

ggplot(df, aes(rdata_PP$P$nr, y = Population, color = variable), size = 0.1) +
geom_line(aes(y = rdata_PP$P$value, col = "Hare"), size = 0.1) +
geom_line(aes(y = rdata_PP$Z$value, col = "Lynx"), size = 0.1) +
geom_point(aes(y = rdata_PP$P_obs$value, col = "Observations"), size = 0.1) +
theme(legend.position="none") +
ggtitle("Example Data") +
xlab("Days") +
theme(axis.text=element_text(size=4),
axis.title=element_text(size=6,face="bold")) +
theme(plot.title = element_text(size=10))
ggsave(filename="diagrams/LVdata.png",width=4,height=3)

synthetic_dataset_PP1 <- bi_generate_dataset(endtime=endTime,
model=PP,
init = list(P = 100, Z=50),
seed="42",
verbose=TRUE,
noutputs=500))

synthetic_dataset_PP2 <- bi_generate_dataset(endtime=endTime,
model=PP,
init = list(P = 150, Z=25),
seed="42",
verbose=TRUE,
noutputs=500))

df1 <- data.frame(hare = rdata_PP$P$value,
lynx = rdata_PP$Z$value,
hare1 = rdata_PP1$P$value,
lynx1 = rdata_PP1$Z$value,
hare2 = rdata_PP2$P$value,
lynx2 = rdata_PP2$Z$value)

ggplot(df1) +
geom_path(aes(x=df1$hare, y=df1$lynx, col = "0"), size = 0.1) +
geom_path(aes(x=df1$hare1, y=df1$lynx1, col = "1"), size = 0.1) +
geom_path(aes(x=df1$hare2, y=df1$lynx2, col = "2"), size = 0.1) +
theme(legend.position="none") +
ggtitle("Phase Space") +
xlab("Hare") +
ylab("Lynx") +
theme(axis.text=element_text(size=4),
axis.title=element_text(size=6,face="bold")) +
theme(plot.title = element_text(size=10))
ggsave(filename="diagrams/PPviaLibBi.png",width=4,height=3)

PPInfer <- bi_model("PPInfer.bi")

bi_object_PP <- libbi(client="sample", model=PPInfer, obs = synthetic_dataset_PP)

bi_object_PP$run(add_options = list( "end-time" = endTime, noutputs = endTime, nsamples = 4000, nparticles = 128, seed=42, nthreads = 1), ## verbose = TRUE, stdoutput_file_name = tempfile(pattern="pmmhoutput", fileext=".txt")) bi_file_summary(bi_object_PP$result$output_file_name) mu <- bi_read(bi_object_PP, "mu")$value
g1 <- qplot(x = mu[2001:4000], y = ..density.., geom = "histogram") + xlab(expression(mu))
sigma <- bi_read(bi_object_PP, "sigma")$value g2 <- qplot(x = sigma[2001:4000], y = ..density.., geom = "histogram") + xlab(expression(sigma)) g3 <- grid.arrange(g1, g2) ggsave(plot=g3,filename="diagrams/LvPosterior.png",width=4,height=3) df2 <- data.frame(hareActs = rdata_PP$P$value, hareObs = rdata_PP$P_obs$value) ggplot(df, aes(rdata_PP$P$nr, y = value, color = variable)) + geom_line(aes(y = rdata_PP$P$value, col = "Phyto")) + geom_line(aes(y = rdata_PP$Z$value, col = "Zoo")) + geom_point(aes(y = rdata_PP$P_obs$value, col = "Phyto Obs")) ln_alpha <- bi_read(bi_object_PP, "ln_alpha")$value

P <- matrix(bi_read(bi_object_PP, "P")$value,nrow=51,byrow=TRUE) Z <- matrix(bi_read(bi_object_PP, "Z")$value,nrow=51,byrow=TRUE)

data50 <- bi_generate_dataset(endtime=endTime,
model=PP,
seed="42",
verbose=TRUE,
noutputs=50))

df3 <- data.frame(days = c(1:51), hares = rowMeans(P), lynxes = rowMeans(Z),
actHs = rdata50$P$value, actLs = rdata50$Z$value)

ggplot(df3) +
geom_line(aes(x = days, y = hares, col = "Est Phyto")) +
geom_line(aes(x = days, y = lynxes, col = "Est Zoo")) +
geom_line(aes(x = days, y = actHs, col = "Act Phyto")) +
geom_line(aes(x = days, y = actLs, col = "Act Zoo"))

# Bibliography

Andrieu, Christophe, Arnaud Doucet, and Roman Holenstein. 2010. “Particle Markov chain Monte Carlo methods.” Journal of the Royal Statistical Society. Series B: Statistical Methodology 72 (3): 269–342. doi:10.1111/j.1467-9868.2009.00736.x.

Dureau, Joseph, Konstantinos Kalogeropoulos, and Marc Baguelin. 2013. “Capturing the time-varying drivers of an epidemic using stochastic dynamical systems.” Biostatistics (Oxford, England) 14 (3): 541–55. doi:10.1093/biostatistics/kxs052.

Lotka, Alfred J. 1909. “Contribution to the Theory of Periodic Reactions.” The Journal of Physical Chemistry 14 (3): 271–74. doi:10.1021/j150111a004.

Murray, Lawrence M. n.d. “Bayesian State-Space Modelling on High-Performance Hardware Using LibBi.”

Volterra, Vito. 1926. “Variazioni e fluttuazioni del numero d’individui in specie animali conviventi.” Memorie Della R. Accademia Dei Lincei 6 (2): 31–113. http://www.liberliber.it/biblioteca/v/volterra/variazioni{\_}e{\_}fluttuazioni/pdf/volterra{\_}variazioni{\_}e{\_}fluttuazioni.pdf.

# Introduction

This is a bit different from my usual posts (well apart from my write up of hacking at Odessa) in that it is a log of how I managed to get LibBi (Library for Bayesian Inference) to run on my MacBook and then not totally satisfactorily (as you will see if you read on).

The intention is to try a few more approaches to the same problem, for example, Stan, monad-bayes and hand-crafted.

Kermack and McKendrick (1927) give a simple model of the spread of an infectious disease. Individuals move from being susceptible ($S$) to infected ($I$) to recovered ($R$).

\displaystyle \begin{aligned} \frac{dS}{dt} & = & - \delta S(t) I(t) & \\ \frac{dI}{dt} & = & \delta S(t) I(t) - & \gamma I(t) \\ \frac{dR}{dt} & = & & \gamma I(t) \end{aligned}

In 1978, anonymous authors sent a note to the British Medical Journal reporting an influenza outbreak in a boarding school in the north of England (“Influenza in a boarding school” 1978). The chart below shows the solution of the SIR (Susceptible, Infected, Record) model with parameters which give roughly the results observed in the school.

# LibBi

## Step 1

~/LibBi-stable/SIR-master $./init.sh error: 'ncread' undefined near line 6 column 7 The README says this is optional so we can skip over it. Still it would be nice to fit the bridge weight function as described in Moral and Murray (2015). The README does say that GPML is required but since we don’t (yet) need to do this step, let’s move on. ~/LibBi-stable/SIR-master$ ./run.sh
./run.sh

Error: ./configure failed with return code 77. See
.SIR/build_openmp_cuda_single/configure.log and
.SIR/build_openmp_cuda_single/config.log for details

It seems the example is configured to run on CUDA and it is highly likely that my installation of LibBI was not set up to allow this. We can change config.conf from

--disable-assert
--enable-single
--enable-cuda
--nthreads 2

to

--nthreads 4
--enable-sse
--disable-assert

On to the next issue.

~/LibBi-stable/SIR-master $./run.sh ./run.sh Error: ./configure failed with return code 1. required QRUpdate library not found. See .SIR/build_sse/configure.log and .SIR/build_sse/config.log for details But QRUpdate is installed! ~/LibBi-stable/SIR-master$ brew info QRUpdate
brew info QRUpdate
homebrew/science/qrupdate: stable 1.1.2 (bottled)
http://sourceforge.net/projects/qrupdate/
/usr/local/Cellar/qrupdate/1.1.2 (3 files, 302.6K)
/usr/local/Cellar/qrupdate/1.1.2_2 (6 files, 336.3K)
Poured from bottle
/usr/local/Cellar/qrupdate/1.1.2_3 (6 files, 337.3K) *
Poured from bottle
From: https://github.com/Homebrew/homebrew-science/blob/master/qrupdate.rb
==> Dependencies
Required: veclibfort ✔
Optional: openblas ✔
==> Options
--with-openblas
Build with openblas support
--without-check
Skip build-time tests (not recommended)

Let’s look in the log as advised. So it seems that a certain symbol cannot be found.

checking for dch1dn_ in -lqrupdate

Let’s try ourselves.

nm -g /usr/local/Cellar/qrupdate/1.1.2_3/lib/libqrupdate.a | grep dch1dn_
0000000000000000 T _dch1dn_

So the symbol is there! What gives? Let’s try setting one of the environment variables.

export LDFLAGS='-L/usr/local/lib'

Now we get further.

./run.sh
Error: ./configure failed with return code 1. required NetCDF header
.SIR/build_sse/config.log for details

So we just need to set another environment variable.

export CPPFLAGS='-I/usr/local/include/'

This is more mysterious.

./run.sh
Error: ./configure failed with return code 1. required Boost header
.SIR/build_sse/config.log for details ~/LibBi-stable/SIR-master

Let’s see what we have.

brew list | grep -i boost

Nothing! I recall having some problems with boost when trying to use a completely different package. So let’s install boost.

brew install boost

Now we get a different error.

./run.sh
Error: make failed with return code 2, see .SIR/build_sse/make.log for details

Fortunately at some time in the past sbfnk took pity on me and advised me here to use boost155, a step that should not be lightly undertaken.

/usr/local/Cellar/boost155/1.55.0_1: 10,036 files, 451.6M, built in 15 minutes 9 seconds

Even then I had to say

brew link --force boost155

Finally it runs.

./run.sh 2> out.txt

And produces a lot of output

wc -l out.txt
49999 out.txt

ls -ltrh results/posterior.nc
1.7G Apr 30 19:57 results/posterior.nc

Rather worringly, out.txt has all lines of the form

1: -51.9191 -23.2045 nan beats -inf -inf -inf accept=0.5

nan beating -inf does not sound good.

Now we are in a position to analyse the results.

octave --path oct/ --eval "plot_and_print"
error: 'bi_plot_quantiles' undefined near line 23 column 5

I previously found an Octave package(?) called OctBi so let’s create an .octaverc file which adds this to the path. We’ll also need to load the netcdf package which we previously installed.

addpath ("../OctBi-stable/inst")
pkg load netcdf
~/LibBi-stable/SIR-master octave --path oct/ --eval "plot_and_print" octave --path oct/ --eval "plot_and_print" warning: division by zero warning: called from mean at line 117 column 7 read_hist_simulator at line 47 column 11 bi_read_hist at line 85 column 12 bi_hist at line 63 column 12 plot_and_print at line 56 column 5 warning: division by zero warning: division by zero warning: division by zero warning: division by zero warning: division by zero warning: print.m: fig2dev binary is not available. Some output formats are not available. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering. sh: pdfcrop: command not found I actually get a chart from this so some kind of success. This does not look like the chart in the Moral and Murray (2015), the fitted number of infected patients looks a lot smoother and the “rates” parameters also vary in a much smoother manner. For reasons I haven’t yet investigated, it looks like over-fitting. Here’s the charts in the paper. # Bibliography “Influenza in a boarding school.” 1978. British Medical Journal, March, 587. Kermack, W. O., and A. G. McKendrick. 1927. “A Contribution to the Mathematical Theory of Epidemics.” Proceedings of the Royal Society of London Series A 115 (August): 700–721. doi:10.1098/rspa.1927.0118. Moral, Pierre Del, and Lawrence M Murray. 2015. “Sequential Monte Carlo with Highly Informative Observations.” # Every Manifold is Paracompact # Introduction In their paper Betancourt et al. (2014), the authors give a corollary which starts with the phrase “Because the manifold is paracompact”. It wasn’t immediately clear why the manifold was paracompact or indeed what paracompactness meant although it was clearly something like compactness which means that every cover has a finite sub-cover. It turns out that every manifold is paracompact and that this is intimately related to partitions of unity. Most of what I have written below is taken from some hand-written anonymous lecture notes I found by chance in the DPMMS library in Cambridge University. To whomever wrote them: thank you very much. # Limbering Up Let $\{U_i : i \in {\mathcal{I}}\}$ be an open cover of a smooth manifold $M$. A partition of unity on M, subordinate to the cover $\{U_i : i \in {\mathcal{I}}\}$ is a finite collection of smooth functions $\displaystyle X_j : M^n \longrightarrow \mathbb{R}_+$ where $j = 1, 2, \ldots N$ for some $N$ such that $\displaystyle \sum_{j = 0}^N X_j(x) = 1 \,\mathrm{for all}\, x \in M$ and for each $j$ there exists $i(j) \in {\mathcal{I}}$ such that $\displaystyle {\mathrm{supp}}{X_j} \subset U_{i(j)}$ We don’t yet know partitions of unity exist. First define $\displaystyle f(t) \triangleq \begin{cases} 0 & \text{if } t \leq 0 \\ \exp{(-1/t)} & \text{if } t > 0 \\ \end{cases}$ Techniques of classical analysis easily show that $f$ is smooth ($t=0$ is the only point that might be in doubt and it can be checked from first principles that $f^{(n)}(0) = 0$ for all $n$). Next define \displaystyle \begin{aligned} g(t) &\triangleq \frac{f(t)}{f(t) + f(1 - t)} \\ h(t) &\triangleq g(t + 2)g(2 - t) \end{aligned} Finally we can define $F: \mathbb{R}^n \rightarrow \mathbb{R}$ by $F(x) = h(\|x\|)$. This has the properties • $F(x) = 1$ if $\|x\| \leq 1$ • $0 \leq F(x) \leq 1$ • $F(x) = 0$ if $\|x\| > 2$ Now take a point $p \in M$ centred in a chart $(U_p, \phi_p)$ so that, without loss of generality, $B(0,3) \subseteq \phi_p(U_p)$ (we can always choose $r_p$ so that the open ball $B(0,3r_p) \subseteq \phi'_p(U_p)$ and then define another chart $(U_p, \phi_p)$ with $\phi_p(x) = \phi'_p(x)/\|x\|$). Define the images of the open and closed balls of radius $1$ and $2$ respectively \displaystyle \begin{aligned} V_p &= \phi_p^{-1}(B(0,1)) \\ W_p &= \phi_p^{-1}\big(\overline{B(0,2)}\big) \\ \end{aligned} and further define bump functions $\displaystyle \psi_p(y) \triangleq \begin{cases} F(\phi_p(y)) & \text{if } y \in U_p\\ 0 & \text{otherwise} \\ \end{cases}$ Then $\psi_p$ is smooth and its support lies in $W_p \subset U_p$. By compactness, the open cover $\{V_p : p \in M\}$ has a finite subcover $\{V_{p_1},\ldots,V_{p_K}\}$. Now define $\displaystyle X_j : M^n \longrightarrow \mathbb{R}_+$ by $\displaystyle X_j(y) = \frac{\psi_{p_j}(y)}{\sum_{i=1}^K \psi_{p_i}(y)}$ Then $X_j$ is smooth, ${\mathrm{supp}}{X_j} = {\mathrm{supp}}{\psi_{p_j}} \subset U_{p_j}$ and $\sum_{j=1}^K X_j(y) = 1$. Thus $\{X_j\}$ is the required partition of unity. # Paracompactness Because $M$ is a manifold, it has a countable basis $\{A_i\}_{i \in \mathbb{N}}$ and for any point $p$, there must exist $A_i \subset V_p$ with $p \in A_i$. Choose one of these and call it $V_{p_i}$. This gives a countable cover of $M$ by such sets. Now define $\displaystyle L_1 = W_{p_1} \subset V_{p_1} \cup V_{p_2} \cup \ldots \cup V_{p_{i(2)}}$ where, since $L_1$ is compact, $V_{p_2}, \ldots, V_{p_{i(2)}}$ is a finite subcover. And further define $\displaystyle L_n = W_{p_1} \cup W_{p_2} \cup \ldots \cup W_{p_{i(n)}} \subset V_{p_1} \cup V_{p_2} \cup \ldots \cup V_{p_{i(n+1)}}$ where again, since $L_n$ is compact, $V_{p_{i(n)+1}}, \ldots, V_{p_{i(n+1)}}$ is a finite subcover. Now define \displaystyle \begin{aligned} K_n &= L_n \setminus {\mathrm{int}}{L_{n-1}} \\ U_n &= {\mathrm{int}}(L_{n+1}) \setminus L_n \end{aligned} Then $K_n$ is compact, $U_n$ is open and $K_n \subset U_n$. Furthermore, $\bigcup_{n \in \mathbb{N}} K_n = M$ and $U_n$ only intersects with $U_{n-1}$ and $U_{n+1}$. Given any open cover ${\mathcal{O}}$ of $M$, each $K_n$ can be covered by a finite number of open sets in $U_n$ contained in some member of ${\mathcal{O}}$. Thus every point in $K_n$ can be covered by at most a finite number of sets from $U_{n-1}, U_n$ and $U_{n+1}$ and which are contained in some member of ${\mathcal{O}}$. This is a locally finite refinement of ${\mathcal{O}}$ and which is precisely the definition of paracompactness. To produce a partition of unity we define bump functions $\psi_j$ as above on this locally finite cover and note that locally finite implies that $\sum_j \psi_j$ is well defined. Again, as above, define $\displaystyle X_j(y) = \frac{\psi_{j}(y)}{\sum_{i=1} \psi_{i}(y)}$ to get the required result. # Bibliography Betancourt, M. J., Simon Byrne, Samuel Livingstone, and Mark Girolami. 2014. “The Geometric Foundations of Hamiltonian Monte Carlo,” October, 45. http://arxiv.org/abs/1410.5110. # Particle Smoothing # Introduction The equation of motion for a pendulum of unit length subject to Gaussian white noise is $\displaystyle \frac{\mathrm{d}^2\alpha}{\mathrm{d}t^2} = -g\sin\alpha + w(t)$ We can discretize this via the usual Euler method $\displaystyle \begin{bmatrix} x_{1,i} \\ x_{2,i} \end{bmatrix} = \begin{bmatrix} x_{1,i-1} + x_{2,i-1}\Delta t \\ x_{2,i-1} - g\sin x_{1,i-1}\Delta t \end{bmatrix} + \mathbf{q}_{i-1}$ where $q_i \sim {\mathcal{N}}(0,Q)$ and $\displaystyle Q = \begin{bmatrix} \frac{q^c \Delta t^3}{3} & \frac{q^c \Delta t^2}{2} \\ \frac{q^c \Delta t^2}{2} & {q^c \Delta t} \end{bmatrix}$ The explanation of the precise form of the covariance matrix will be the subject of another blog post; for the purpose of exposition of forward filtering / backward smoothing, this detail is not important. Assume that we can only measure the horizontal position of the pendulum and further that this measurement is subject to error so that $\displaystyle y_i = \sin x_i + r_k$ where $r_i \sim {\mathcal{N}}(0,R)$. Particle Filtering can give us an estimate of where the pendulum is and its velocity using all the observations up to that point in time. But now suppose we have observed the pendulum for a fixed period of time then at times earlier than the time at which we stop our observations we now have observations in the future as well as in the past. If we can somehow take account of these future observations then we should be able to improve our estimate of where the pendulum was at any given point in time (as well as its velocity). Forward Filtering / Backward Smoothing is a technique for doing this. ## Haskell Preamble > {-# OPTIONS_GHC -Wall #-} > {-# OPTIONS_GHC -fno-warn-name-shadowing #-} > {-# OPTIONS_GHC -fno-warn-type-defaults #-} > {-# OPTIONS_GHC -fno-warn-unused-do-bind #-} > {-# OPTIONS_GHC -fno-warn-missing-methods #-} > {-# OPTIONS_GHC -fno-warn-orphans #-}  > {-# LANGUAGE MultiParamTypeClasses #-} > {-# LANGUAGE TypeFamilies #-} > {-# LANGUAGE ScopedTypeVariables #-} > {-# LANGUAGE ExplicitForAll #-} > {-# LANGUAGE DataKinds #-}  > {-# LANGUAGE FlexibleInstances #-} > {-# LANGUAGE MultiParamTypeClasses #-} > {-# LANGUAGE FlexibleContexts #-} > {-# LANGUAGE TypeFamilies #-} > {-# LANGUAGE BangPatterns #-} > {-# LANGUAGE GeneralizedNewtypeDeriving #-} > {-# LANGUAGE TemplateHaskell #-} > {-# LANGUAGE DataKinds #-} > {-# LANGUAGE DeriveGeneric #-}  > module PendulumSamples ( pendulumSamples > , pendulumSamples' > , testFiltering > , testSmoothing > , testFilteringG > , testSmoothingG > ) where  > import Data.Random hiding ( StdNormal, Normal ) > import Data.Random.Source.PureMT ( pureMT ) > import Control.Monad.State ( evalState, replicateM ) > import qualified Control.Monad.Loops as ML > import Control.Monad.Writer ( tell, WriterT, lift, > runWriterT > ) > import Numeric.LinearAlgebra.Static > ( R, vector, Sym, > headTail, matrix, sym, > diag > ) > import GHC.TypeLits ( KnownNat ) > import MultivariateNormal ( MultivariateNormal(..) ) > import qualified Data.Vector as V > import Data.Bits ( shiftR ) > import Data.List ( transpose ) > import Control.Parallel.Strategies > import GHC.Generics (Generic)  # Simulation Let’s first plot some typical trajectories of the pendulum. > deltaT, g :: Double > deltaT = 0.01 > g = 9.81  > type PendulumState = R 2 > type PendulumObs = R 1  > pendulumSample :: MonadRandom m => > Sym 2 -> > Sym 1 -> > PendulumState -> > m (Maybe ((PendulumState, PendulumObs), PendulumState)) > pendulumSample bigQ bigR xPrev = do > let x1Prev = fst headTail xPrev
>       x2Prev = fst $headTail$ snd $headTail xPrev > eta <- sample$ rvar (MultivariateNormal 0.0 bigQ)
>   let x1= x1Prev + x2Prev * deltaT
>       x2 = x2Prev - g * (sin x1Prev) * deltaT
>       xNew = vector [x1, x2] + eta
>       x1New = fst $headTail xNew > epsilon <- sample$ rvar (MultivariateNormal 0.0 bigR)
>   let yNew = vector [sin x1New] + epsilon
>   return $Just ((xNew, yNew), xNew)  Let’s try plotting some samples when we are in the linear region with which we are familiar from school $\sin\alpha \approx \alpha$. $\displaystyle \frac{\mathrm{d}^2\alpha}{\mathrm{d}t^2} = -g\alpha + w(t)$ In this case we expect the horizontal displacement to be approximately equal to the angle of displacement and thus the observations to be symmetric about the actuals. > bigQ :: Sym 2 > bigQ = sym$ matrix bigQl

> qc1 :: Double
> qc1 = 0.0001

> bigQl :: [Double]
> bigQl = [ qc1 * deltaT^3 / 3, qc1 * deltaT^2 / 2,
>           qc1 * deltaT^2 / 2,       qc1 * deltaT
>         ]

> bigR :: Sym 1
> bigR  = sym $matrix [0.0001]  > m0 :: PendulumState > m0 = vector [0.01, 0]  > pendulumSamples :: [(PendulumState, PendulumObs)] > pendulumSamples = evalState (ML.unfoldrM (pendulumSample bigQ bigR) m0) (pureMT 17)  But if we work in a region in which linearity breaks down then the observations are no longer symmetrical about the actuals. > bigQ' :: Sym 2 > bigQ' = sym$ matrix bigQl'

> qc1' :: Double
> qc1' = 0.01

> bigQl' :: [Double]
> bigQl' = [ qc1' * deltaT^3 / 3, qc1' * deltaT^2 / 2,
>            qc1' * deltaT^2 / 2,       qc1' * deltaT
>          ]

> bigR' :: Sym 1
> bigR'  = sym $matrix [0.1]  > m0' :: PendulumState > m0' = vector [1.6, 0]  > pendulumSamples' :: [(PendulumState, PendulumObs)] > pendulumSamples' = evalState (ML.unfoldrM (pendulumSample bigQ' bigR') m0') (pureMT 17)  # Filtering We do not give the theory behind particle filtering. The interested reader can either consult Särkkä (2013) or wait for a future blog post on the subject. > nParticles :: Int > nParticles = 500  The usual Bayesian update step. > type Particles a = V.Vector a  > oneFilteringStep :: > MonadRandom m => > (Particles a -> m (Particles a)) -> > (Particles a -> Particles b) -> > (b -> b -> Double) -> > Particles a -> > b -> > WriterT [Particles a] m (Particles a) > oneFilteringStep stateUpdate obsUpdate weight statePrevs obs = do > tell [statePrevs] > stateNews <- lift$ stateUpdate statePrevs
>   let obsNews = obsUpdate stateNews
>   let weights       = V.map (weight obs) obsNews
>       cumSumWeights = V.tail $V.scanl (+) 0 weights > totWeight = V.last cumSumWeights > vs <- lift$ V.replicateM nParticles (sample $uniform 0.0 totWeight) > let js = indices cumSumWeights vs > stateTildes = V.map (stateNews V.!) js > return stateTildes  The system state and observable. > data SystemState a = SystemState { x1 :: a, x2 :: a } > deriving (Show, Generic)  > instance NFData a => NFData (SystemState a)  > newtype SystemObs a = SystemObs { y1 :: a } > deriving Show  To make the system state update a bit more readable, let’s introduce some lifted arithmetic operators. > (.+), (.*), (.-) :: (Num a) => V.Vector a -> V.Vector a -> V.Vector a > (.+) = V.zipWith (+) > (.*) = V.zipWith (*) > (.-) = V.zipWith (-)  The state update itself > stateUpdate :: Particles (SystemState Double) -> > Particles (SystemState Double) > stateUpdate xPrevs = V.zipWith SystemState x1s x2s > where > ix = V.length xPrevs > > x1Prevs = V.map x1 xPrevs > x2Prevs = V.map x2 xPrevs > > deltaTs = V.replicate ix deltaT > gs = V.replicate ix g > x1s = x1Prevs .+ (x2Prevs .* deltaTs) > x2s = x2Prevs .- (gs .* (V.map sin x1Prevs) .* deltaTs)  and its noisy version. > stateUpdateNoisy :: MonadRandom m => > Sym 2 -> > Particles (SystemState Double) -> > m (Particles (SystemState Double)) > stateUpdateNoisy bigQ xPrevs = do > let xs = stateUpdate xPrevs > > x1s = V.map x1 xs > x2s = V.map x2 xs > > let ix = V.length xPrevs > etas <- replicateM ix$ sample $rvar (MultivariateNormal 0.0 bigQ) > > let eta1s, eta2s :: V.Vector Double > eta1s = V.fromList$ map (fst . headTail) etas
>       eta2s = V.fromList $map (fst . headTail . snd . headTail) etas > > return (V.zipWith SystemState (x1s .+ eta1s) (x2s .+ eta2s))  The function which maps the state to the observable. > obsUpdate :: Particles (SystemState Double) -> > Particles (SystemObs Double) > obsUpdate xs = V.map (SystemObs . sin . x1) xs  And finally a function to calculate the weight of each particle given an observation. > weight :: forall a n . KnownNat n => > (a -> R n) -> > Sym n -> > a -> a -> Double > weight f bigR obs obsNew = pdf (MultivariateNormal (f obsNew) bigR) (f obs)  The variance of the prior. > bigP :: Sym 2 > bigP = sym$ diag 0.1


Generate our ensemble of particles chosen from the prior,

> initParticles :: MonadRandom m =>
>                  m (Particles (SystemState Double))
> initParticles = V.replicateM nParticles $do > r <- sample$ rvar (MultivariateNormal m0' bigP)
>   let x1 = fst $headTail r > x2 = fst$ headTail $snd$ headTail r
>   return $SystemState { x1 = x1, x2 = x2}  run the particle filter, > runFilter :: Int -> [Particles (SystemState Double)] > runFilter nTimeSteps = snd$ evalState action (pureMT 19)
>   where
>     action = runWriterT $do > xs <- lift$ initParticles
>       V.foldM
>         (oneFilteringStep (stateUpdateNoisy bigQ') obsUpdate (weight f bigR'))
>         xs
>         (V.fromList map (SystemObs . fst . headTail . snd) > (take nTimeSteps pendulumSamples'))  and extract the estimated position from the filter. > testFiltering :: Int -> [Double] > testFiltering nTimeSteps = map ((/ (fromIntegral nParticles)). sum . V.map x1) > (runFilter nTimeSteps)  # Smoothing If we could calculate the marginal smoothing distributions $\{p(x_t \,|\, y_{1:T})\}_{i=1}^T$ then we might be able to sample from them. Using the Markov assumption of our model that $x_i$ is independent of $y_{i+1:N}$ given $x_{i+1}$, we have \displaystyle \begin{aligned} \overbrace{p(x_i \,|\, y_{1:N})}^{\mathrm{smoother}\,\mathrm{at}\, i} &= \int p(x_i, x_{i+1} \,|\, y_{1:N}) \,\mathrm{d}x_{i+1} & \text{marginal distribution} \\ &= \int p(x_{i+1} \,|\, y_{1:N}) \,p(x_{i} \,|\, y_{1:N}, x_{i+1}) \,\mathrm{d}x_{i+1} & \text{conditional density} \\ &= \int p(x_{i+1} \,|\, y_{1:N}) \,p(x_{i} \,|\, y_{1:i}, x_{i+1}) \,\mathrm{d}x_{i+1} & \text{Markov model} \\ &= \int p(x_{i+1} \,|\, y_{1:N}) \, \frac{p(x_{i}, x_{i+1} \,|\, y_{1:i})}{p(x_{i+1} \,|\, y_{1:i})} \,\mathrm{d}x_{i+1} & \text{conditional density} \\ &= \int p(x_{i+1} \,|\, y_{1:N}) \, \frac{p(x_{i+1} \,|\, x_{i}, y_{1:i}) \,p(x_{i} \,|\, y_{1:i})}{p(x_{i+1} \,|\, y_{1:i})} \,\mathrm{d}x_{i+1} & \text{conditional density} \\ &= \int \overbrace{p(x_{i+1} \,|\, y_{1:N})}^{\text{smoother at }i+1} \, \underbrace{ \overbrace{p(x_{i} \,|\, y_{1:i})}^{\text{filter at }i} \frac{p(x_{i+1} \,|\, x_{i})} {p(x_{i+1} \,|\, y_{1:i})} } _{\text{backward transition }p(x_{i} \,|\, y_{1:i},\,x_{i+1})} \,\mathrm{d}x_{i+1} & \text{Markov model} \end{aligned} We observe that this is a (continuous state space) Markov process with a non-homogeneous transition function albeit one which goes backwards in time. Apparently for conditionally Gaussian linear state-space models, this is known as RTS, or Rauch-Tung-Striebel smoothing (Rauch, Striebel, and Tung (1965)). According to Cappé (2008), • It appears to be efficient and stable in the long term (although no proof was available at the time the slides were presented). • It is not sequential (in particular, one needs to store all particle positions and weights). • It has numerical complexity proportional $O(n^2)$ where $N$ is the number of particles. We can use this to sample paths starting at time $i = N$ and working backwards. From above we have $\displaystyle p(x_i \,|\, X_{i+1}, Y_{1:N}) = \frac{p(X_{i+1} \,|\, x_{i}) \,p(x_{i} \,|\, Y_{1:i})}{p(X_{i+1} \,|\, Y_{1:i})} = Z \,p(X_{i+1} \,|\, x_{i}) \,p(x_{i} \,|\, Y_{1:i})$ where $Z$ is some normalisation constant (Z for “Zustandssumme” – sum over states). From particle filtering we know that $\displaystyle {p}(x_i \,|\, y_{1:i}) \approx \hat{p}(x_i \,|\, y_{1:i}) \triangleq \sum_{j=1}^M w_i^{(j)}\delta(x_i - x_i^{(j)})$ Thus $\displaystyle \hat{p}(x_i \,|\, X_{i+1}, Y_{1:i}) = Z \,p(X_{i+1} \,|\, x_{i}) \,\hat{p}(x_{i} \,|\, Y_{1:i}) = \sum_{j=1}^M w_i^{(j)}\delta(x_i - x_i^{(j)}) \,p(X_{i+1} \,|\, x_{i})$ and we can sample $x_i$ from $\{x_i^{(j)}\}_{1 \leq j \leq M}$ with probability $\displaystyle \frac{w_k^{(i)} \,p(X_{i+1} \,|\, x_{i})} {\sum_{i=1}^N w_k^{(i)} \,p(X_{i+1} \,|\, x_{i})}$ Recalling that we have re-sampled the particles in the filtering algorithm so that their weights are all $1/M$ and abstracting the state update and state density function, we can encode this update step in Haskell as > oneSmoothingStep :: MonadRandom m => > (Particles a -> V.Vector a) -> > (a -> a -> Double) -> > a -> > Particles a -> > WriterT (Particles a) m a > oneSmoothingStep stateUpdate > stateDensity > smoothingSampleAtiPlus1 > filterSamplesAti = do it > where > it = do > let mus = stateUpdate filterSamplesAti > weights = V.map (stateDensity smoothingSampleAtiPlus1) mus > cumSumWeights = V.tail V.scanl (+) 0 weights
>           totWeight     = V.last cumSumWeights
>       v <- lift $sample$ uniform 0.0 totWeight
>       let ix = binarySearch cumSumWeights v
>           xnNew = filterSamplesAti V.! ix
>       tell $V.singleton xnNew > return$ xnNew


To sample a complete path we start with a sample from the filtering distribution at at time $i = N$ (which is also the smoothing distribution)

> oneSmoothingPath :: MonadRandom m =>
>              (Int -> V.Vector (Particles a)) ->
>              (a -> Particles a -> WriterT (Particles a) m a) ->
>              Int -> m (a, V.Vector a)
> oneSmoothingPath filterEstss oneSmoothingStep nTimeSteps = do
>   let ys = filterEstss nTimeSteps
>   ix <- sample $uniform 0 (nParticles - 1) > let xn = (V.head ys) V.! ix > runWriterT$ V.foldM oneSmoothingStep xn (V.tail ys)

> oneSmoothingPath' :: (MonadRandom m, Show a) =>
>              (Int -> V.Vector (Particles a)) ->
>              (a -> Particles a -> WriterT (Particles a) m a) ->
>              Int -> WriterT (Particles a) m a
> oneSmoothingPath' filterEstss oneSmoothingStep nTimeSteps = do
>   let ys = filterEstss nTimeSteps
>   ix <- lift $sample$ uniform 0 (nParticles - 1)
>   let xn = (V.head ys) V.! ix
>   V.foldM oneSmoothingStep xn (V.tail ys)


Of course we need to run through the filtering distributions starting at $i = N$

> filterEstss :: Int -> V.Vector (Particles (SystemState Double))
> filterEstss n = V.reverse $V.fromList$ runFilter n

> testSmoothing :: Int -> Int -> [Double]
> testSmoothing m n = V.toList $evalState action (pureMT 23) > where > action = do > xss <- V.replicateM n$
>              snd <$> (oneSmoothingPath filterEstss (oneSmoothingStep stateUpdate (weight h bigQ')) m) > let yss = V.fromList$ map V.fromList $> transpose$
>                 V.toList $V.map (V.toList)$
>                 xss
>       return $V.map (/ (fromIntegral n))$ V.map V.sum $V.map (V.map x1) yss  By eye we can see we get a better fit and calculating the mean square error for filtering gives $1.87\times10^{-2}$ against the mean square error for smoothing of $9.52\times10^{-3}$; this confirms the fit really is better as one would hope. # Unknown Gravity Let us continue with the same example but now assume that $g$ is unknown and that we wish to estimate it. Let us also assume that our apparatus is not subject to noise. Again we have $\displaystyle \frac{\mathrm{d}^2\alpha}{\mathrm{d}t^2} = -g\sin\alpha + w(t)$ But now when we discretize it we include a third variable $\displaystyle \begin{bmatrix} x_{1,i} \\ x_{2,i} \\ x_{3,i} \end{bmatrix} = \begin{bmatrix} x_{1,i-1} + x_{2,i-1}\Delta t \\ x_{2,i-1} - x_{3,i-1}\sin x_{1,i-1}\Delta t \\ x_{3,i-1} \end{bmatrix} + \mathbf{q}_{i-1}$ where $q_i \sim {\mathcal{N}}(0,Q)$ $\displaystyle Q = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \sigma^2_g \end{bmatrix}$ Again we assume that we can only measure the horizontal position of the pendulum so that $\displaystyle y_i = \sin x_i + r_k$ where $r_i \sim {\mathcal{N}}(0,R)$. > type PendulumStateG = R 3  > pendulumSampleG :: MonadRandom m => > Sym 3 -> > Sym 1 -> > PendulumStateG -> > m (Maybe ((PendulumStateG, PendulumObs), PendulumStateG)) > pendulumSampleG bigQ bigR xPrev = do > let x1Prev = fst$ headTail xPrev
>       x2Prev = fst $headTail$ snd $headTail xPrev > x3Prev = fst$ headTail $snd$ headTail $snd$ headTail xPrev
>   eta <- sample $rvar (MultivariateNormal 0.0 bigQ) > let x1= x1Prev + x2Prev * deltaT > x2 = x2Prev - g * (sin x1Prev) * deltaT > x3 = x3Prev > xNew = vector [x1, x2, x3] + eta > x1New = fst$ headTail xNew
>   epsilon <-  sample $rvar (MultivariateNormal 0.0 bigR) > let yNew = vector [sin x1New] + epsilon > return$ Just ((xNew, yNew), xNew)

> pendulumSampleGs :: [(PendulumStateG, PendulumObs)]
> pendulumSampleGs = evalState (ML.unfoldrM (pendulumSampleG bigQg bigRg) mG) (pureMT 29)

> data SystemStateG a = SystemStateG { gx1  :: a, gx2  :: a, gx3 :: a }
>   deriving Show


The state update itself

> stateUpdateG :: Particles (SystemStateG Double) ->
>                 Particles (SystemStateG Double)
> stateUpdateG xPrevs = V.zipWith3 SystemStateG x1s x2s x3s
>   where
>     ix = V.length xPrevs
>
>     x1Prevs = V.map gx1 xPrevs
>     x2Prevs = V.map gx2 xPrevs
>     x3Prevs = V.map gx3 xPrevs
>
>     deltaTs = V.replicate ix deltaT
>     x1s = x1Prevs .+ (x2Prevs .* deltaTs)
>     x2s = x2Prevs .- (x3Prevs .* (V.map sin x1Prevs) .* deltaTs)
>     x3s = x3Prevs


and its noisy version.

> stateUpdateNoisyG :: MonadRandom m =>
>                      Sym 3 ->
>                      Particles (SystemStateG Double) ->
>                      m (Particles (SystemStateG Double))
> stateUpdateNoisyG bigQ xPrevs = do
>   let ix = V.length xPrevs
>
>   let xs = stateUpdateG xPrevs
>
>       x1s = V.map gx1 xs
>       x2s = V.map gx2 xs
>       x3s = V.map gx3 xs
>
>   etas <- replicateM ix $sample$ rvar (MultivariateNormal 0.0 bigQ)
>   let eta1s, eta2s, eta3s :: V.Vector Double
>       eta1s = V.fromList $map (fst . headTail) etas > eta2s = V.fromList$ map (fst . headTail . snd . headTail) etas
>       eta3s = V.fromList $map (fst . headTail . snd . headTail . snd . headTail) etas > > return (V.zipWith3 SystemStateG (x1s .+ eta1s) (x2s .+ eta2s) (x3s .+ eta3s))  The function which maps the state to the observable. > obsUpdateG :: Particles (SystemStateG Double) -> > Particles (SystemObs Double) > obsUpdateG xs = V.map (SystemObs . sin . gx1) xs  The mean and variance of the prior. > mG :: R 3 > mG = vector [1.6, 0.0, 8.00]  > bigPg :: Sym 3 > bigPg = sym$ matrix [
>     1e-6, 0.0, 0.0
>   , 0.0, 1e-6, 0.0
>   , 0.0, 0.0, 1e-2
>   ]


Parameters for the state update; note that the variance is not exactly the same as in the formulation above.

> bigQg :: Sym 3
> bigQg = sym $matrix bigQgl  > qc1G :: Double > qc1G = 0.0001  > sigmaG :: Double > sigmaG = 1.0e-2  > bigQgl :: [Double] > bigQgl = [ qc1G * deltaT^3 / 3, qc1G * deltaT^2 / 2, 0.0, > qc1G * deltaT^2 / 2, qc1G * deltaT, 0.0, > 0.0, 0.0, sigmaG > ]  The noise of the measurement. > bigRg :: Sym 1 > bigRg = sym$ matrix [0.1]


Generate the ensemble of particles from the prior,

> initParticlesG :: MonadRandom m =>
>                  m (Particles (SystemStateG Double))
> initParticlesG = V.replicateM nParticles $do > r <- sample$ rvar (MultivariateNormal mG bigPg)
>   let x1 = fst $headTail r > x2 = fst$ headTail $snd$ headTail r
>       x3 = fst $headTail$ snd $headTail$ snd $headTail r > return$ SystemStateG { gx1 = x1, gx2 = x2, gx3 = x3}


run the particle filter,

> runFilterG :: Int -> [Particles (SystemStateG Double)]
> runFilterG n = snd $evalState action (pureMT 19) > where > action = runWriterT$ do
>       xs <- lift $initParticlesG > V.foldM > (oneFilteringStep (stateUpdateNoisyG bigQg) obsUpdateG (weight f bigRg)) > xs > (V.fromList$ map (SystemObs . fst . headTail . snd) (take n pendulumSampleGs))


and extract the estimated parameter from the filter.

> testFilteringG :: Int -> [Double]
> testFilteringG n = map ((/ (fromIntegral nParticles)). sum . V.map gx3) (runFilterG n)


Again we need to run through the filtering distributions starting at $i = N$

> filterGEstss :: Int -> V.Vector (Particles (SystemStateG Double))
> filterGEstss n = V.reverse $V.fromList$ runFilterG n

> testSmoothingG :: Int -> Int -> ([Double], [Double], [Double])
> testSmoothingG m n = (\(x, y, z) -> (V.toList x, V.toList y, V.toList z))  $> mkMeans$
>                      chunks
>   where
>
>     chunks =
>       V.fromList $map V.fromList$
>       transpose $> V.toList$ V.map (V.toList) $> chunksOf m$
>       snd $evalState (runWriterT action) (pureMT 23) > > mkMeans yss = ( > V.map (/ (fromIntegral n))$ V.map V.sum $V.map (V.map gx1) yss, > V.map (/ (fromIntegral n))$ V.map V.sum $V.map (V.map gx2) yss, > V.map (/ (fromIntegral n))$ V.map V.sum $V.map (V.map gx3) yss > ) > > action = > V.replicateM n$
>       oneSmoothingPath' filterGEstss
>                         (oneSmoothingStep stateUpdateG (weight hG bigQg))
>                         m


Again by eye we get a better fit but note that we are using the samples in which the state update is noisy as well as the observation so we don’t expect to get a very good fit.

# Notes

## Helpers for Converting Types

> f :: SystemObs Double -> R 1
> f = vector . pure . y1

> h :: SystemState Double -> R 2
> h u = vector [x1 u , x2 u]

> hG :: SystemStateG Double -> R 3
> hG u = vector [gx1 u , gx2 u, gx3 u]


## Helpers for the Inverse CDF

That these are helpers for the inverse CDF is delayed to another blog post.

> indices :: V.Vector Double -> V.Vector Double -> V.Vector Int
> indices bs xs = V.map (binarySearch bs) xs

> binarySearch :: (Ord a) =>
>                 V.Vector a -> a -> Int
> binarySearch vec x = loop 0 (V.length vec - 1)
>   where
>     loop !l !u
>       | u <= l    = l
>       | otherwise = let e = vec V.! k in if x <= e then loop l k else loop (k+1) u
>       where k = l + (u - l) shiftR 1


## Vector Helpers

> chunksOf :: Int -> V.Vector a -> V.Vector (V.Vector a)
> chunksOf n xs = ys
>   where
>     l = V.length xs
>     m  = 1 + (l - 1) div n
>     ys = V.unfoldrN m (\us -> Just (V.take n us, V.drop n us)) xs


# Bibliography

Cappé, Olivier. 2008. “An Introduction to Sequential Monte Carlo for Filtering and Smoothing.” http://www-irma.u-strasbg.fr/~guillou/meeting/cappe.pdf.

Rauch, H. E., C. T. Striebel, and F. Tung. 1965. “Maximum Likelihood Estimates of Linear Dynamic Systems.” Journal of the American Institute of Aeronautics and Astronautics 3 (8): 1445–50.

Särkkä, Simo. 2013. Bayesian Filtering and Smoothing. New York, NY, USA: Cambridge University Press.

# Introduction

The equation of motion for a pendulum of unit length subject to Gaussian white noise is

$\displaystyle \frac{\mathrm{d}^2\alpha}{\mathrm{d}t^2} = -g\sin\alpha + w(t)$

We can discretize this via the usual Euler method

$\displaystyle \begin{bmatrix} x_{1,i} \\ x_{2,i} \end{bmatrix} = \begin{bmatrix} x_{1,i-1} + x_{2,i-1}\Delta t \\ x_{2,i-1} - g\sin x_{1,i-1}\Delta t \end{bmatrix} + \mathbf{q}_{i-1}$

where $q_i \sim {\mathcal{N}}(0,Q)$ and

$\displaystyle Q = \begin{bmatrix} \frac{q^c \Delta t^3}{3} & \frac{q^c \Delta t^2}{2} \\ \frac{q^c \Delta t^2}{2} & {q^c \Delta t} \end{bmatrix}$

The explanation of the precise form of the covariance matrix will be the subject of another blog post; for the purpose of exposition of using Stan and, in particular, Stan’s ability to handle ODEs, this detail is not important.

Instead of assuming that we know $g$ let us take it to be unknown and that we wish to infer its value using the pendulum as our experimental apparatus.

Stan is a probabilistic programming language which should be welll suited to perform such an inference. We use its interface via the R package rstan.

# A Stan and R Implementation

Let’s generate some samples using Stan but rather than generating exactly the model we have given above, instead let’s solve the differential equation and then add some noise. Of course this won’t quite give us samples from the model the parameters of which we wish to estimate but it will allow us to use Stan’s ODE solving capability.

Here’s the Stan

functions {
real[] pendulum(real t,
real[] y,
real[] theta,
real[] x_r,
int[] x_i) {
real dydt[2];
dydt[1] <- y[2];
dydt[2] <- - theta[1] * sin(y[1]);
return dydt;
}
}
data {
int<lower=1> T;
real y0[2];
real t0;
real ts[T];
real theta[1];
real sigma[2];
}
transformed data {
real x_r[0];
int x_i[0];
}
model {
}
generated quantities {
real y_hat[T,2];
y_hat <- integrate_ode(pendulum, y0, t0, ts, theta, x_r, x_i);
for (t in 1:T) {
y_hat[t,1] <- y_hat[t,1] + normal_rng(0,sigma[1]);
y_hat[t,2] <- y_hat[t,2] + normal_rng(0,sigma[2]);
}
}

And here’s the R to invoke it

library(rstan)
library(mvtnorm)

qc1 = 0.0001
deltaT = 0.01
nSamples = 100
m0 = c(1.6, 0)
g = 9.81
t0 = 0.0
ts = seq(deltaT,nSamples * deltaT,deltaT)

bigQ = matrix(c(qc1 * deltaT^3 / 3, qc1 * deltaT^2 / 2,
qc1 * deltaT^2 / 2,       qc1 * deltaT
),
nrow = 2,
ncol = 2,
byrow = TRUE
)

samples <- stan(file = 'Pendulum.stan',
data = list (
T  = nSamples,
y0 = m0,
t0 = t0,
ts = ts,
theta = array(g, dim = 1),
sigma = c(bigQ[1,1], bigQ[2,2]),
refresh = -1
),
algorithm="Fixed_param",
seed = 42,
chains = 1,
iter =1
)

We can plot the angle the pendulum subtends to the vertical over time. Note that this is not very noisy.

s <- extract(samples,permuted=FALSE)
plot(s[1,1,1:100])

Now let us suppose that we don’t know the value of $g$ and we can only observe the horizontal displacement.

zStan <- sin(s[1,1,1:nSamples])

Now we can use Stan to infer the value of $g$.

functions {
real[] pendulum(real t,
real[] y,
real[] theta,
real[] x_r,
int[] x_i) {
real dydt[2];
dydt[1] <- y[2];
dydt[2] <- - theta[1] * sin(y[1]);
return dydt;
}
}
data {
int<lower=1> T;
real y0[2];
real z[T];
real t0;
real ts[T];
}
transformed data {
real x_r[0];
int x_i[0];
}
parameters {
real theta[1];
vector<lower=0>[1] sigma;
}
model {
real y_hat[T,2];
real z_hat[T];
theta ~ normal(0,1);
sigma ~ cauchy(0,2.5);
y_hat <- integrate_ode(pendulum, y0, t0, ts, theta, x_r, x_i);
for (t in 1:T) {
z_hat[t] <- sin(y_hat[t,1]);
z[t] ~ normal(z_hat[t], sigma);
}
}

Here’s the R to invoke it.

estimates <- stan(file = 'PendulumInfer.stan',
data = list (
T  = nSamples,
y0 = m0,
z  = zStan,
t0 = t0,
ts = ts
),
seed = 42,
chains = 1,
iter = 1000,
warmup = 500,
refresh = -1
)
e <- extract(estimates,pars=c("theta[1]","sigma[1]","lp__"),permuted=TRUE)

This gives an estiamted valeu for $g$ of 9.809999 which is what we would hope.

Now let’s try adding some noise to our observations.

set.seed(42)
epsilons <- rmvnorm(n=nSamples,mean=c(0.0),sigma=bigR)

zStanNoisy <- sin(s[1,1,1:nSamples] + epsilons[,1])

estimatesNoisy <- stan(file = 'PendulumInfer.stan',
data = list (T  = nSamples,
y0 = m0,
z  = zStanNoisy,
t0 = t0,
ts = ts
),
seed = 42,
chains = 1,
iter = 1000,
warmup = 500,
refresh = -1
)
eNoisy <- extract(estimatesNoisy,pars=c("theta[1]","sigma[1]","lp__"),permuted=TRUE)

This gives an estiamted value for $g$ of 8.5871024 which is ok but not great.

# Postamble

library(knitr)
knit('Pendulum.Rmd')

And this from command line:

pandoc -s Pendulum.md --filter=./Include > PendulumExpanded.html

# Introduction

Let $\{X_t\}_{t \geq 1}$ be a (hidden) Markov process. By hidden, we mean that we are not able to observe it.

$\displaystyle X_1 \sim \mu(\centerdot) \quad X_t \,|\, (X_{t-1} = x) \sim f(\centerdot \,|\, x)$

And let $\{Y_t\}_{t \geq 1}$ be an observable Markov process such that

$\displaystyle Y_t \,|\, (X_{t} = x) \sim g(\centerdot \,|\, x)$

That is the observations are conditionally independent given the state of the hidden process.

As an example let us take the one given in Särkkä (2013) where the movement of a car is given by Newton’s laws of motion and the acceleration is modelled as white noise.

\displaystyle \begin{aligned} X_t &= AX_{t-1} + Q \epsilon_t \\ Y_t &= HX_t + R \eta_t \end{aligned}

Although we do not do so here, $A, Q, H$ and $R$ can be derived from the dynamics. For the purpose of this blog post, we note that they are given by

$\displaystyle A = \begin{bmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} , \quad Q = \begin{bmatrix} \frac{q_1^c \Delta t^3}{3} & 0 & \frac{q_1^c \Delta t^2}{2} & 0 \\ 0 & \frac{q_2^c \Delta t^3}{3} & 0 & \frac{q_2^c \Delta t^2}{2} \\ \frac{q_1^c \Delta t^2}{2} & 0 & {q_1^c \Delta t} & 0 \\ 0 & \frac{q_2^c \Delta t^2}{2} & 0 & {q_2^c \Delta t} \end{bmatrix}$

and

$\displaystyle H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} , \quad R = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}$

We wish to determine the position and velocity of the car given noisy observations of the position. In general we need the distribution of the hidden path given the observable path. We use the notation $x_{m:n}$ to mean the path of $x$ starting a $m$ and finishing at $n$.

$\displaystyle p(x_{1:n} \,|\, y_{1:n}) = \frac{p(x_{1:n}, y_{1:n})}{p(y_{1:n})}$

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans         #-}

> {-# LANGUAGE FlexibleInstances            #-}
> {-# LANGUAGE MultiParamTypeClasses        #-}
> {-# LANGUAGE FlexibleContexts             #-}
> {-# LANGUAGE TypeFamilies                 #-}
> {-# LANGUAGE BangPatterns                 #-}
> {-# LANGUAGE GeneralizedNewtypeDeriving   #-}
> {-# LANGUAGE ScopedTypeVariables          #-}

> module ParticleSmoothing
>   ( simpleSamples
>   , carSamples
>   , testCar
>   , testSimple
>   ) where

> import Data.Random.Source.PureMT
> import Data.Random hiding ( StdNormal, Normal )
> import qualified Data.Random as R
> import Control.Monad.Writer hiding ( Any, All )
> import qualified Numeric.LinearAlgebra.HMatrix as H
> import Foreign.Storable ( Storable )
> import Data.Maybe ( fromJust )
> import Data.Bits ( shiftR )
> import qualified Data.Vector as V
> import qualified Data.Vector.Unboxed as U
> import System.Random.MWC

> import Data.Array.Repa ( Z(..), (:.)(..), Any(..), computeP,
>                          extent, DIM1, DIM2, slice, All(..)
>                        )
> import qualified Data.Array.Repa as Repa

> import qualified Control.Monad.Loops as ML

> import PrettyPrint ()
> import Text.PrettyPrint.HughesPJClass ( Pretty, pPrint )

> import Data.Vector.Unboxed.Deriving


# Some Theory

If we could sample $X_{1:n}^{(i)} \sim p(x_{1:n} \,|\, y_{1:n})$ then we could approximate the posterior as

$\displaystyle \hat{p}(x_{1:n} \,|\, y_{1:n}) = \frac{1}{N}\sum_{i=1}^N \delta_{X_{1:n}^{(i)}}(x_{1:n})$

If we wish to, we can create marginal estimates

$\displaystyle \hat{p}(x_k \,|\, y_{1:n}) = \frac{1}{N}\sum_{i=1}^N \delta_{X_{k}^{(i)}}(x_{k})$

When $k = N$, this is the filtering estimate.

## Standard Bayesian Recursion

Prediction

\displaystyle \begin{aligned} p(x_n \,|\, y_{1:n-1}) &= \int p(x_{n-1:n} \,|\, y_{1:n-1}) \,\mathrm{d}x_{n-1} \\ &= \int p(x_{n} \,|\, x_{n-1}, y_{1:n-1}) \, p(x_{n-1} \,|\, y_{1:n-1}) \,\mathrm{d}x_{n-1} \\ &= \int f(x_{n} \,|\, x_{n-1}) \, p(x_{n-1} \,|\, y_{1:n-1}) \,\mathrm{d}x_{n-1} \\ \end{aligned}

Update

\displaystyle \begin{aligned} p(x_n \,|\, y_{1:n}) &= \frac{p(y_n \,|\, x_n, y_{1:n-1}) \, p(x_n \,|\, y_{1:n-1})} {p(y_n \,|\, y_{1:n-1})} \\ &= \frac{g(y_n \,|\, x_n) \, p(x_n \,|\, y_{1:n-1})} {p(y_n \,|\, y_{1:n-1})} \end{aligned}

where by definition

$\displaystyle {p(y_n \,|\, y_{1:n-1})} = \int {g(y_n \,|\, x_n) \, p(x_n \,|\, y_{1:n-1})} \,\mathrm{d}x_n$

## Path Space Recursion

We have

\displaystyle \begin{aligned} p(x_{1:n} \,|\, y_{1:n}) &= \frac{p(x_{1:n}, y_{1:n})}{p(y_{1:n})} \\ &= \frac{p(x_n, y_n \,|\, x_{1:n-1}, y_{1:n-1})}{p(y_{1:n})} \, p(x_{1:n-1}, y_{1:n-1}) \\ &= \frac{p(y_n \,|\, x_{1:n}, y_{1:n-1}) \, p(x_n \,|\, x_{1:n-1}, y_{1:n-1}) }{p(y_{1:n})} \, p(x_{1:n-1}, y_{1:n-1}) \\ &= \frac{g(y_n \,|\, x_{n}) \, f(x_n \,|\, x_{n-1})}{p(y_n \,|\, y_{1:n-1})} \, \frac{p(x_{1:n-1}, y_{1:n-1})}{ \, p(y_{1:n-1})} \\ &= \frac{g(y_n \,|\, x_{n}) \, f(x_n \,|\, x_{n-1})}{p(y_n \,|\, y_{1:n-1})} \, {p(x_{1:n-1} \,|\,y_{1:n-1})} \\ &= \frac{g(y_n \,|\, x_{n}) \, \overbrace{f(x_n \,|\, x_{n-1}) \, {p(x_{1:n-1} \,|\,y_{1:n-1})}}^{\mathrm{predictive}\,p(x_{1:n} \,|\, y_{1:n-1})}} {p(y_n \,|\, y_{1:n-1})} \\ \end{aligned}

where by definition

$\displaystyle p(y_n \,|\, y_{1:n-1}) = \int g(y_n \,|\, x_n) \, p(x_{1:n} \,|\, y_{1:n-1}) \,\mathrm{d}x_{1:n}$

Prediction

$\displaystyle p(x_{1:n} \,|\, y_{1:n-1}) = f(x_n \,|\, x_{n-1}) \, {p(x_{1:n-1} \,|\,y_{1:n-1})}$

Update

$\displaystyle p(x_{1:n} \,|\, y_{1:n}) = \frac{g(y_n \,|\, x_{n}) \, {p(x_{1:n} \,|\, y_{1:n-1})}} {p(y_n \,|\, y_{1:n-1})}$

## Algorithm

The idea is to simulate paths using the recursion we derived above.

At time $n-1$ we have an approximating distribution

$\displaystyle \hat{p}(x_{1:n-1} \,|\, y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \delta_{X_{1:n-1}}^{(i)}(x_{1:n-1})$

Sample $\tilde{X}_n^{(i)} \sim f(\centerdot \,|\, X_{n-1}^{(i)})$ and set $\tilde{X}_{1:n}^{(i)} = (\tilde{X}_{1:n-1}^{(i)}, \tilde{X}_n^{(i)})$. We then have an approximation of the prediction step

$\displaystyle \hat{p}(x_{1:n} \,|\, y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^N \delta_{\tilde{X}_{1:n}}^{(i)}(x_{1:n})$

Substituting

\displaystyle \begin{aligned} {\hat{p}(y_n \,|\, y_{1:n-1})} &= \int {g(y_n \,|\, x_n) \, \hat{p}(x_{1:n} \,|\, y_{1:n-1})} \,\mathrm{d}x_n \\ &= \int {g(y_n \,|\, x_n)}\frac{1}{N}\sum_{i=1}^N \delta_{\tilde{X}_{1:n-1}}^{(i)}(x_{1:n}) \,\mathrm{d}x_n \\ &= \frac{1}{N}\sum_{i=1}^N {g(y_n \,|\, \tilde{X}_n^{(i)})} \end{aligned}

and again

\displaystyle \begin{aligned} \tilde{p}(x_{1:n} \,|\, y_{1:n}) &= \frac{g(y_n \,|\, x_{n}) \, {\hat{p}(x_{1:n} \,|\, y_{1:n-1})}} {\hat{p}(y_n \,|\, y_{1:n-1})} \\ &= \frac{g(y_n \,|\, x_{n}) \, \frac{1}{N}\sum_{i=1}^N \delta_{\tilde{X}_{1:n}}^{(i)}(x_{1:n})} {\frac{1}{N}\sum_{i=1}^N {g(y_n \,|\, \tilde{X}_n^{(i)})}} \\ &= \frac{ \sum_{i=1}^N g(y_n \,|\, \tilde{X}_n^{(i)}) \, \delta_{\tilde{X}_{1:n}}^{(i)}(x_{1:n})} {\sum_{i=1}^N {g(y_n \,|\, \tilde{X}_n^{(i)})}} \\ &= \sum_{i=1}^N W_n^{(i)} \delta_{\tilde{X}_{1:n}^{(i)}} (x_{1:n}) \end{aligned}

where $W_n^{(i)} \propto g(y_n \,|\, \tilde{X}_n^{(i)})$ and $\sum_{i=1}^N W_n^{(i)} = 1$.

Now sample

$\displaystyle X_{1:n}^{(i)} \sim \tilde{p}(x_{1:n} \,|\, y_{1:n})$

Let’s specify some values for the example of the car moving in two dimensions.

> deltaT, sigma1, sigma2, qc1, qc2 :: Double
> deltaT = 0.1
> sigma1 = 1/2
> sigma2 = 1/2
> qc1 = 1
> qc2 = 1

> bigA :: H.Matrix Double
> bigA = (4 H.>< 4) bigAl

> bigAl :: [Double]
> bigAl = [1, 0 , deltaT,      0,
>          0, 1,       0, deltaT,
>          0, 0,       1,      0,
>          0, 0,       0,      1]

> bigQ :: H.Herm Double
> bigQ = H.trustSym $(4 H.>< 4) bigQl  > bigQl :: [Double] > bigQl = [qc1 * deltaT^3 / 3, 0, qc1 * deltaT^2 / 2, 0, > 0, qc2 * deltaT^3 / 3, 0, qc2 * deltaT^2 / 2, > qc1 * deltaT^2 / 2, 0, qc1 * deltaT, 0, > 0, qc2 * deltaT^2 / 2, 0, qc2 * deltaT]  > bigH :: H.Matrix Double > bigH = (2 H.>< 4) [1, 0, 0, 0, > 0, 1, 0, 0]  > bigR :: H.Herm Double > bigR = H.trustSym$ (2 H.>< 2) [sigma1^2,        0,
>                                        0, sigma2^2]

> m0 :: H.Vector Double
> m0 = H.fromList [0, 0, 1, -1]

> bigP0 :: H.Herm Double
> bigP0 = H.trustSym $H.ident 4  > n :: Int > n = 23  With these we generate hidden and observable sample path. > carSample :: MonadRandom m => > H.Vector Double -> > m (Maybe ((H.Vector Double, H.Vector Double), H.Vector Double)) > carSample xPrev = do > xNew <- sample$ rvar (Normal (bigA H.#> xPrev) bigQ)
>   yNew <- sample $rvar (Normal (bigH H.#> xNew) bigR) > return$ Just ((xNew, yNew), xNew)

> carSamples :: [(H.Vector Double, H.Vector Double)]
> carSamples = evalState (ML.unfoldrM carSample m0) (pureMT 17)


We can plot an example trajectory for the car and the noisy observations that are available to the smoother / filter.

Sadly there is no equivalent to numpy in Haskell. Random number packages generate vectors, for multi-rank arrays there is repa and for fast matrix manipulation there is hmtatrix. Thus for our single step path update function, we have to pass in functions to perform type conversion. Clearly with all the copying inherent in this approach, performance is not going to be great.

The type synonym ArraySmoothing is used to denote the cloud of particles at each time step.

> type ArraySmoothing = Repa.Array Repa.U DIM2

> singleStep :: forall a . U.Unbox a =>
>               (a -> H.Vector Double) ->
>               (H.Vector Double -> a) ->
>               H.Matrix Double ->
>               H.Herm Double ->
>               H.Matrix Double ->
>               H.Herm Double ->
>               ArraySmoothing a -> H.Vector Double ->
>               WriterT [ArraySmoothing a] (StateT PureMT IO) (ArraySmoothing a)
> singleStep f g bigA bigQ bigH bigR x y = do
>   tell[x]
>   let (Z :. ix :. jx) = extent x
>
>   xHatR <- lift $computeP$ Repa.slice x (Any :. jx - 1)
>   let xHatH = map f $Repa.toList (xHatR :: Repa.Array Repa.U DIM1 a) > xTildeNextH <- lift$ mapM (\x -> sample $rvar (Normal (bigA H.#> x) bigQ)) xHatH > > let xTildeNextR = Repa.fromListUnboxed (Z :. ix :. (1 :: Int))$
>                     map g xTildeNextH
>       xTilde = Repa.append x xTildeNextR
>
>       weights = map (normalPdf y bigR) $> map (bigH H.#>) xTildeNextH > vs = runST (create >>= (asGenST$ \gen -> uniformVector gen n))
>       cumSumWeights = V.scanl (+) 0 (V.fromList weights)
>       totWeight = sum weights
>       js = indices (V.map (/ totWeight) $V.tail cumSumWeights) vs > xNewV = V.map (\j -> Repa.transpose$
>                            Repa.reshape (Z :. (1 :: Int) :. jx + 1) $> slice xTilde (Any :. j :. All)) js > xNewR = Repa.transpose$ V.foldr Repa.append (xNewV V.! 0) (V.tail xNewV)
>   computeP xNewR


The state for the car is a 4-tuple.

> data SystemState a = SystemState { xPos  :: a
>                                  , yPos  :: a
>                                  , _xSpd :: a
>                                  , _ySpd :: a
>                                  }


We initialize the smoother from some prior distribution.

> initCar :: StateT PureMT IO (ArraySmoothing (SystemState Double))
> initCar = do
>   xTilde1 <- replicateM n $sample$ rvar (Normal m0 bigP0)
>   let weights = map (normalPdf (snd $head carSamples) bigR)$
>                 map (bigH H.#>) xTilde1
>       vs = runST (create >>= (asGenST $\gen -> uniformVector gen n)) > cumSumWeights = V.scanl (+) 0 (V.fromList weights) > js = indices (V.tail cumSumWeights) vs > xHat1 = Repa.fromListUnboxed (Z :. n :. (1 :: Int))$
>               map ((\[a,b,c,d] -> SystemState a b c d) . H.toList) $> V.toList$
>               V.map ((V.fromList xTilde1) V.!) js
>   return xHat1


Now we can run the smoother.

> smootherCar :: StateT PureMT IO
>             (ArraySmoothing (SystemState Double)
>             , [ArraySmoothing (SystemState Double)])
> smootherCar = runWriterT $do > xHat1 <- lift initCar > foldM (singleStep f g bigA bigQ bigH bigR) xHat1 (take 100$ map snd $tail carSamples)  > f :: SystemState Double -> H.Vector Double > f (SystemState a b c d) = H.fromList [a, b, c, d]  > g :: H.Vector Double -> SystemState Double > g = (\[a,b,c,d] -> (SystemState a b c d)) . H.toList  And create inferred positions for the car which we then plot. > testCar :: IO ([Double], [Double]) > testCar = do > states <- snd <$> evalStateT smootherCar (pureMT 24)
>   let xs :: [Repa.Array Repa.D DIM2 Double]
>       xs = map (Repa.map xPos) states
>   sumXs :: [Repa.Array Repa.U DIM1 Double] <- mapM Repa.sumP (map Repa.transpose xs)
>   let ixs = map extent sumXs
>       sumLastXs = map (* (recip $fromIntegral n))$
>                   zipWith (Repa.!) sumXs (map (\(Z :. x) -> Z :. (x - 1)) ixs)
>   let ys :: [Repa.Array Repa.D DIM2 Double]
>       ys = map (Repa.map yPos) states
>   sumYs :: [Repa.Array Repa.U DIM1 Double] <- mapM Repa.sumP (map Repa.transpose ys)
>   let ixsY = map extent sumYs
>       sumLastYs = map (* (recip $fromIntegral n))$
>                   zipWith (Repa.!) sumYs (map (\(Z :. x) -> Z :. (x - 1)) ixsY)
>   return (sumLastXs, sumLastYs)


So it seems our smoother does quite well at inferring the state at the latest observation, that is, when it is working as a filter. But what about estimates for earlier times? We should do better as we have observations in the past and in the future. Let’s try with a simpler example and a smaller number of particles.

First we create some samples for our simple 1 dimensional linear Gaussian model.

> bigA1, bigQ1, bigR1, bigH1 :: Double
> bigA1 = 0.5
> bigQ1 = 0.1
> bigR1 = 0.1
> bigH1 = 1.0

> simpleSample :: MonadRandom m =>
>               Double ->
>               m (Maybe ((Double, Double), Double))
> simpleSample xPrev = do
>   xNew <- sample $rvar (R.Normal (bigA1 * xPrev) bigQ1) > yNew <- sample$ rvar (R.Normal (bigH1 * xNew) bigR1)
>   return $Just ((xNew, yNew), xNew)  > simpleSamples :: [(Double, Double)] > simpleSamples = evalState (ML.unfoldrM simpleSample 0.0) (pureMT 17)  Again create a prior. > initSimple :: MonadRandom m => m (ArraySmoothing Double) > initSimple = do > let y = snd$ head simpleSamples
>   xTilde1 <- replicateM n $sample$ rvar $R.Normal y bigR1 > let weights = map (pdf (R.Normal y bigR1))$
>                 map (bigH1 *) xTilde1
>       totWeight = sum weights
>       vs = runST (create >>= (asGenST $\gen -> uniformVector gen n)) > cumSumWeights = V.scanl (+) 0 (V.fromList$ map (/ totWeight) weights)
>       js = indices (V.tail cumSumWeights) vs
>       xHat1 = Repa.fromListUnboxed (Z :. n :. (1 :: Int)) $> V.toList$
>               V.map ((V.fromList xTilde1) V.!) js
>   return xHat1


Now we can run the smoother.

> smootherSimple :: StateT PureMT IO (ArraySmoothing Double, [ArraySmoothing Double])
> smootherSimple = runWriterT $do > xHat1 <- lift initSimple > foldM (singleStep f1 g1 ((1 H.>< 1) [bigA1]) (H.trustSym$ (1 H.>< 1) [bigQ1^2])
>                           ((1 H.>< 1) [bigH1]) (H.trustSym $(1 H.>< 1) [bigR1^2])) > xHat1 > (take 20$ map H.fromList $map return . map snd$ tail simpleSamples)

> f1 :: Double -> H.Vector Double
> f1 a = H.fromList [a]

> g1 :: H.Vector Double -> Double
> g1 = (\[a] -> a) . H.toList


And finally we can look at the paths not just the means of the marginal distributions at the latest observation time.

> testSimple :: IO [[Double]]
> testSimple = do
>   states <- snd <$> evalStateT smootherSimple (pureMT 24) > let path :: Int -> IO (Repa.Array Repa.U DIM1 Double) > path i = computeP$ Repa.slice (last states) (Any :. i :. All)
>   paths <- mapM path [0..n - 1]
>   return $map Repa.toList paths  We can see that at some point in the past all the current particles have one ancestor. The marginals of the smoothing distribution (at some point in the past) have collapsed on to one particle. # Notes ## Helpers for the Inverse CDF That these are helpers for the inverse CDF is delayed to another blog post. > indices :: V.Vector Double -> V.Vector Double -> V.Vector Int > indices bs xs = V.map (binarySearch bs) xs  > binarySearch :: Ord a => > V.Vector a -> a -> Int > binarySearch vec x = loop 0 (V.length vec - 1) > where > loop !l !u > | u <= l = l > | otherwise = let e = vec V.! k in if x <= e then loop l k else loop (k+1) u > where k = l + (u - l) shiftR 1  ## Multivariate Normal The random-fu package does not contain a sampler or pdf for a multivariate normal so we create our own. This should be added to random-fu-multivariate package or something similar. > normalMultivariate :: H.Vector Double -> H.Herm Double -> RVarT m (H.Vector Double) > normalMultivariate mu bigSigma = do > z <- replicateM (H.size mu) (rvarT R.StdNormal) > return$ mu + bigA H.#> (H.fromList z)
>   where
>     (vals, bigU) = H.eigSH bigSigma
>     lSqrt = H.diag $H.cmap sqrt vals > bigA = bigU H.<> lSqrt  > data family Normal k :: *  > data instance Normal (H.Vector Double) = Normal (H.Vector Double) (H.Herm Double)  > instance Distribution Normal (H.Vector Double) where > rvar (Normal m s) = normalMultivariate m s  > normalPdf :: (H.Numeric a, H.Field a, H.Indexable (H.Vector a) a, Num (H.Vector a)) => > H.Vector a -> H.Herm a -> H.Vector a -> a > normalPdf mu sigma x = exp$ normalLogPdf mu sigma x

> normalLogPdf :: (H.Numeric a, H.Field a, H.Indexable (H.Vector a) a, Num (H.Vector a)) =>
>                  H.Vector a -> H.Herm a -> H.Vector a -> a
> normalLogPdf mu bigSigma x = - H.sumElements (H.cmap log (diagonals dec))
>                               - 0.5 * (fromIntegral (H.size mu)) * log (2 * pi)
>                               - 0.5 * s
>   where
>     dec = fromJust $H.mbChol bigSigma > t = fromJust$ H.linearSolve (H.tr dec) (H.asColumn x - mu) > u = H.cmap (\x -> x * x) t > s = H.sumElements u  > diagonals :: (Storable a, H.Element t, H.Indexable (H.Vector t) a) => > H.Matrix t -> H.Vector a > diagonals m = H.fromList (map (\i -> m H.! i H.! i) [0..n-1]) > where > n = max (H.rows m) (H.cols m)  > instance PDF Normal (H.Vector Double) where > pdf (Normal m s) = normalPdf m s > logPdf (Normal m s) = normalLogPdf m s  ## Misellaneous > derivingUnbox "SystemState" > [t| forall a . (U.Unbox a) => SystemState a -> (a, a, a, a) |] > [| \ (SystemState x y xdot ydot) -> (x, y, xdot, ydot) |] > [| \ (x, y, xdot, ydot) -> SystemState x y xdot ydot |]  > instance Pretty a => Pretty (SystemState a) where > pPrint (SystemState x y xdot ydot ) = pPrint (x, y, xdot, ydot)  # Bibliography Särkkä, Simo. 2013. Bayesian Filtering and Smoothing. New York, NY, USA: Cambridge University Press. # Population Growth Estimation via Hamiltonian Monte Carlo Here’s the same analysis of estimating population growth using Stan. data { int<lower=0> N; // number of observations vector[N] y; // observed population } parameters { real r; } model { real k; real p0; real deltaT; real sigma; real mu0; real sigma0; vector[N] p; k <- 1.0; p0 <- 0.1; deltaT <- 0.0005; sigma <- 0.01; mu0 <- 5; sigma0 <- 10; r ~ normal(mu0, sigma0); for (n in 1:N) { p[n] <- k * p0 * exp((n - 1) * r * deltaT) / (k + p0 * (exp((n - 1) * r * deltaT) - 1)); y[n] ~ normal(p[n], sigma); } } Empirically, by looking at the posterior, this seems to do a better job than either extended Kalman or vanilla Metropolis. # Population Growth Estimation via Markov Chain Monte Carlo # Introduction Let us see if we can estimate the parameter for population growth using MCMC in the example in which we used Kalman filtering. We recall the model. \displaystyle \begin{aligned} \dot{p} & = rp\Big(1 - \frac{p}{k}\Big) \end{aligned} $\displaystyle p = \frac{kp_0\exp rt}{k + p_0(\exp rt - 1)}$ And we are allowed to sample at regular intervals \displaystyle \begin{aligned} p_i &= \frac{kp_0\exp r\Delta T i}{k + p_0(\exp r\Delta T i - 1)} \\ y_i &= p_i + \epsilon_i \end{aligned} In other words $y_i \sim {\cal{N}}(p_i, \sigma^2)$, where $\sigma$ is known so the likelihood is $\displaystyle p(y\,|\,r) \propto \prod_{i=1}^n \exp{\bigg( -\frac{(y_i - p_i)^2}{2\sigma^2}\bigg)} = \exp{\bigg( -\sum_{i=1}^n \frac{(y_i - p_i)^2}{2\sigma^2}\bigg)}$ Let us assume a prior of $r \sim {\cal{N}}(\mu_0,\sigma_0^2)$ then the posterior becomes $\displaystyle p(r\,|\,y) \propto \exp{\bigg( -\frac{(r - \mu_0)^2}{2\sigma_0^2} \bigg)} \exp{\bigg( -\sum_{i=1}^n \frac{(y_i - p_i)^2}{2\sigma^2}\bigg)}$ ## Preamble > {-# OPTIONS_GHC -Wall #-} > {-# OPTIONS_GHC -fno-warn-name-shadowing #-} > {-# OPTIONS_GHC -fno-warn-type-defaults #-} > {-# OPTIONS_GHC -fno-warn-unused-do-bind #-} > {-# OPTIONS_GHC -fno-warn-missing-methods #-} > {-# OPTIONS_GHC -fno-warn-orphans #-}  > {-# LANGUAGE NoMonomorphismRestriction #-}  > module PopGrowthMCMC where > > import qualified Data.Vector.Unboxed as V > import Data.Random.Source.PureMT > import Data.Random > import Control.Monad.State > import Data.Histogram.Fill > import Data.Histogram.Generic ( Histogram )  # Implementation We assume most of the parameters are known with the exception of the the growth rate $r$. We fix this also in order to generate test data. > k, p0 :: Double > k = 1.0 > p0 = 0.1  > r, deltaT :: Double > r = 10.0 > deltaT = 0.0005  > nObs :: Int > nObs = 150  Here’s the implementation of the logistic function > logit :: Double -> Double -> Double -> Double > logit p0 k x = k * p0 * (exp x) / (k + p0 * (exp x - 1))  Let us create some noisy data. > sigma :: Double > sigma = 1e-2  > samples :: [Double] > samples = zipWith (+) mus epsilons > where > mus = map (logit p0 k . (* (r * deltaT))) (map fromIntegral [0..]) > epsilons = evalState (sample replicateM nObs $rvar (Normal 0.0 sigma)) (pureMT 3)  Arbitrarily let us set the prior to have a rather vague normal distribution. > mu0, sigma0 :: Double > mu0 = 5.0 > sigma0 = 1e1  > prior :: Double -> Double > prior r = exp (-(r - mu0)**2 / (2 * sigma0**2))  > likelihood :: Double -> [Double] -> Double > likelihood r ys = exp (-sum (zipWith (\y mu -> (y - mu)**2 / (2 * sigma**2)) ys mus)) > where > mus :: [Double] > mus = map (logit p0 k . (* (r * deltaT))) (map fromIntegral [0..])  > posterior :: Double -> [Double] -> Double > posterior r ys = likelihood r ys * prior r  The Metropolis algorithm tells us that we always jump to a better place but only sometimes jump to a worse place. We count the number of acceptances as we go. > acceptanceProb' :: Double -> Double -> [Double] -> Double > acceptanceProb' r r' ys = min 1.0 ((posterior r' ys) / (posterior r ys))  > oneStep :: (Double, Int) -> (Double, Double) -> (Double, Int) > oneStep (r, nAccs) (proposedJump, acceptOrReject) = > if acceptOrReject < acceptanceProb' r (r + proposedJump) samples > then (r + proposedJump, nAccs + 1) > else (r, nAccs)  Here are our proposals. > normalisedProposals :: Int -> Double -> Int -> [Double] > normalisedProposals seed sigma nIters = > evalState (replicateM nIters (sample (Normal 0.0 sigma))) > (pureMT$ fromIntegral seed)


We also need samples from the uniform distribution

> acceptOrRejects :: Int -> Int -> [Double]
> acceptOrRejects seed nIters =
>   evalState (replicateM nIters (sample stdUniform))
>   (pureMT $fromIntegral seed)  Now we can actually run our simulation. We set the number of jumps and a burn in but do not do any thinning. > nIters, burnIn :: Int > nIters = 100000 > burnIn = nIters div 10  Let us start our chain to the mean of the prior. In theory this shoudn’t matter as by the time we have burnt in we should be sampling in the high density region of the distribution. > startMu :: Double > startMu = 5.0  This jump size should allow us to sample the region of high density at a reasonable granularity. > jumpVar :: Double > jumpVar = 0.01  Now we can test our MCMC implementation. > test :: Int -> [(Double, Int)] > test seed = > drop burnIn$
>   scanl oneStep (startMu, 0) $> zip (normalisedProposals seed jumpVar nIters) > (acceptOrRejects seed nIters)  We put the data into a histogram. > numBins :: Int > numBins = 400  > hb :: HBuilder Double (Data.Histogram.Generic.Histogram V.Vector BinD Double) > hb = forceDouble -<< mkSimple (binD lower numBins upper) > where > lower = r - 0.5 * sigma0 > upper = r + 0.5 * sigma0 > > hist :: Int -> Histogram V.Vector BinD Double > hist seed = fillBuilder hb (map fst$ test seed)


With 50 observations we don’t seem to be very certain about the growth rate.

With 100 observations we do very much better.

And with 150 observations we do even better.

# Introduction

Simple models for e.g. financial option pricing assume that the volatility of an index or a stock is constant, see here for example. However, simple observation of time series show that this is not the case; if it were then the log returns would be white noise

One approach which addresses this, GARCH (Generalised AutoRegressive Conditional Heteroskedasticity), models the evolution of volatility deterministically.

Stochastic volatility models treat the volatility of a return on an asset, such as an option to buy a security, as a Hidden Markov Model (HMM). Typically, the observable data consist of noisy mean-corrected returns on an underlying asset at equally spaced time points.

There is evidence that Stochastic Volatility models (Kim, Shephard, and Chib (1998)) offer increased flexibility over the GARCH family, e.g. see Geweke (1994), Fridman and Harris (1998) and Jacquier, Polson, and Rossi (1994). Despite this and judging by the numbers of questions on the R Special Interest Group on Finance mailing list, the use of GARCH in practice far outweighs that of Stochastic Volatility. Reasons cited are the multiplicity of estimation methods for the latter and the lack of packages (but see here for a recent improvement to the paucity of packages).

In their tutorial on particle filtering, Doucet and Johansen (2011) give an example of stochastic volatility. We save this approach for future blog posts and follow Lopes and Polson and the excellent lecture notes by Hedibert Lopes.

Here’s the model.

\displaystyle \begin{aligned} H_0 &\sim {\mathcal{N}}\left( m_0, C_0\right) \\ H_t &= \mu + \phi H_{t-1} + \tau \eta_t \\ Y_n &= \beta \exp(H_t / 2) \epsilon_n \\ \end{aligned}

We wish to estimate $\mu, \phi, \tau$ and $\boldsymbol{h}$. To do this via a Gibbs sampler we need to sample from

$\displaystyle {p \left( \mu, \phi, \tau \,\vert\, \boldsymbol{h}, \boldsymbol{y} \right)} \quad \text{and} \quad {p \left( \boldsymbol{h} \,\vert\, \mu, \phi, \tau, \boldsymbol{y} \right)}$

> {-# OPTIONS_GHC -Wall                      #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults    #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind   #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods  #-}
> {-# OPTIONS_GHC -fno-warn-orphans          #-}

> {-# LANGUAGE RecursiveDo                   #-}
> {-# LANGUAGE ExplicitForAll                #-}
> {-# LANGUAGE TypeOperators                 #-}
> {-# LANGUAGE TypeFamilies                  #-}
> {-# LANGUAGE ScopedTypeVariables           #-}
> {-# LANGUAGE DataKinds                     #-}
> {-# LANGUAGE FlexibleContexts              #-}

> module StochVol (
>     bigM
>   , bigM0
>   , runMC
>   , ys
>   , vols
>   , expectationTau2
>   , varianceTau2
>   ) where

> import Numeric.LinearAlgebra.HMatrix hiding ( (===), (|||), Element,
>                                               (<>), (#>), inv )
> import qualified Numeric.LinearAlgebra.Static as S
> import Numeric.LinearAlgebra.Static ( (<>) )
> import GHC.TypeLits
> import Data.Proxy
> import Data.Maybe ( fromJust )

> import Data.Random
> import Data.Random.Source.PureMT
> import Control.Monad.Writer hiding ( (<>) )
> import Control.Applicative

> import qualified Data.Vector as V

> inv :: (KnownNat n, (1 <=? n) ~ 'True) => S.Sq n -> S.Sq n
> inv m = fromJust S.linSolve m S.eye  > infixr 8 #> > (#>) :: (KnownNat m, KnownNat n) => S.L m n -> S.R n -> S.R m > (#>) = (S.#>)  > type StatsM a = RVarT (Writer [((Double, Double), Double)]) a  > (|||) :: (KnownNat ((+) r1 r2), KnownNat r2, KnownNat c, KnownNat r1) => > S.L c r1 -> S.L c r2 -> S.L c ((+) r1 r2) > (|||) = (S.¦)  # Marginal Distribution for Parameters Let us take a prior that is standard for linear regression $\displaystyle (\boldsymbol{\theta}, \tau^2) \sim {\mathcal{NIG}}(\boldsymbol{\theta}_0, V_0, \nu_0, s_0^2)$ where $\boldsymbol{\theta} = (\mu, \phi)^\top$ and use standard results for linear regression to obtain the required marginal distribution. That the prior is Normal Inverse Gamma (${\cal{NIG}}$) means \displaystyle \begin{aligned} \boldsymbol{\theta} \, | \, \tau^2 & \sim {\cal{N}}(\boldsymbol{\theta}_0, \tau^2 V_0) \\ \tau^2 & \sim {\cal{IG}}(\nu_0 / 2, \nu_0 s_0^2 / 2) \end{aligned} Standard Bayesian analysis for regression tells us that the (conditional) posterior distribution for $\displaystyle y_i = \beta + \alpha x_i + \epsilon_i$ where the $\{\epsilon_i\}$ are IID normal with variance $\sigma^2$ is given by $\displaystyle {p \left( \alpha, \beta, \eta \,\vert\, \boldsymbol{y}, \boldsymbol{x} \right)} = {\cal{N}}((\alpha, \beta); \mu_n, \sigma^2\Lambda_n^{-1})\,{\cal{IG}}(a_n, b_n)$ with $\displaystyle \Lambda_n = X_n^\top X_n + \Lambda_0$ $\displaystyle \begin{matrix} \mu_n = \Lambda_n^{-1}({X_n}^{\top}{X_n}\hat{\boldsymbol{\beta}}_n + \Lambda_0\mu_0) & \textrm{where} & \hat{\boldsymbol\beta}_n = ({X}_n^{\rm T}{X}_n)^{-1}{X}_n^{\rm T}\boldsymbol{y}_n \end{matrix}$ $\displaystyle \begin{matrix} a_n = \frac{n}{2} + a_0 & \quad & b_n = b_0 + \frac{1}{2}(\boldsymbol{y}^\top\boldsymbol{y} + \boldsymbol{\mu}_0^\top\Lambda_0\boldsymbol{\mu}_0 - \boldsymbol{\mu}_n^\top\Lambda_n\boldsymbol{\mu}_n) \end{matrix}$ ## Recursive Form We can re-write the above recursively. We do not need to for this blog article but it will be required in any future blog article which uses Sequential Monte Carlo techniques. $\displaystyle \Lambda_n = \boldsymbol{x}_n^\top \boldsymbol{x}_n + \Lambda_{n-1}$ Furthermore $\displaystyle \Lambda_{n}\mu_{n} = {X}_{n}^{\rm T}\boldsymbol{y}_{n} + \Lambda_0\mu_0 = {X}_{n-1}^{\rm T}\boldsymbol{y}_{n-1} + \boldsymbol{x}_n^\top y_n + \Lambda_0\mu_0 = \Lambda_{n-1}\mu_{n-1} + \boldsymbol{x}_n^\top y_n$ so we can write $\displaystyle \boldsymbol{\mu}_n = \Lambda_n^{-1}(\Lambda_{n-1}\mu_{n-1} + \boldsymbol{x}_n^\top y_n)$ and $\displaystyle \begin{matrix} a_n = a_{n-1} + \frac{1}{2} & \quad & b_n = b_{n-1} + \frac{1}{2}\big[(y_n - \boldsymbol{\mu}_n^\top \boldsymbol{x}_n)y_n + (\boldsymbol{\mu}_{n-1} - \boldsymbol{\mu}_{n})^\top \Lambda_{n-1}\boldsymbol{\mu}_{n-1}\big] \end{matrix}$ ## Specialising In the case of our model we can specialise the non-recursive equations as $\displaystyle \Lambda_n = \begin{bmatrix} 1 & 1 & \ldots & 1 \\ x_1 & x_2 & \ldots & x_n \end{bmatrix} \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \ldots & \ldots \\ 1 & x_n \end{bmatrix} + \Lambda_0$ $\displaystyle \begin{matrix} \mu_n = (\Lambda_n)^{-1}({X_n}^{\top}{X_n}\hat{\boldsymbol{\beta}}_n + \Lambda_0\mu_0) & \textrm{where} & \hat{\boldsymbol\beta}_n = ({X}_n^{\rm T}{X}_n)^{-1}{X}_n^{\rm T}\boldsymbol{x}_{1:n} \end{matrix}$ $\displaystyle \begin{matrix} a_n = \frac{n}{2} + a_0 & \quad & b_n = b_0 + \frac{1}{2}(\boldsymbol{x}_{1:n}^\top\boldsymbol{x}_{1:n} + \boldsymbol{\mu}_0^\top\Lambda_0\boldsymbol{\mu}_0 - \boldsymbol{\mu}_n^\top\Lambda_n\boldsymbol{\mu}_n) \end{matrix}$ Let’s re-write the notation to fit our model. $\displaystyle \Lambda_n = \begin{bmatrix} 1 & 1 & \ldots & 1 \\ h_1 & h_2 & \ldots & h_n \end{bmatrix} \begin{bmatrix} 1 & h_1 \\ 1 & h_2 \\ \ldots & \ldots \\ 1 & h_n \end{bmatrix} + \Lambda_0$ $\displaystyle \begin{matrix} \mu_n = (\Lambda_n)^{-1}({H_n}^{\top}{H_n}\hat{\boldsymbol{\theta}}_n + \Lambda_0\mu_0) & \textrm{where} & \hat{\boldsymbol\theta}_n = ({H}_n^{\rm T}{H}_n)^{-1}{H}_n^{\rm T}\boldsymbol{x}_{1:n} \end{matrix}$ $\displaystyle \begin{matrix} a_n = \frac{n}{2} + a_0 & \quad & b_n = b_0 + \frac{1}{2}(\boldsymbol{x}_{1:n}^\top\boldsymbol{x}_{1:n} + \boldsymbol{\mu}_0^\top\Lambda_0\boldsymbol{\mu}_0 - \boldsymbol{\mu}_n^\top\Lambda_n\boldsymbol{\mu}_n) \end{matrix}$ Sample from ${p \left( \boldsymbol{\theta}, \tau^2 \,\vert\, \boldsymbol{h}, \boldsymbol{y} \right)} \sim {\mathcal{NIG}}(\boldsymbol{\theta}_1, V_1, \nu_1, s_1^2)$ We can implement this in Haskell as > sampleParms :: > forall n m . > (KnownNat n, (1 <=? n) ~ 'True) => > S.R n -> S.L n 2 -> S.R 2 -> S.Sq 2 -> Double -> Double -> > RVarT m (S.R 2, Double) > sampleParms y bigX theta_0 bigLambda_0 a_0 s_02 = do > let n = natVal (Proxy :: Proxy n) > a_n = 0.5 * (a_0 + fromIntegral n) > bigLambda_n = bigLambda_0 + (tr bigX) <> bigX > invBigLambda_n = inv bigLambda_n > theta_n = invBigLambda_n #> ((tr bigX) #> y + (tr bigLambda_0) #> theta_0) > b_0 = 0.5 * a_0 * s_02 > b_n = b_0 + > 0.5 * (S.extract (S.row y <> S.col y)!0!0) + > 0.5 * (S.extract (S.row theta_0 <> bigLambda_0 <> S.col theta_0)!0!0) - > 0.5 * (S.extract (S.row theta_n <> bigLambda_n <> S.col theta_n)!0!0) > g <- rvarT (Gamma a_n (recip b_n)) > let s2 = recip g > invBigLambda_n' = m <> invBigLambda_n > where > m = S.diag S.vector (replicate 2 s2)
>   m1 <- rvarT StdNormal
>   m2 <- rvarT StdNormal
>   let theta_n' :: S.R 2
>       theta_n' = theta_n + S.chol (S.sym invBigLambda_n') #> (S.vector [m1, m2])
>   return (theta_n', s2)


# Marginal Distribution for State

## Marginal for $H_0$

Using a standard result about conjugate priors and since we have

$\displaystyle h_0 \sim {\cal{N}}(m_0,C_0) \quad h_1 \vert h_0 \sim {\cal{N}}(\mu + \phi h_0, \tau^2)$

we can deduce

$\displaystyle h_0 \vert h_1 \sim {\cal{N}}(m_1,C_1)$

where

\displaystyle \begin{aligned} \frac{1}{C_1} &= \frac{1}{C_0} + \frac{\phi^2}{\tau^2} \\ \frac{m_1}{C_1} &= \frac{m_0}{C_0} + \frac{\phi(h_1 - \mu)}{\tau^2} \end{aligned}

> sampleH0 :: Double ->
>             Double ->
>             V.Vector Double ->
>             Double ->
>             Double ->
>             Double ->
>             RVarT m Double
> sampleH0 iC0 iC0m0 hs mu phi tau2 = do
>   let var = recip (iC0 + phi^2 / tau2) > mean = var * (iC0m0 + phi * ((hs V.! 0) - mu) / tau2) > rvarT (Normal mean (sqrt var))  ## Marginal for $H_1 \ldots H_n$ From the state equation, we have \displaystyle \begin{aligned} H_{t+1} &= \mu + \phi H_{t} + \tau \eta_t \\ \phi^2 H_{t} &= -\phi\mu + \phi H_{t+1} - \phi \tau \eta_t \\ \end{aligned} We also have \displaystyle \begin{aligned} H_{t} &= \mu + \phi H_{t-1} + \tau \eta_{t-1} \\ \end{aligned} Adding the two expressions together gives \displaystyle \begin{aligned} (1 + \phi^2)H_{t} &= \phi (H_{t-1} + H_{t+1}) + \mu (1 - \phi) + \tau(\eta_{t-1} - \phi\eta_t) \\ H_{t} &= \frac{\phi}{1 + \phi^2} (H_{t-1} + H_{t+1}) + \mu \frac{1 - \phi}{1 + \phi^2} + \frac{\tau}{1 + \phi^2}(\eta_{t-1} - \phi\eta_t) \\ \end{aligned} Since $\{\eta_t\}$ are standard normal, then $H_t$ conditional on $H_{t-1}$ and $H_{t+1}$ is normally distributed, and \displaystyle \begin{aligned} \mathbb{E}(H_n\mid H_{n-1}, H_{n+1}) &= \frac{1 - \phi}{1+\phi^2}\mu + \frac{\phi}{1+\phi^2}(H_{n-1} + H_{n+1}) \\ \mathbb{V}(H_n\mid H_{n-1}, H_{n+1}) &= \frac{\tau^2}{1+\phi^2} \end{aligned} We also have $\displaystyle h_{n+1} \vert h_n \sim {\cal{N}}(\mu + \phi h_n, \tau^2)$ Writing $\displaystyle \boldsymbol{h}_{-t} = \begin{bmatrix} h_0, & h_1, & \ldots, & h_{t-1}, & h_{t+1}, & \ldots, & h_{n-1}, & h_n \end{bmatrix}$ by Bayes’ Theorem we have $\displaystyle {p \left( h_t \,\vert\, \boldsymbol{h}_{-t}, \theta, \boldsymbol{y} \right)} \propto {p \left( y_t \,\vert\, h_t \right)} {p \left( h_t \,\vert\, \boldsymbol{h}_{-t}, \theta, y_{-t} \right)} = f_{\cal{N}}(y_t;0,e^{h_t}) f_{\cal{N}}(h_t;\mu_t,\nu_t^2)$ where $f_{\cal{N}}(x;\mu,\sigma^2)$ is the probability density function of a normal distribution. We can sample from this using Metropolis 1. For each $t$, sample $h_t^\flat$ from ${\cal{N}}(h_t, \gamma^2)$ where $\gamma^2$ is the tuning variance. 2. For each $t=1, \ldots, n$, compute the acceptance probability $\displaystyle p_t = \min{\Bigg(\frac{f_{\cal{N}}(h^\flat_t;\mu_t,\nu_t^2) f_{\cal{N}}(y_t;0,e^{h^\flat_t})}{f_{\cal{N}}(h_t;\mu_t,\nu_t^2) f_{\cal{N}}(y_t;0,e^{h_t})}, 1 \Bigg)}$ 1. For each $t$, compute a new value of $h_t$ $\displaystyle h^\sharp_t = \begin{cases} h^\flat_t \text{with probability } p_t \\ h_t \text{with probability } 1 - p_t \end{cases}$ > metropolis :: V.Vector Double -> > Double -> > Double -> > Double -> > Double -> > V.Vector Double -> > Double -> > RVarT m (V.Vector Double) > metropolis ys mu phi tau2 h0 hs vh = do > let eta2s = V.replicate (n-1) (tau2 / (1 + phi^2)) V.snoc tau2 > etas = V.map sqrt eta2s > coef1 = (1 - phi) / (1 + phi^2) * mu > coef2 = phi / (1 + phi^2) > mu_n = mu + phi * (hs V.! (n-1)) > mu_1 = coef1 + coef2 * ((hs V.! 1) + h0) > innerMus = V.zipWith (\hp1 hm1 -> coef1 + coef2 * (hp1 + hm1)) (V.tail (V.tail hs)) hs > mus = mu_1 V.cons innerMus V.snoc mu_n > hs' <- V.mapM (\mu -> rvarT (Normal mu vh)) hs > let num1s = V.zipWith3 (\mu eta h -> logPdf (Normal mu eta) h) mus etas hs' > num2s = V.zipWith (\y h -> logPdf (Normal 0.0 (exp (0.5 * h))) y) ys hs' > nums = V.zipWith (+) num1s num2s > den1s = V.zipWith3 (\mu eta h -> logPdf (Normal mu eta) h) mus etas hs > den2s = V.zipWith (\y h -> logPdf (Normal 0.0 (exp (0.5 * h))) y) ys hs > dens = V.zipWith (+) den1s den2s > us <- V.replicate n <> rvarT StdUniform
>   let ls   = V.zipWith (\n d -> min 0.0 (n - d)) nums dens
>   return $V.zipWith4 (\u l h h' -> if log u < l then h' else h) us ls hs hs'  # Markov Chain Monte Carlo Now we can write down a single step for our Gibbs sampler, sampling from each marginal in turn. > singleStep :: Double -> V.Vector Double -> > (Double, Double, Double, Double, V.Vector Double) -> > StatsM (Double, Double, Double, Double, V.Vector Double) > singleStep vh y (mu, phi, tau2, h0, h) = do > lift$ tell [((mu, phi),tau2)]
>   hNew <- metropolis y mu phi tau2 h0 h vh
>   h0New <- sampleH0 iC0 iC0m0 hNew mu phi tau2
>   let bigX' = (S.col $S.vector$ replicate n 1.0)
>               |||
>               (S.col $S.vector$ V.toList $h0New V.cons V.init hNew) > bigX = bigX' asTypeOf (snd$ valAndType nT)
>   newParms <- sampleParms (S.vector $V.toList h) bigX (S.vector [mu0, phi0]) invBigV0 nu0 s02 > return ( (S.extract (fst newParms))!0 > , (S.extract (fst newParms))!1 > , snd newParms > , h0New > , hNew > )  # Testing Let’s create some test data. > mu', phi', tau2', tau' :: Double > mu' = -0.00645 > phi' = 0.99 > tau2' = 0.15^2 > tau' = sqrt tau2'  We need to create a statically typed matrix with one dimension the same size as the data so we tie the data size value to the required type. > nT :: Proxy 500 > nT = Proxy  > valAndType :: KnownNat n => Proxy n -> (Int, S.L n 2) > valAndType x = (fromIntegral$ natVal x, undefined)

> n :: Int
> n = fst $valAndType nT  Arbitrarily let us start the process at > h0 :: Double > h0 = 0.0  We define the process as a stream (aka co-recursively) using the Haskell recursive do construct. It is not necessary to do this but streams are a natural way to think of stochastic processes. > hs, vols, sds, ys :: V.Vector Double > hs = V.fromList$ take n $fst$ runState hsAux (pureMT 1)
>   where
>     hsAux = mdo { x0 <- sample (Normal (mu' + phi' * h0) tau')
>                 ; xs <- mapM (\x -> sample (Normal (mu' + phi' * x) tau')) (x0:xs)
>                 ; return xs
>                 }

> vols = V.map exp hs


We can plot the volatility (which we cannot observe directly).

And we can plot the log returns.

> sds = V.map sqrt vols

> ys = fst $runState ysAux (pureMT 2) > where > ysAux = V.mapM (\sd -> sample (Normal 0.0 sd)) sds  We start with a vague prior for $H_0$ > m0, c0 :: Double > m0 = 0.0 > c0 = 100.0  For convenience > iC0, iC0m0 :: Double > iC0 = recip c0 > iC0m0 = iC0 * m0  Rather than really sample from priors for $\mu, \phi$ and $\tau^2$ let us cheat and assume we sampled the simulated values! > mu0, phi0, tau20 :: Double > mu0 = -0.00645 > phi0 = 0.99 > tau20 = 0.15^2  But that we are still very uncertain about them > bigV0, invBigV0 :: S.Sq 2 > bigV0 = S.diag$ S.fromList [100.0, 100.0]
> invBigV0 = inv bigV0

> nu0, s02 :: Double
> nu0    = 10.0
> s02    = (nu0 - 2) / nu0 * tau20


Note that for the inverse gamma this gives

> expectationTau2, varianceTau2 :: Double
> expectationTau2 = (nu0 * s02 / 2) / ((nu0 / 2) - 1)
> varianceTau2 = (nu0 * s02 / 2)^2 / (((nu0 / 2) - 1)^2 * ((nu0 / 2) - 2))

ghci> expectationTau2
2.25e-2

ghci> varianceTau2
1.6874999999999998e-4


## Running the Markov Chain

Tuning parameter

> vh :: Double
> vh = 0.1


The burn-in and sample sizes may be too low for actual estimation but will suffice for a demonstration.

> bigM, bigM0 :: Int
> bigM0 = 2000
> bigM  = 2000

> multiStep :: StatsM (Double, Double, Double, Double, V.Vector Double)
> multiStep = iterateM_ (singleStep vh ys) (mu0, phi0, tau20, h0, vols)

> runMC :: [((Double, Double), Double)]
> runMC = take bigM $drop bigM0$
>         execWriter (evalStateT (sample multiStep) (pureMT 42))


And now we can look at the distributions of our estimates

# Bibliography

Doucet, Arnaud, and Adam M Johansen. 2011. “A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later.” In Handbook of Nonlinear Filtering. Oxford, UK: Oxford University Press.

Fridman, Moshe, and Lawrence Harris. 1998. “A Maximum Likelihood Approach for Non-Gaussian Stochastic Volatility Models.” Journal of Business & Economic Statistics 16 (3): 284–91.

Geweke, John. 1994. “Bayesian Comparison of Econometric Models.”

Jacquier, Eric, Nicholas G. Polson, and Peter E. Rossi. 1994. “Bayesian Analysis of Stochastic Volatility Models.”

Kim, Sangjoon, Neil Shephard, and Siddhartha Chib. 1998. “Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models.” Review of Economic Studies 65 (3): 361–93. http://ideas.repec.org/a/bla/restud/v65y1998i3p361-93.html.