# Warming up for NUTS (No U-Turn)

I have been thinking about writing a blog on why the no u-turn sampler (NUTS) works rather than describing the actual algorithm. This led me to look at Jared Tobin’s Haskell implementation. His example tries to explore the Himmelblau function but only finds one local minima. This is not unexpected; as the excellent Stan manual notes

Being able to carry out such invariant inferences in practice is an altogether different matter. It is almost always intractable to find even a single posterior mode, much less balance the exploration of the neighborhoods of multiple local maxima according to the probability masses.

and

For HMC and NUTS, the problem is that the sampler gets stuck in one of the two "bowls" around the modes and cannot gather enough energy from random momentum assignment to move from one mode to another.

rm(list = ls(all.names=TRUE))
unlink(".RData")

rstan::stan_version()
## [1] "2.12.0"
rstan_options(auto_write = TRUE)

On the Rosenbrock function it fares much better.

knitr::include_graphics("RosenbrockA.png")

We can’t (at least I don’t know how to) try Stan out on Rosenbrock as its not a distribution but we can try it out on another nasty problem: the funnel. Some of this is taken directly from the Stan manual.

Here’s the Stan:

parameters {
real y;
vector[9] x;
}
model {
y ~ normal(0,3);
x ~ normal(0,exp(y/2));
}

which we can run with the following R:

funnel_fit <- stan(file='funnel.stan', cores=4, iter=10000)
## Warning: There were 92 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
funnel_samples <- extract(funnel_fit,permuted=TRUE,inc_warmup=FALSE);
funnel_df <- data.frame(x1=funnel_samples$x[,1],y=funnel_samples$y[])

Plotting the data requires some unpleasantness but shows the neck of the funnel does not get explored. So even HMC and NUTS do not perform well.

midpoints <- function(x, dp=2){
lower <- as.numeric(gsub(",.*","",gsub("\$$|\$|\$$|\$","", x)))
upper <- as.numeric(gsub(".*,","",gsub("\$$|\$|\$$|\$","", x)))
return(round(lower+(upper-lower)/2, dp))
}

df <- funnel_df[funnel_df$x1 < 20 & funnel_df$x1 > -20 & funnel_df$y < 9 & funnel_df$y > -9,]
x_c <- cut(df$x1, 20) y_c <- cut(df$y, 20)
z <- table(x_c, y_c)
z_df <- as.data.frame(z)
a_df <- data.frame(x=midpoints(z_df$x_c),y=midpoints(z_df$y_c),f=z_df$Freq) m = as.matrix(dcast(a_df,x ~ y)) ## Using f as value column: use value.var to override. hist3D(x=m[,"x"],y=as.double(colnames(m)[2:21]),z=(as.matrix(dcast(a_df,x ~ y)))[,2:21], border="black",ticktype = "detailed",theta=5,phi=20) ## Using f as value column: use value.var to override. Since the analytic form of the distribution is known, one can apply a trick to correct this problem and then one is sampling from unit normals! parameters { real y_raw; vector[9] x_raw; } transformed parameters { real y; vector[9] x; y <- 3.0 * y_raw; x <- exp(y/2) * x_raw; } model { y_raw ~ normal(0,1); x_raw ~ normal(0,1); } And now the neck of the funnel is explored. funnel_fit <- stan(file='funnel_reparam.stan', cores=4, iter=10000) funnel_samples <- extract(funnel_fit,permuted=TRUE,inc_warmup=FALSE); funnel_df <- data.frame(x1=funnel_samples$x[,1],y=funnel_samples$y[]) df <- funnel_df[funnel_df$x1 < 20 & funnel_df$x1 > -20 & funnel_df$y < 9 & funnel_df$y > -9,] x_c <- cut(df$x1, 20)
y_c <- cut(df$y, 20) z <- table(x_c, y_c) z_df <- as.data.frame(z) a_df <- data.frame(x=midpoints(z_df$x_c),y=midpoints(z_df$y_c),f=z_df$Freq)

m = as.matrix(dcast(a_df,x ~ y))
## Using f as value column: use value.var to override.
hist3D(x=m[,"x"],y=as.double(colnames(m)[2:21]),z=(as.matrix(dcast(a_df,x ~ y)))[,2:21], border="black",ticktype = "detailed",theta=5,phi=20)
## Using f as value column: use value.var to override.

We’d expect the Haskell implementation to also fail to explore the neck. Maybe I will return to this after the article on why NUTS works.

# Calling Haskell from C

As part of improving the random number generation story for Haskell, I want to be able to use the testu01 library with the minimal amount of Haskell wrapping. testu01 assumes that there is a C function which returns the random number. The ghc manual gives an example but does not give all the specifics. These are my notes on how to get the example working under OS X (El Capitain 10.11.5 to be precise).

The Haskell:

{-# OPTIONS_GHC -Wall                 #-}

{-# LANGUAGE ForeignFunctionInterface #-}

module Foo where

foreign export ccall foo :: Int -> IO Int

foo :: Int -> IO Int
foo n = return (length (f n))

f :: Int -> [Int]
f 0 = []
f n = n:(f (n-1))

The .cabal:

name:                test-via-c
version:             0.1.0.0
homepage:            TBD
license:             MIT
author:              Dominic Steinitz
maintainer:          idontgetoutmuch@gmail.com
category:            System
build-type:          Simple
cabal-version:       >=1.10

executable Foo.dylib
main-is: Foo.hs
other-extensions:    ForeignFunctionInterface
build-depends:       base >=4.7 && =0.6 && <0.7
hs-source-dirs:      src
default-language:    Haskell2010
include-dirs:        src
ghc-options:         -O2 -shared -fPIC -dynamic
extra-libraries:     HSrts-ghc8.0.1

On my computer running

cabal install

places the library in

~/Library/Haskell/ghc-8.0.1/lib/test-via-c-0.1.0.0/bin

The C:

#include
#include "HsFFI.h"

#include "../dist/build/Foo.dylib/Foo.dylib-tmp/Foo_stub.h"

int main(int argc, char *argv[])
{
int i;

hs_init(&argc, &argv);

for (i = 0; i < 5; i++) {
printf("%d\n", foo(2500));
}

hs_exit();
return 0;
}

On my computer this can be compiled with

gcc-6 Bar.c
~/Library/Haskell/ghc-8.0.1/lib/test-via-c-0.1.0.0/bin/Foo.dylib
-I/Library/Frameworks/GHC.framework/Versions/8.0.1-x86_64/usr/lib/ghc-8.0.1/include
-L/Library/Frameworks/GHC.framework/Versions/8.0.1-x86_64/usr/lib/ghc-8.0.1/rts
-lHSrts-ghc8.0.1

and can be run with

DYLD_LIBRARY_PATH=
~/Library/Haskell/ghc-8.0.1/lib/test-via-c-0.1.0.0/bin:
/Library/Frameworks/GHC.framework/Versions/8.0.1-x86_64/usr/lib/ghc-8.0.1/rts

N.B. setting DYLD_LIBRARY_PATH like this is not recommended as it is a good way of breaking things. I have tried setting DYLD_FALLBACK_LIBRARY_PATH but only to get an error message. Hopefully, at some point I will be able to post a robust way of getting the executable to pick up the required dynamic libraries.

# Introduction

I was intrigued by a tweet by the UK Chancellor of the Exchequer stating "exports [to South Korea] have doubled over the last year. Now worth nearly £11bn” and a tweet by a Member of the UK Parliament stating South Korea "our second fastest growing trading partner". Although I have never paid much attention to trade statistics, both these statements seemed surprising. But these days it’s easy enough to verify such statements. It’s also an opportunity to use the techniques I believe data scientists in (computer) game companies use to determine how much impact a new feature has on the game’s consumers.

One has to be slightly careful with trade statistics as they come in many different forms, e.g., just goods or goods and services etc. When I provide software and analyses to US organisations, I am included in the services exports from the UK to the US.

Let’s analyse goods first before moving on to goods and services.

# Getting the Data

First let’s get hold of the quarterly data from the UK Office of National Statistics.

ukstats <- "https://www.ons.gov.uk"
bop <- "economy/nationalaccounts/balanceofpayments"
ds <- "datasets/tradeingoodsmretsallbopeu2013timeseriesspreadsheet/current/mret.csv"

mycsv <- read.csv(paste(ukstats,"file?uri=",bop,ds,sep="/"),stringsAsFactors=FALSE)

Now we can find the columns that refer to Korea.

ns <- which(grepl("Korea", names(mycsv)))
length(ns)
## [1] 3
names(mycsv[ns[1]])
## [1] "BoP.consistent..South.Korea..Exports..SA................................"
names(mycsv[ns[2]])
## [1] "BoP.consistent..South.Korea..Imports..SA................................"
names(mycsv[ns[3]])
## [1] "BoP.consistent..South.Korea..Balance..SA................................"

Now we can pull out the relevant information and create a data frame of it.

korean <- mycsv[grepl("Korea", names(mycsv))]
imports <- korean[grepl("Imports", names(korean))]
exports <- korean[grepl("Exports", names(korean))]
balance <- korean[grepl("Balance", names(korean))]

df <- data.frame(mycsv[grepl("Title", names(mycsv))],
imports,
exports,
balance)
colnames(df) <- c("Title", "Imports", "Exports", "Balance")

startQ <- which(grepl("1998 Q1",df$Title)) endQ <- which(grepl("2016 Q3",df$Title))
dfQ <- df[startQ:endQ,]

We can now plot the data.

tab <- data.frame(kr=as.numeric(dfQ$Exports), krLabs=as.numeric(as.Date(as.yearqtr(dfQ$Title,format='%Y Q%q'))))

ggplot(tab, aes(x=as.Date(tab$krLabs), y=tab$kr)) + geom_line() +
theme(legend.position="bottom") +
ggtitle("Goods Exports UK / South Korea (Quarterly)") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Date") +
ylab("Value (£m)")

For good measure let’s plot the annual data.

startY <- grep("^1998$",df$Title)
endY <- grep("^2015$",df$Title)
dfYear <- df[startY:endY,]

tabY <- data.frame(kr=as.numeric(dfYear$Exports), krLabs=as.numeric(dfYear$Title))

ggplot(tabY, aes(x=tabY$krLabs, y=tabY$kr)) + geom_line() +
theme(legend.position="bottom") +
ggtitle("Goods Exports UK / South Korea (Annual)") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Date") +
ylab("Value (£m)")

And the monthly data.

startM <- grep("1998 JAN",df$Title) endM <- grep("2016 OCT",df$Title)
dfMonth <- df[startM:endM,]

tabM <- data.frame(kr=as.numeric(dfMonth$Exports), krLabs=as.numeric(as.Date(as.yearmon(dfMonth$Title,format='%Y %B'))))

ggplot(tabM, aes(x=as.Date(tabM$krLabs), y=tabM$kr)) + geom_line() +
theme(legend.position="bottom") +
ggtitle("Goods Exports UK / South Korea (Monthly)") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Date") +
ylab("Value (£m)")

It looks like some change took place in 2011 but nothing to suggest either that "export have doubled over the last year" or that South Korea is "our second fastest growing partner". That some sort of change did happen is further supported by the fact a Free Trade Agreement between the EU and Korea was put in place in 2011.

But was there really a change? And what sort of change was it? Sometimes it’s easy to imagine patterns where there are none.

With this warning in mind let us see if we can get a better feel from the numbers as to what happened.

# The Model

Let us assume that the data for exports are approximated by a linear function of time but that there is a change in the slope and the offset at some point during observation.

\begin{aligned} \tau &\sim {\mathrm{Uniform}}(1, N) \\ \mu_1 &\sim \mathcal{N}(\mu_{\mu_1}, \sigma_{\mu_1}) \\ \gamma_1 &\sim \mathcal{N}(\mu_{\gamma_1}, \sigma_{\gamma_1}) \\ \sigma_1 &\sim \mathcal{N}(\mu_{\sigma_1}, \sigma_{\sigma_1}) \\ \mu_2 &\sim \mathcal{N}(\mu_{\mu_2}, \sigma_{\mu_2}) \\ \gamma_2 &\sim \mathcal{N}(\mu_{\gamma_2}, \sigma_{\gamma_2}) \\ \sigma_2 &\sim \mathcal{N}(\mu_{\sigma_2}, \sigma_{\sigma_2}) \\ y_i &\sim \begin{cases} \mathcal{N}(\mu_1 x_i + \gamma_1, \sigma_1) & \mbox{if } i < \tau \\ \mathcal{N}(\mu_2 x_i + \gamma_2, \sigma_2), & \mbox{if } i \geq \tau \end{cases} \end{aligned}

Since we are going to use stan to infer the parameters for this model and stan cannot handle discrete parameters, we need to marginalize out this (discrete) parameter. I hope to do the same analysis with LibBi which seems more suited to time series analysis and which I believe will not require such a step.

Setting D = {yi}i = 1N we can calculate the likelihood

\begin{aligned} p(D \,|\, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) &= \sum_{n=1}^N p(\tau, D \,|\, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) \\ &= \sum_{\tau=1}^N p(\tau) p(D \,|\, \tau, \mu_1, \sigma_1, \mu_2, \sigma_2) \\ &=\sum_{\tau=1}^N p(\tau) \prod_{i=1}^N p(y_i \,|\, \tau, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) \end{aligned}

stan operates on the log scale and thus requires the log likelihood

$\log p(D \,|\, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) = \mathrm{log\_sum\_exp}_{\tau=1}^T \big( \log \mathcal{U}(\tau \, | \, 1, T) \\ + \sum_{i=1}^T \log \mathcal{N}(y_i \, | \, \nu_i, \rho_i) \big)$

where

\begin{aligned} \nu_i &= \begin{cases} \mu_1 x_i + \gamma_1 & \mbox{if } i < \tau \\ \mu_2 x_i + \gamma_2 & \mbox{if } i \geq \tau \end{cases} \\ \rho_i &= \begin{cases} \sigma_1 & \mbox{if } i < \tau \\ \sigma_2 & \mbox{if } i \geq \tau \end{cases} \end{aligned}

and where the log sum of exponents function is defined by

$\mathrm{\log\_sum\_exp}_{n=1}^N \, \alpha_n = \log \sum_{n=1}^N \exp(\alpha_n).$

The log sum of exponents function allows the model to be coded directly in Stan using the built-in function , which provides both arithmetic stability and efficiency for mixture model calculations.

## Stan

Here’s the model in stan. Sadly I haven’t found a good way of divvying up .stan files in a .Rmd file so that it still compiles.

data {
int<lower=1> N;
real x[N];
real y[N];
}

parameters {
real mu1;
real mu2;
real gamma1;
real gamma2;

real<lower=0> sigma1;
real<lower=0> sigma2;
}

transformed parameters {
vector[N] log_p;
real mu;
real sigma;
log_p = rep_vector(-log(N), N);
for (tau in 1:N)
for (i in 1:N) {
mu = i < tau ? (mu1 * x[i] + gamma1) : (mu2 * x[i] + gamma2);
sigma = i < tau ? sigma1 : sigma2;
log_p[tau] = log_p[tau] + normal_lpdf(y[i] | mu, sigma);
}
}

model {
mu1 ~ normal(0, 10);
mu2 ~ normal(0, 10);
gamma1 ~ normal(0, 10);
gamma2 ~ normal(0, 10);
sigma1 ~ normal(0, 10);
sigma2 ~ normal(0, 10);

target += log_sum_exp(log_p);
}

generated quantities {
int<lower=1,upper=N> tau;
tau = categorical_rng(softmax(log_p));
}

The above, although mimicking our mathematical model, has quadratic complexity and we can use the trick in the stan manual to make it linear albeit with less clarity.

data {
int<lower=1> N;
real x[N];
real y[N];
}

parameters {
real mu1;
real mu2;
real gamma1;
real gamma2;

real<lower=0> sigma1;
real<lower=0> sigma2;
}

transformed parameters {
vector[N] log_p;
{
vector[N+1] log_p_e;
vector[N+1] log_p_l;
log_p_e[1] = 0;
log_p_l[1] = 0;
for (i in 1:N) {
log_p_e[i + 1] = log_p_e[i] + normal_lpdf(y[i] | mu1 * x[i] + gamma1, sigma1);
log_p_l[i + 1] = log_p_l[i] + normal_lpdf(y[i] | mu2 * x[i] + gamma2, sigma2);
}
log_p = rep_vector(-log(N) + log_p_l[N + 1], N) + head(log_p_e, N) - head(log_p_l, N);
}
}

model {
mu1 ~ normal(0, 10);
mu2 ~ normal(0, 10);
gamma1 ~ normal(0, 10);
gamma2 ~ normal(0, 10);
sigma1 ~ normal(0, 10);
sigma2 ~ normal(0, 10);

target += log_sum_exp(log_p);
}

generated quantities {
int<lower=1,upper=N> tau;
tau = categorical_rng(softmax(log_p));
}

Let’s run this model with the monthly data.

NM <- nrow(tabM)
KM <- ncol(tabM)

yM <- tabM$kr XM <- data.frame(tabM,rep(1,NM))[,2:3] fitM <- stan( file = "lr-changepoint-ng.stan", data = list(x = XM$krLabs, y = yM, N = length(yM)),
chains = 4,
warmup = 1000,
iter = 10000,
cores = 4,
refresh = 500,
seed=42
)
## Warning: There were 662 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems

Looking at the results below we see a multi-modal distribution so a mean is not of much use.

histData <- hist(extract(fitM)$tau,plot=FALSE,breaks=c(seq(1,length(yM),1))) histData$counts
##   [1] 18000     0     0     0     0     0     0     0     0     0     0
##  [12]     0     0     0     0     0     0     0     0     0     0     0
##  [23]     0     0     0     0     0     0     0     0     0     0     0
##  [34]     0     0     0     0     0     0     0     0     0     0     0
##  [45]     0     0     0     0     0     0     0     0     0     0     0
##  [56]     0     0     0     0     0     0     0     0     0     0     0
##  [67]     0     0     0     0     0     0     0     0     0     0     0
##  [78]     0     0     0     0     0     0     0     0     0     0     0
##  [89]     0     0     0     0     0     0     0     0     0     0     0
## [100]     0     0     0     0     0     0     0     0     0     0     0
## [111]     0     0     0     0     0     0     0     1     4    12    16
## [122]    16   107   712  8132     0     0     0     0     0     0     0
## [133]     0     0     0     0     0     0     0     0     0     0     0
## [144]     0     0     0     0     0     0     0     0     0     0     0
## [155]     0     0     0     0     0     0     0     0     0     0    25
## [166]   171  2812     0     0     0     0     0     0     0     0     0
## [177]     0     0     0     0     0     0     0     0     0     0     0
## [188]     0     0     0     0     0     0     0     0     0     0     0
## [199]     0     0     0     0     0     0     0     0     0     0     0
## [210]     0     0     0     0     0     0     0     0     0     0     0
## [221]     0     0     0     0  5992

We can get a pictorial representation of the maxima so that the multi-modality is even clearer.

min_indexes = which(diff(  sign(diff( c(0,histData$counts,0)))) == 2) max_indexes = which(diff( sign(diff( c(0,histData$counts,0)))) == -2)
modeData = data.frame(x=1:length(histData$counts),y=histData$counts)
min_locs = modeData[min_indexes,]
max_locs = modeData[max_indexes,]
plot(modeData$y, type="l") points( min_locs, col="red", pch=19, cex=1 ) points( max_locs, col="green", pch=19, cex=1 ) My interpretation is that the evidence (data) says there is probably no changepoint (a change at the beginning or end is no change) but there might be a change at intermediate data points. We can see something strange (maybe a large single export?) happened at index 125 which translates to 2008MAY. The mode at index 167 which translates to 2011NOV corresponds roughly to the EU / Korea trade agreement. Let us assume that there really was a material difference in trade at this latter point. We can fit a linear regression before this point and one after this point. Here’s the stan data { int<lower=1> N; int<lower=1> K; matrix[N,K] X; vector[N] y; } parameters { vector[K] beta; real<lower=0> sigma; } model { y ~ normal(X * beta, sigma); } And here’s the R to fit the before and after data. We fit the model, pull out the parameters for the regression and pull out the covariates N <- length(yM) M <- max_locs$x[3]

fite <- stan(file = 'LR.stan',
data = list(N = M, K = ncol(XM), y = yM[1:M], X = XM[1:M,]),
pars=c("beta", "sigma"),
chains=3,
cores=3,
iter=3000,
warmup=1000,
refresh=-1)

se <- extract(fite, pars = c("beta", "sigma"), permuted=TRUE)
estCovParamsE <- colMeans(se$beta) fitl <- stan(file = 'LR.stan', data = list(N = N-M, K = ncol(XM), y = yM[(M+1):N], X = XM[(M+1):N,]), pars=c("beta", "sigma"), chains=3, cores=3, iter=3000, warmup=1000, refresh=-1) sl <- extract(fitl, pars = c("beta", "sigma"), permuted=TRUE) estCovParamsL <- colMeans(sl$beta)

Make predictions

linRegPredsE <- data.matrix(XM) %*% estCovParamsE
linRegPredsL <- data.matrix(XM) %*% estCovParamsL
ggplot(tabM, aes(x=as.Date(tabM$krLabs), y=tabM$kr)) +
geom_line(aes(x = as.Date(tabM$krLabs), y = tabM$kr, col = "Actual")) +
geom_line(data=tabM[1:M,], aes(x = as.Date(tabM$krLabs[1:M]), y = linRegPredsE[(1:M),1], col = "Fit (Before FTA)")) + geom_line(data=tabM[(M+1):N,], aes(x = as.Date(tabM$krLabs[(M+1):N]), y = linRegPredsL[((M+1):N),1], col = "Fit (After FTA)")) +
theme(legend.position="bottom") +
ggtitle("Goods Exports UK / South Korea (Monthly)") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Date") +
ylab("Value (£m)")

# An Intermediate Conclusion and Goods and Services (Pink Book)

So we didn’t manage to substantiate either the Chancellor’s claim or the Member of Parliament’s claim.

But it may be that we can if we look at Goods and Services then we might be able to see the numbers resulting in the claims.

pb <- "datasets/pinkbook/current/pb.csv"
pbcsv <- read.csv(paste(ukstats,"file?uri=",bop,pb,sep="/"),stringsAsFactors=FALSE)

This has a lot more information albeit only annually.

pbns <- grep("Korea", names(pbcsv))
length(pbns)
## [1] 21
lapply(pbns,function(x) names(pbcsv[x]))
## [[1]]
## [1] "BoP..Current.Account..Goods...Services..Imports..South.Korea............"
##
## [[2]]
## [1] "BoP..Current.Account..Current.Transfer..Balance..South.Korea............"
##
## [[3]]
## [1] "BoP..Current.Account..Goods...Services..Balance..South.Korea............"
##
## [[4]]
## [1] "IIP..Assets..Total.South.Korea.........................................."
##
## [[5]]
## [1] "Trade.in.Services.replaces.1.A.B....Exports.Credits...South.Korea...nsa."
##
## [[6]]
## [1] "IIP...Liabilities...Total...South.Korea................................."
##
## [[7]]
## [1] "BoP..Total.income..Balance..South.Korea................................."
##
## [[8]]
## [1] "BoP..Total.income..Debits..South.Korea.................................."
##
## [[9]]
## [1] "BoP..Total.income..Credits..South.Korea................................."
##
## [[10]]
## [1] "BoP..Current.account..Balance..South.Korea.............................."
##
## [[11]]
## [1] "BoP..Current.account..Debits..South.Korea..............................."
##
## [[12]]
## [1] "BoP..Current.account..Credits..South.Korea.............................."
##
## [[13]]
## [1] "IIP...Net...Total....South.Korea........................................"
##
## [[14]]
## [1] "Trade.in.Services.replaces.1.A.B....Imports.Debits...South.Korea...nsa.."
##
## [[15]]
## [1] "BoP..Current.Account..Services..Total.Balance..South.Korea.............."
##
## [[16]]
## [1] "Bop.consistent..Balance..NSA..South.Korea..............................."
##
## [[17]]
## [1] "Bop.consistent..Im..NSA..South.Korea...................................."
##
## [[18]]
## [1] "Bop.consistent..Ex..NSA..South.Korea...................................."
##
## [[19]]
## [1] "Current.Transfers...Exports.Credits...South.Korea...nsa................."
##
## [[20]]
## [1] "Current.Transfers...Imports.Debits...South.Korea...nsa.................."
##
## [[21]]
## [1] "BoP..Current.Account..Goods...Services..Exports..South.Korea............"

Let’s just look at exports.

koreanpb <- pbcsv[grepl("Korea", names(pbcsv))]
exportspb <- koreanpb[grepl("Exports", names(koreanpb))]
names(exportspb)
## [1] "Trade.in.Services.replaces.1.A.B....Exports.Credits...South.Korea...nsa."
## [2] "Current.Transfers...Exports.Credits...South.Korea...nsa................."
## [3] "BoP..Current.Account..Goods...Services..Exports..South.Korea............"

The last column gives exports of Goods and Services so let’s draw a chart of it.

pb <- data.frame(pbcsv[grepl("Title", names(pbcsv))],
exportspb[3])
colnames(pb) <- c("Title", "Exports")

startpbY <- which(grepl("1999",pb$Title)) endpbY <- which(grepl("2015",pb$Title))
pbY <- pb[startpbY:endpbY,]

tabpbY <- data.frame(kr=as.numeric(pbY$Exports), krLabs=as.numeric(pbY$Title))

ggplot(tabpbY, aes(x=tabpbY$krLabs, y=tabpbY$kr)) + geom_line() +
theme(legend.position="bottom") +
ggtitle("Goods and Services Exports UK / South Korea (Annual)") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Date") +
ylab("Value (£m)")

No joy here either to any of the claims. Still it’s been an interesting exercise.

# Introduction

In most presentations of Riemannian geometry, e.g. O’Neill (1983) and Wikipedia, the fundamental theorem of Riemannian geometry (“the miracle of Riemannian geometry”) is given: that for any semi-Riemannian manifold there is a unique torsion-free metric connection. I assume partly because of this and partly because the major application of Riemannian geometry is General Relativity, connections with torsion are given little if any attention.

It turns out we are all very familiar with a connection with torsion: the Mercator projection. Some mathematical physics texts, e.g. Nakahara (2003), allude to this but leave the details to the reader. Moreover, this connection respects the metric induced from Euclidean space.

We use SageManifolds to assist with the calculations. We hint at how this might be done more slickly in Haskell.

## A Cartographic Aside

%matplotlib inline
/Applications/SageMath/local/lib/python2.7/site-packages/traitlets/traitlets.py:770: DeprecationWarning: A parent of InlineBackend._config_changed has adopted the new @observe(change) API
clsname, change_or_name), DeprecationWarning)
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
plt.figure(figsize=(8, 8))

ax = plt.axes(projection=cartopy.crs.Mercator())

ax.gridlines()

ax.add_feature(cartopy.feature.LAND)
ax.add_feature(cartopy.feature.COASTLINE)

plt.show()

We can see Greenland looks much broader at the North than in the middle. But if we use a polar projection (below) then we see this is not the case. Why then is the Mercator projection used in preference to e.g. the polar projection or the once controversial Gall-Peters – see here for more on map projections.

plt.figure(figsize=(8, 8))

bx = plt.axes(projection=cartopy.crs.NorthPolarStereo())

bx.set_extent([-180, 180, 90, 50], ccrs.PlateCarree())

bx.gridlines()

bx.add_feature(cartopy.feature.LAND)
bx.add_feature(cartopy.feature.COASTLINE)

plt.show()

## Colophon

This is written as an Jupyter notebook. In theory, it should be possible to run it assuming you have installed at least sage and Haskell. To publish it, I used

jupyter-nbconvert --to markdown Mercator.ipynb
pandoc -s Mercator.md -t markdown+lhs -o Mercator.lhs \
--filter pandoc-citeproc --bibliography DiffGeom.bib
BlogLiteratelyD --wplatex Mercator.lhs > Mercator.html

Not brilliant but good enough.

Some commands to jupyter to display things nicely.

%display latex
viewer3D = 'tachyon'

# Warming Up With SageManifolds

Let us try a simple exercise: finding the connection coefficients of the Levi-Civita connection for the Euclidean metric on $\mathbb{R}^2$ in polar co-ordinates.

Define the manifold.

N = Manifold(2, 'N',r'\mathcal{N}', start_index=1)

Define a chart and frame with Cartesian co-ordinates.

ChartCartesianN.<x,y> = N.chart()
FrameCartesianN = ChartCartesianN.frame()

Define a chart and frame with polar co-ordinates.

ChartPolarN.<r,theta> = N.chart()
FramePolarN = ChartPolarN.frame()

The standard transformation from Cartesian to polar co-ordinates.

cartesianToPolar = ChartCartesianN.transition_map(ChartPolarN, (sqrt(x^2 + y^2), arctan(y/x)))
print(cartesianToPolar)
Change of coordinates from Chart (N, (x, y)) to Chart (N, (r, theta))
print(latex(cartesianToPolar.display()))

$\displaystyle \left\{\begin{array}{lcl} r & = & \sqrt{x^{2} + y^{2}} \\ \theta & = & \arctan\left(\frac{y}{x}\right) \end{array}\right.$

cartesianToPolar.set_inverse(r * cos(theta), r * sin(theta))
Check of the inverse coordinate transformation:
x == x
y == y
r == abs(r)
theta == arctan(sin(theta)/cos(theta))

Now we define the metric to make the manifold Euclidean.

g_e = N.metric('g_e')
g_e[1,1], g_e[2,2] = 1, 1

We can display this in Cartesian co-ordinates.

print(latex(g_e.display(FrameCartesianN)))

$\displaystyle g_e = \mathrm{d} x\otimes \mathrm{d} x+\mathrm{d} y\otimes \mathrm{d} y$

And we can display it in polar co-ordinates

print(latex(g_e.display(FramePolarN)))

$\displaystyle g_e = \mathrm{d} r\otimes \mathrm{d} r + \left( x^{2} + y^{2} \right) \mathrm{d} \theta\otimes \mathrm{d} \theta$

Next let us compute the Levi-Civita connection from this metric.

nab_e = g_e.connection()
print(latex(nab_e))

$\displaystyle \nabla_{g_e}$

If we use Cartesian co-ordinates, we expect that $\Gamma^k_{ij} = 0, \forall i,j,k$. Only non-zero entries get printed.

print(latex(nab_e.display(FrameCartesianN)))

Just to be sure, we can print out all the entries.

print(latex(nab_e[:]))

$\displaystyle \left[\left[\left[0, 0\right], \left[0, 0\right]\right], \left[\left[0, 0\right], \left[0, 0\right]\right]\right]$

In polar co-ordinates, we get

print(latex(nab_e.display(FramePolarN)))

$\displaystyle \begin{array}{lcl} \Gamma_{ \phantom{\, r } \, \theta \, \theta }^{ \, r \phantom{\, \theta } \phantom{\, \theta } } & = & -\sqrt{x^{2} + y^{2}} \\ \Gamma_{ \phantom{\, \theta } \, r \, \theta }^{ \, \theta \phantom{\, r } \phantom{\, \theta } } & = & \frac{1}{\sqrt{x^{2} + y^{2}}} \\ \Gamma_{ \phantom{\, \theta } \, \theta \, r }^{ \, \theta \phantom{\, \theta } \phantom{\, r } } & = & \frac{1}{\sqrt{x^{2} + y^{2}}} \end{array}$

Which we can rew-rewrite as

\displaystyle \begin{aligned} \Gamma^r_{\theta,\theta} &= -r \\ \Gamma^\theta_{r,\theta} &= 1/r \\ \Gamma^\theta_{\theta,r} &= 1/r \end{aligned}

with all other entries being 0.

# The Sphere

We define a 2 dimensional manifold. We call it the 2-dimensional (unit) sphere but it we are going to remove a meridian to allow us to define the desired connection with torsion on it.

S2 = Manifold(2, 'S^2', latex_name=r'\mathbb{S}^2', start_index=1)
print(latex(S2))

$\displaystyle \mathbb{S}^2$

To start off with we cover the manifold with two charts.

polar.<th,ph> = S2.chart(r'th:(0,pi):\theta ph:(0,2*pi):\phi'); print(latex(polar))

$\displaystyle \left(\mathbb{S}^2,({\theta}, {\phi})\right)$

mercator.<xi,ze> = S2.chart(r'xi:(-oo,oo):\xi ze:(0,2*pi):\zeta'); print(latex(mercator))

$\displaystyle \left(\mathbb{S}^2,({\xi}, {\zeta})\right)$

We can now check that we have two charts.

print(latex(S2.atlas()))

$\displaystyle \left[\left(\mathbb{S}^2,({\theta}, {\phi})\right), \left(\mathbb{S}^2,({\xi}, {\zeta})\right)\right]$

We can then define co-ordinate frames.

epolar = polar.frame(); print(latex(epolar))

$\displaystyle \left(\mathbb{S}^2 ,\left(\frac{\partial}{\partial {\theta} },\frac{\partial}{\partial {\phi} }\right)\right)$

emercator = mercator.frame(); print(latex(emercator))

$\displaystyle \left(\mathbb{S}^2 ,\left(\frac{\partial}{\partial {\xi} },\frac{\partial}{\partial {\zeta} }\right)\right)$

And define a transition map and its inverse from one frame to the other checking that they really are inverses.

xy_to_uv = polar.transition_map(mercator, (log(tan(th/2)), ph))
xy_to_uv.set_inverse(2*arctan(exp(xi)), ze)
Check of the inverse coordinate transformation:
th == 2*arctan(sin(1/2*th)/cos(1/2*th))
ph == ph
xi == xi
ze == ze

We can define the metric which is the pullback of the Euclidean metric on $\mathbb{R}^3$.

g = S2.metric('g')
g[1,1], g[2,2] = 1, (sin(th))^2

And then calculate the Levi-Civita connection defined by it.

nab_g = g.connection()
print(latex(nab_g.display()))

$\displaystyle \begin{array}{lcl} \Gamma_{ \phantom{\, {\theta} } \, {\phi} \, {\phi} }^{ \, {\theta} \phantom{\, {\phi} } \phantom{\, {\phi} } } & = & -\cos\left({\theta}\right) \sin\left({\theta}\right) \\ \Gamma_{ \phantom{\, {\phi} } \, {\theta} \, {\phi} }^{ \, {\phi} \phantom{\, {\theta} } \phantom{\, {\phi} } } & = & \frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \\ \Gamma_{ \phantom{\, {\phi} } \, {\phi} \, {\theta} }^{ \, {\phi} \phantom{\, {\phi} } \phantom{\, {\theta} } } & = & \frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \end{array}$

We know the geodesics defined by this connection are the great circles.

We can check that this connection respects the metric.

print(latex(nab_g(g).display()))

$\displaystyle \nabla_{g} g = 0$

And that it has no torsion.

print(latex(nab_g.torsion().display()))
0

## A New Connection

Let us now define an orthonormal frame.

ch_basis = S2.automorphism_field()
ch_basis[1,1], ch_basis[2,2] = 1, 1/sin(th)
e = S2.default_frame().new_frame(ch_basis, 'e')
print(latex(e))

$\displaystyle \left(\mathbb{S}^2, \left(e_1,e_2\right)\right)$

We can calculate the dual 1-forms.

dX = S2.coframes()[2] ; print(latex(dX))

$\displaystyle \left(\mathbb{S}^2, \left(e^1,e^2\right)\right)$

print(latex((dX[1], dX[2])))

$\displaystyle \left(e^1, e^2\right)$

print(latex((dX[1][:], dX[2][:])))

$\displaystyle \left(\left[1, 0\right], \left[0, \sin\left({\theta}\right)\right]\right)$

In this case it is trivial to check that the frame and coframe really are orthonormal but we let sage do it anyway.

print(latex(((dX[1](e[1]).expr(), dX[1](e[2]).expr()), (dX[2](e[1]).expr(), dX[2](e[2]).expr()))))

$\displaystyle \left(\left(1, 0\right), \left(0, 1\right)\right)$

Let us define two vectors to be parallel if their angles to a given meridian are the same. For this to be true we must have a connection $\nabla$ with $\nabla e_1 = \nabla e_2 = 0$.

nab = S2.affine_connection('nabla', latex_name=r'\nabla')
nab.add_coef(e)

Displaying the connection only gives the non-zero components.

print(latex(nab.display(e)))

For safety, let us check all the components explicitly.

print(latex(nab[e,:]))

$\displaystyle \left[\left[\left[0, 0\right], \left[0, 0\right]\right], \left[\left[0, 0\right], \left[0, 0\right]\right]\right]$

Of course the components are not non-zero in other frames.

print(latex(nab.display(epolar)))

$\displaystyle \begin{array}{lcl} \Gamma_{ \phantom{\, {\phi} } \, {\phi} \, {\theta} }^{ \, {\phi} \phantom{\, {\phi} } \phantom{\, {\theta} } } & = & \frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \end{array}$

print(latex(nab.display(emercator)))

$\displaystyle \begin{array}{lcl} \Gamma_{ \phantom{\, {\xi} } \, {\xi} \, {\xi} }^{ \, {\xi} \phantom{\, {\xi} } \phantom{\, {\xi} } } & = & 2 \, \cos\left(\frac{1}{2} \, {\theta}\right)^{2} - 1 \\ \Gamma_{ \phantom{\, {\zeta} } \, {\zeta} \, {\xi} }^{ \, {\zeta} \phantom{\, {\zeta} } \phantom{\, {\xi} } } & = & \frac{2 \, \cos\left(\frac{1}{2} \, {\theta}\right) \cos\left({\theta}\right) \sin\left(\frac{1}{2} \, {\theta}\right)}{\sin\left({\theta}\right)} \end{array}$

This connection also respects the metric $g$.

print(latex(nab(g).display()))

$\displaystyle \nabla g = 0$

Thus, since the Levi-Civita connection is unique, it must have torsion.

print(latex(nab.torsion().display(e)))

$\displaystyle \frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} e_2\otimes e^1\otimes e^2 -\frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} e_2\otimes e^2\otimes e^1$

The equations for geodesics are

$\displaystyle \ddot{\gamma}^k + \Gamma_{ \phantom{\, {k} } \, {i} \, {j} }^{ \, {k} \phantom{\, {i} } \phantom{\, {j} } }\dot{\gamma}^i\dot{\gamma}^j = 0$

Explicitly for both variables in the polar co-ordinates chart.

\displaystyle \begin{aligned} \ddot{\gamma}^\phi & + \frac{\cos\theta}{\sin\theta}\dot{\gamma}^\phi\dot{\gamma}^\theta &= 0 \\ \ddot{\gamma}^\theta & &= 0 \end{aligned}

We can check that $\gamma^\phi(t) = \alpha\log\tan t/2$ and $\gamma^\theta(t) = t$ are solutions although sage needs a bit of prompting to help it.

t = var('t'); a = var('a')
print(latex(diff(a * log(tan(t/2)),t).simplify_full()))

$\displaystyle \frac{a}{2 \, \cos\left(\frac{1}{2} \, t\right) \sin\left(\frac{1}{2} \, t\right)}$

We can simplify this further by recalling the trignometric identity.

print(latex(sin(2 * t).trig_expand()))

$\displaystyle 2 \, \cos\left(t\right) \sin\left(t\right)$

print(latex(diff (a / sin(t), t)))

$\displaystyle -\frac{a \cos\left(t\right)}{\sin\left(t\right)^{2}}$

In the mercator co-ordinates chart this is

\displaystyle \begin{aligned} \gamma^\xi(t) &= \alpha\log\tan t/2 \\ \gamma^\zeta(t) &= \log\tan t/2 \end{aligned}

In other words: straight lines.

Reparametersing with $s = \alpha\log\tan t/2$ we obtain

\displaystyle \begin{aligned} \gamma^\phi(s) &= s \\ \gamma^\theta(s) &= 2\arctan e^\frac{s}{\alpha} \end{aligned}

Let us draw such a curve.

R.<t> = RealLine() ; print(R)
Real number line R
print(dim(R))
1
c = S2.curve({polar: [2*atan(exp(-t/10)), t]}, (t, -oo, +oo), name='c')
print(latex(c.display()))

$\displaystyle \begin{array}{llcl} c:& \mathbb{R} & \longrightarrow & \mathbb{S}^2 \\ & t & \longmapsto & \left({\theta}, {\phi}\right) = \left(2 \, \arctan\left(e^{\left(-\frac{1}{10} \, t\right)}\right), t\right) \\ & t & \longmapsto & \left({\xi}, {\zeta}\right) = \left(-\frac{1}{10} \, t, t\right) \end{array}$

c.parent()

$\displaystyle \mathrm{Hom}\left(\mathbb{R},\mathbb{S}^2\right)$

c.plot(chart=polar, aspect_ratio=0.1)

It’s not totally clear this is curved so let’s try with another example.

d = S2.curve({polar: [2*atan(exp(-t)), t]}, (t, -oo, +oo), name='d')
print(latex(d.display()))

$\displaystyle \begin{array}{llcl} d:& \mathbb{R} & \longrightarrow & \mathbb{S}^2 \\ & t & \longmapsto & \left({\theta}, {\phi}\right) = \left(2 \, \arctan\left(e^{\left(-t\right)}\right), t\right) \\ & t & \longmapsto & \left({\xi}, {\zeta}\right) = \left(-t, t\right) \end{array}$

d.plot(chart=polar, aspect_ratio=0.2)

Now it’s clear that a straight line is curved in polar co-ordinates.

But of course in Mercator co-ordinates, it is a straight line. This explains its popularity with mariners: if you draw a straight line on your chart and follow that bearing or rhumb line using a compass you will arrive at the end of the straight line. Of course, it is not the shortest path; great circles are but is much easier to navigate.

c.plot(chart=mercator, aspect_ratio=0.1)
d.plot(chart=mercator, aspect_ratio=1.0)

We can draw these curves on the sphere itself not just on its charts.

R3 = Manifold(3, 'R^3', r'\mathbb{R}^3', start_index=1)
cart.<X,Y,Z> = R3.chart(); print(latex(cart))

$\displaystyle \left(\mathbb{R}^3,(X, Y, Z)\right)$

Phi = S2.diff_map(R3, {
(polar, cart): [sin(th) * cos(ph), sin(th) * sin(ph), cos(th)],
(mercator, cart): [cos(ze) / cosh(xi), sin(ze) / cosh(xi),
sinh(xi) / cosh(xi)]
},
name='Phi', latex_name=r'\Phi')

We can either plot using polar co-ordinates.

graph_polar = polar.plot(chart=cart, mapping=Phi, nb_values=25, color='blue')
show(graph_polar, viewer=viewer3D)

Or using Mercator co-ordinates. In either case we get the sphere (minus the prime meridian).

graph_mercator = mercator.plot(chart=cart, mapping=Phi, nb_values=25, color='red')
show(graph_mercator, viewer=viewer3D)

We can plot the curve with an angle to the meridian of $\pi/2 - \arctan 1/10$

graph_c = c.plot(mapping=Phi, max_range=40, plot_points=200, thickness=2)
show(graph_polar + graph_c, viewer=viewer3D)

And we can plot the curve at angle of $\pi/4$ to the meridian.

graph_d = d.plot(mapping=Phi, max_range=40, plot_points=200, thickness=2, color="green")
show(graph_polar + graph_c + graph_d, viewer=viewer3D)

# Haskell

With automatic differentiation and symbolic numbers, symbolic differentiation is straigtforward in Haskell.

> import Data.Number.Symbolic
> import Numeric.AD
>
> x = var "x"
> y = var "y"
>
> test xs = jacobian ((\x -> [x]) . f) xs
>   where
>     f [x, y] = sqrt x^2 + y^2  ghci> test [1, 1] [[0.7071067811865475,0.7071067811865475]] ghci> test [x, y] [[x/(2.0*sqrt (x*x+y*y))+x/(2.0*sqrt (x*x+y*y)),y/(2.0*sqrt (x*x+y*y))+y/(2.0*sqrt (x*x+y*y))]]  Anyone wishing to take on the task of producing a Haskell version of sagemanifolds is advised to look here before embarking on the task. # Appendix A: Conformal Equivalence Agricola and Thier (2004) shows that the geodesics of the Levi-Civita connection of a conformally equivalent metric are the geodesics of a connection with vectorial torsion. Let’s put some but not all the flesh on the bones. The Koszul formula (see e.g. (O’Neill 1983)) characterizes the Levi-Civita connection $\nabla$ \displaystyle \begin{aligned} 2 \langle \nabla_X Y, Z\rangle & = X \langle Y,Z\rangle + Y \langle Z,X\rangle - Z \langle X,Y\rangle \\ &- \langle X,[Y,Z]\rangle + \langle Y,[Z,X]\rangle + \langle Z,[X,Y]\rangle \end{aligned} Being more explicit about the metric, this can be re-written as \displaystyle \begin{aligned} 2 g(\nabla^g_X Y, Z) & = X g(Y,Z) + Y g(Z,X) - Z g(X,Y) \\ &- g(X,[Y,Z]) + g(Y,[Z,X]) + g(Z,[X,Y]) \end{aligned} Let $\nabla^h$ be the Levi-Civita connection for the metric $h = e^{2\sigma}g$ where $\sigma \in C^\infty M$. Following [Gadea2010] and substituting into the Koszul formula and then applying the product rule \displaystyle \begin{aligned} 2 e^{2 \sigma} g(\nabla^h_X Y, Z) & = X e^{2 \sigma} g(Y,Z) + Y e^{2 \sigma} g(Z,X) - Z e^{2 \sigma} g(X,Y) \\ & + e^{2 \sigma} g([X,Y],Z]) - e^{2 \sigma} g([Y,Z],X) + e^{2 \sigma} g([Z,X],Y) \\ & = 2 e^{2\sigma}[g(\nabla^{g}_X Y, Z) + X\sigma g(Y,Z) + Y\sigma g(Z,X) - Z\sigma g(X,Y)] \\ & = 2 e^{2\sigma}[g(\nabla^{g}_X Y + X\sigma Y + Y\sigma X - g(X,Y) \mathrm{grad}\sigma, Z)] \end{aligned} Where as usual the vector field, $\mathrm{grad}\phi$ for $\phi \in C^\infty M$, is defined via $g(\mathrm{grad}\phi, X) = \mathrm{d}\phi(X) = X\phi$. Let’s try an example. nab_tilde = S2.affine_connection('nabla_t', r'\tilde_{\nabla}') f = S2.scalar_field(-ln(sin(th)), name='f') for i in S2.irange(): for j in S2.irange(): for k in S2.irange(): nab_tilde.add_coef()[k,i,j] = \ nab_g(polar.frame()[i])(polar.frame()[j])(polar.coframe()[k]) + \ polar.frame()[i](f) * polar.frame()[j](polar.coframe()[k]) + \ polar.frame()[j](f) * polar.frame()[i](polar.coframe()[k]) + \ g(polar.frame()[i], polar.frame()[j]) * \ polar.frame()[1](polar.coframe()[k]) * cos(th) / sin(th) print(latex(nab_tilde.display())) $\displaystyle \begin{array}{lcl} \Gamma_{ \phantom{\, {\theta} } \, {\theta} \, {\theta} }^{ \, {\theta} \phantom{\, {\theta} } \phantom{\, {\theta} } } & = & -\frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \end{array}$ print(latex(nab_tilde.torsion().display())) 0 g_tilde = exp(2 * f) * g print(latex(g_tilde.parent())) $\displaystyle \mathcal{T}^{(0,2)}\left(\mathbb{S}^2\right)$ print(latex(g_tilde[:])) $\displaystyle \left(\begin{array}{rr} \frac{1}{\sin\left({\theta}\right)^{2}} & 0 \\ 0 & 1 \end{array}\right)$ nab_g_tilde = g_tilde.connection() print(latex(nab_g_tilde.display())) $\displaystyle \begin{array}{lcl} \Gamma_{ \phantom{\, {\theta} } \, {\theta} \, {\theta} }^{ \, {\theta} \phantom{\, {\theta} } \phantom{\, {\theta} } } & = & -\frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \end{array}$ It’s not clear (to me at any rate) what the solutions are to the geodesic equations despite the guarantees of Agricola and Thier (2004). But let’s try a different chart. print(latex(nab_g_tilde[emercator,:])) $\displaystyle \left[\left[\left[0, 0\right], \left[0, 0\right]\right], \left[\left[0, 0\right], \left[0, 0\right]\right]\right]$ In this chart, the geodesics are clearly straight lines as we would hope. # References Agricola, Ilka, and Christian Thier. 2004. “The geodesics of metric connections with vectorial torsion.” Annals of Global Analysis and Geometry 26 (4): 321–32. doi:10.1023/B:AGAG.0000047509.63818.4f. Nakahara, M. 2003. “Geometry, Topology and Physics.” Text 822: 173–204. doi:10.1007/978-3-642-14700-5. O’Neill, B. 1983. Semi-Riemannian Geometry with Applications to Relativity, 103. Pure and Applied Mathematics. Elsevier Science. https://books.google.com.au/books?id=CGk1eRSjFIIC. # Modelling an Ecosystem via Hamiltonian Monte Carlo # Introduction Recall from the previous post that the Hare growth parameter undergoes Brownian motion so that the further into the future we go, the less certain we are about it. In order to ensure that this parameter remains positive, let’s model the log of it to be Brownian motion. \displaystyle \begin{aligned} \frac{\mathrm{d}N_1}{\mathrm{d}t} & = & \rho_1 N_1 \bigg(1 - \frac{N_1}{K_1}\bigg) - c_1 N_1 N_2 \\ \frac{\mathrm{d}N_2}{\mathrm{d}t} & = & -\rho_2 N_2 \bigg(1 + \frac{N_2}{K_2}\bigg) + c_2 N_1 N_2 \\ \mathrm{d} \rho_1 & = & \rho_1 \sigma_{\rho_1} \mathrm{d}W_t \end{aligned} where the final equation is a stochastic differential equation with $W_t$ being a Wiener process. By Itô we have $\displaystyle \mathrm{d} (\log{\rho_1}) = - \frac{\sigma_{\rho_1}^2}{2} \mathrm{d} t + \sigma_{\rho_1} \mathrm{d}W_t$ Again, we see that the populations become noisier the further into the future we go. # Inference Now let us infer the growth rate using Hamiltonian Monte Carlo and the domain specific probabilistic language Stan (Stan Development Team (2015b), Stan Development Team (2015a), Hoffman and Gelman (2014), Carpenter (2015)). Here’s the model expressed in Stan. functions { real f1 (real a, real k1, real b, real p, real z) { real q; q = a * p * (1 - p / k1) - b * p * z; return q; } real f2 (real d, real k2, real c, real p, real z) { real q; q = -d * z * (1 + z / k2) + c * p * z; return q; } } data { int<lower=1> T; // Number of observations real y[T]; // Observed hares real k1; // Hare carrying capacity real b; // Hare death rate per lynx real d; // Lynx death rate real k2; // Lynx carrying capacity real c; // Lynx birth rate per hare real deltaT; // Time step } parameters { real<lower=0> mu; real<lower=0> sigma; real<lower=0> p0; real<lower=0> z0; real<lower=0> rho0; real w[T]; } transformed parameters { real<lower=0> p[T]; real<lower=0> z[T]; real<lower=0> rho[T]; p[1] = p0; z[1] = z0; rho[1] = rho0; for (t in 1:(T-1)){ p[t+1] = p[t] + deltaT * f1 (exp(rho[t]), k1, b, p[t], z[t]); z[t+1] = z[t] + deltaT * f2 (d, k2, c, p[t], z[t]); rho[t+1] = rho[t] * exp(sigma * sqrt(deltaT) * w[t] - 0.5 * sigma * sigma * deltaT); } } model { mu ~ uniform(0.0,1.0); sigma ~ uniform(0.0, 0.5); p0 ~ lognormal(log(100.0), 0.2); z0 ~ lognormal(log(50.0), 0.1); rho0 ~ normal(log(mu), sigma); w ~ normal(0.0,1.0); for (t in 1:T) { y[t] ~ lognormal(log(p[t]),0.1); } } Let’s look at the posteriors of the hyper-parameters for the Hare growth parameter. Again, the estimate for $\mu$ is pretty decent. For our generated data, $\sigma =0$ and given our observations are quite noisy maybe the estimate for this is not too bad also. # Appendix: The R Driving Code All code including the R below can be downloaded from github. install.packages("devtools") library(devtools) install_github("libbi/RBi",ref="master") install_github("sbfnk/RBi.helpers",ref="master") rm(list = ls(all.names=TRUE)) unlink(".RData") library('RBi') try(detach(package:RBi, unload = TRUE), silent = TRUE) library(RBi, quietly = TRUE) library('RBi.helpers') library('ggplot2', quietly = TRUE) library('gridExtra', quietly = TRUE) endTime <- 50 PP <- bi_model("PP.bi") synthetic_dataset_PP <- bi_generate_dataset(endtime=endTime, model=PP, seed="42", verbose=TRUE, add_options = list( noutputs=500)) rdata_PP <- bi_read(synthetic_dataset_PP) df <- data.frame(rdata_PPP$nr, rdata_PP$P$value, rdata_PP$Z$value, rdata_PP$P_obs$value) ggplot(df, aes(rdata_PP$P$nr, y = Population, color = variable), size = 0.1) + geom_line(aes(y = rdata_PP$P$value, col = "Hare"), size = 0.1) + geom_line(aes(y = rdata_PP$Z$value, col = "Lynx"), size = 0.1) + geom_point(aes(y = rdata_PP$P_obs$value, col = "Observations"), size = 0.1) + theme(legend.position="none") + ggtitle("Example Data") + xlab("Days") + theme(axis.text=element_text(size=4), axis.title=element_text(size=6,face="bold")) + theme(plot.title = element_text(size=10)) ggsave(filename="diagrams/LVdata.png",width=4,height=3) library(rstan) rstan_options(auto_write = TRUE) options(mc.cores = parallel::detectCores()) lvStanModel <- stan_model(file = "SHO.stan",verbose=TRUE) lvFit <- sampling(lvStanModel, seed=42, data=list(T = length(rdata_PP$P_obs$value), y = rdata_PP$P_obs$value, k1 = 2.0e2, b = 2.0e-2, d = 4.0e-1, k2 = 2.0e1, c = 4.0e-3, deltaT = rdata_PP$P_obs$time[2] - rdata_PP$P_obs$time[1] ), chains=1) samples <- extract(lvFit) gs1 <- qplot(x = samples$mu, y = ..density.., geom = "histogram") + xlab(expression(\mu))
gs2 <- qplot(x = samples$sigma, y = ..density.., geom = "histogram") + xlab(expression(samples$sigma))
gs3 <- grid.arrange(gs1, gs2)
ggsave(plot=gs3,filename="diagrams/LvPosteriorStan.png",width=4,height=3)

synthetic_dataset_PP1 <- bi_generate_dataset(endtime=endTime,
model=PP,
init = list(P = 100, Z=50),
seed="42",
verbose=TRUE,
add_options = list(
noutputs=500))

rdata_PP1 <- bi_read(synthetic_dataset_PP1)

synthetic_dataset_PP2 <- bi_generate_dataset(endtime=endTime,
model=PP,
init = list(P = 150, Z=25),
seed="42",
verbose=TRUE,
add_options = list(
noutputs=500))

rdata_PP2 <- bi_read(synthetic_dataset_PP2)

df1 <- data.frame(hare = rdata_PP$P$value,
lynx = rdata_PP$Z$value,
hare1 = rdata_PP1$P$value,
lynx1 = rdata_PP1$Z$value,
hare2 = rdata_PP2$P$value,
lynx2 = rdata_PP2$Z$value)

ggplot(df1) +
geom_path(aes(x=df1$hare, y=df1$lynx, col = "0"), size = 0.1) +
geom_path(aes(x=df1$hare1, y=df1$lynx1, col = "1"), size = 0.1) +
geom_path(aes(x=df1$hare2, y=df1$lynx2, col = "2"), size = 0.1) +
theme(legend.position="none") +
ggtitle("Phase Space") +
xlab("Hare") +
ylab("Lynx") +
theme(axis.text=element_text(size=4),
axis.title=element_text(size=6,face="bold")) +
theme(plot.title = element_text(size=10))
ggsave(filename="diagrams/PPviaLibBi.png",width=4,height=3)

PPInfer <- bi_model("PPInfer.bi")

bi_object_PP <- libbi(client="sample", model=PPInfer, obs = synthetic_dataset_PP)

bi_object_PP$run(add_options = list( "end-time" = endTime, noutputs = endTime, nsamples = 2000, nparticles = 128, seed=42, nthreads = 1), verbose = TRUE, stdoutput_file_name = tempfile(pattern="pmmhoutput", fileext=".txt")) bi_file_summary(bi_object_PP$result$output_file_name) mu <- bi_read(bi_object_PP, "mu")$value
g1 <- qplot(x = mu[2001:4000], y = ..density.., geom = "histogram") + xlab(expression(mu))
sigma <- bi_read(bi_object_PP, "sigma")$value g2 <- qplot(x = sigma[2001:4000], y = ..density.., geom = "histogram") + xlab(expression(sigma)) g3 <- grid.arrange(g1, g2) ggsave(plot=g3,filename="diagrams/LvPosterior.png",width=4,height=3) df2 <- data.frame(hareActs = rdata_PP$P$value, hareObs = rdata_PP$P_obs$value) ggplot(df, aes(rdata_PP$P$nr, y = value, color = variable)) + geom_line(aes(y = rdata_PP$P$value, col = "Phyto")) + geom_line(aes(y = rdata_PP$Z$value, col = "Zoo")) + geom_point(aes(y = rdata_PP$P_obs$value, col = "Phyto Obs")) ln_alpha <- bi_read(bi_object_PP, "ln_alpha")$value

P <- matrix(bi_read(bi_object_PP, "P")$value,nrow=51,byrow=TRUE) Z <- matrix(bi_read(bi_object_PP, "Z")$value,nrow=51,byrow=TRUE)

data50 <- bi_generate_dataset(endtime=endTime,
model=PP,
seed="42",
verbose=TRUE,
add_options = list(
noutputs=50))

rdata50 <- bi_read(data50)

df3 <- data.frame(days = c(1:51), hares = rowMeans(P), lynxes = rowMeans(Z),
actHs = rdata50$P$value, actLs = rdata50$Z$value)

ggplot(df3) +
geom_line(aes(x = days, y = hares, col = "Est Phyto")) +
geom_line(aes(x = days, y = lynxes, col = "Est Zoo")) +
geom_line(aes(x = days, y = actHs, col = "Act Phyto")) +
geom_line(aes(x = days, y = actLs, col = "Act Zoo"))

# Bibliography

Carpenter, Bob. 2015. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software.

Hoffman, Matthew D., and Andrew Gelman. 2014. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research.

Stan Development Team. 2015a. Stan Modeling Language User’s Guide and Reference Manual, Version 2.10.0. http://mc-stan.org/.

———. 2015b. “Stan: A C++ Library for Probability and Sampling, Version 2.10.0.” http://mc-stan.org/.

# Introduction

In the 1920s, Lotka (1909) and Volterra (1926) developed a model of a very simple predator-prey ecosystem.

\displaystyle \begin{aligned} \frac{\mathrm{d}N_1}{\mathrm{d}t} & = & \rho_1 N_1 - c_1 N_1 N_2 \label{eq2a} \\ \frac{\mathrm{d}N_2}{\mathrm{d}t} & = & c_2 N_1 N_2 - \rho_2 N2 \label{eq2b} \end{aligned}

Although simple, it turns out that the Canadian lynx and showshoe hare are well represented by such a model. Furthermore, the Hudson Bay Company kept records of how many pelts of each species were trapped for almost a century, giving a good proxy of the population of each species.

We can capture the fact that we do not have a complete model by describing our state of ignorance about the parameters. In order to keep this as simple as possible let us assume that log parameters undergo Brownian motion. That is, we know the parameters will jiggle around and the further into the future we look the less certain we are about what values they will have taken. By making the log parameters undergo Brownian motion, we can also capture our modelling assumption that birth, death and predation rates are always positive. A similar approach is taken in Dureau, Kalogeropoulos, and Baguelin (2013) where the (log) parameters of an epidemiological model are taken to be Ornstein-Uhlenbeck processes (which is biologically more plausible although adds to the complexity of the model, something we wish to avoid in an example such as this).

Andrieu, Doucet, and Holenstein (2010) propose a method to estimate the parameters of such models (Particle Marginal Metropolis Hastings aka PMMH) and the domain specific probabilistic language LibBi (Murray (n.d.)) can be used to apply this (and other inference methods).

For the sake of simplicity, in this blog post, we only model one parameter as being unknown and undergoing Brownian motion. A future blog post will consider more sophisticated scenarios.

# A Dynamical System Aside

The above dynamical system is structurally unstable (more on this in a future post), a possible indication that it should not be considered as a good model of predator–prey interaction. Let us modify this to include carrying capacities for the populations of both species.

\displaystyle \begin{aligned} \frac{\mathrm{d}N_1}{\mathrm{d}t} & = & \rho_1 N_1 \bigg(1 - \frac{N_1}{K_1}\bigg) - c_1 N_1 N_2 \\ \frac{\mathrm{d}N_2}{\mathrm{d}t} & = & -\rho_2 N_2 \bigg(1 + \frac{N_2}{K_2}\bigg) + c_2 N_1 N_2 \end{aligned}

# Data Generation with LibBi

Let’s generate some data using LibBi.

// Generate data assuming a fixed growth rate for hares rather than
// e.g. a growth rate that undergoes Brownian motion.

model PP {
const h         = 0.1;    // time step
const delta_abs = 1.0e-3; // absolute error tolerance
const delta_rel = 1.0e-6; // relative error tolerance

const a  = 5.0e-1 // Hare growth rate
const k1 = 2.0e2  // Hare carrying capacity
const b  = 2.0e-2 // Hare death rate per lynx
const d  = 4.0e-1 // Lynx death rate
const k2 = 2.0e1  // Lynx carrying capacity
const c  = 4.0e-3 // Lynx birth rate per hare

state P, Z       // Hares and lynxes
state ln_alpha   // Hare growth rate - we express it in log form for
// consistency with the inference model
obs P_obs        // Observations of hares

sub initial {
P ~ log_normal(log(100.0), 0.2)
Z ~ log_normal(log(50.0), 0.1)
}

sub transition(delta = h) {
ode(h = h, atoler = delta_abs, rtoler = delta_rel, alg = 'RK4(3)') {
dP/dt =  a * P * (1 - P / k1) - b * P * Z
dZ/dt = -d * Z * (1 + Z / k2) + c * P * Z
}
}

sub observation {
P_obs ~ log_normal(log(P), 0.1)
}
}

We can look at phase space starting with different populations and see they all converge to the same fixed point.

# Data Generation with Haskell

Since at some point in the future, I plan to produce Haskell versions of the methods given in Andrieu, Doucet, and Holenstein (2010), let’s generate the data using Haskell.

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing  #-}

> module LotkaVolterra (
>     solLv
>   , solPp
>   , h0
>   , l0
>   , baz
>   , logBM
>   , eulerEx
>   )where

> import Numeric.GSL.ODE
> import Numeric.LinearAlgebra

> import Data.Random.Source.PureMT
> import Data.Random hiding ( gamma )
> import Control.Monad.State


Here’s the unstable model.

> lvOde :: Double ->
>          Double ->
>          Double ->
>          Double ->
>          Double ->
>          [Double] ->
>          [Double]
> lvOde rho1 c1 rho2 c2 _t [h, l] =
>   [
>     rho1 * h - c1 * h * l
>   , c2 * h * l - rho2 * l
>   ]
> lvOde _rho1 _c1 _rho2 _c2 _t vars =
>   error $"lvOde called with: " ++ show (length vars) ++ " variable"  > rho1, c1, rho2, c2 :: Double > rho1 = 0.5 > c1 = 0.02 > rho2 = 0.4 > c2 = 0.004  > deltaT :: Double > deltaT = 0.1  > solLv :: Matrix Double > solLv = odeSolve (lvOde rho1 c1 rho2 c2) > [50.0, 50.0] > (fromList [0.0, deltaT .. 50])  And here’s the stable model. > ppOde :: Double -> > Double -> > Double -> > Double -> > Double -> > Double -> > Double -> > [Double] -> > [Double] > ppOde a k1 b d k2 c _t [p, z] = > [ > a * p * (1 - p / k1) - b * p * z > , -d * z * (1 + z / k2) + c * p * z > ] > ppOde _a _k1 _b _d _k2 _c _t vars = > error$ "ppOde called with: " ++ show (length vars) ++ " variable"

> a, k1, b, d, k2, c :: Double
> a = 0.5
> k1 = 200.0
> b = 0.02
> d = 0.4
> k2 = 50.0
> c = 0.004

> solPp :: Double -> Double -> Matrix Double
> solPp x y = odeSolve (ppOde a k1 b d k2 c)
>                  [x, y]
>                  (fromList [0.0, deltaT .. 50])

> gamma, alpha, beta :: Double
> gamma = d / a
> alpha = a / (c * k1)
> beta  = d / (a * k2)

> fp :: (Double, Double)
> fp = ((gamma + beta) / (1 + alpha * beta), (1 - gamma * alpha) / (1 + alpha * beta))

> h0, l0 :: Double
> h0 = a * fst fp / c
> l0 = a * snd fp / b

> foo, bar :: Matrix R
> foo = matrix 2 [a / k1, b, c, negate d / k2]
> bar = matrix 1 [a, d]

> baz :: Maybe (Matrix R)
> baz = linearSolve foo bar


This gives a stable fixed point of

ghci> baz
Just (2><1)
[ 120.00000000000001
,               10.0 ]


Here’s an example of convergence to that fixed point in phase space.

## The Stochastic Model

Let us now assume that the Hare growth parameter undergoes Brownian motion so that the further into the future we go, the less certain we are about it. In order to ensure that this parameter remains positive, let’s model the log of it to be Brownian motion.

\displaystyle \begin{aligned} \frac{\mathrm{d}N_1}{\mathrm{d}t} & = & \rho_1 N_1 \bigg(1 - \frac{N_1}{K_1}\bigg) - c_1 N_1 N_2 \\ \frac{\mathrm{d}N_2}{\mathrm{d}t} & = & -\rho_2 N_2 \bigg(1 + \frac{N_2}{K_2}\bigg) + c_2 N_1 N_2 \\ \mathrm{d} \rho_1 & = & \rho_1 \sigma_{\rho_1} \mathrm{d}W_t \end{aligned}

where the final equation is a stochastic differential equation with $W_t$ being a Wiener process.

By Itô we have

$\displaystyle \mathrm{d} (\log{\rho_1}) = - \frac{\sigma_{\rho_1}^2}{2} \mathrm{d} t + \sigma_{\rho_1} \mathrm{d}W_t$

We can use this to generate paths for $\rho_1$.

$\displaystyle \rho_1(t + \Delta t) = \rho_1(t)\exp{\bigg(- \frac{\sigma_{\rho_1}^2}{2} \Delta t + \sigma_{\rho_1} \sqrt{\Delta t} Z\bigg)}$

where $Z \sim {\mathcal{N}}(0,1)$.

> oneStepLogBM :: MonadRandom m => Double -> Double -> Double -> m Double
> oneStepLogBM deltaT sigma rhoPrev = do
>   x <- sample $rvar StdNormal > return$ rhoPrev * exp(sigma * (sqrt deltaT) * x - 0.5 * sigma * sigma * deltaT)

> iterateM :: Monad m => (a -> m a) -> m a -> Int -> m [a]
> iterateM f mx n = sequence . take n . iterate (>>= f) $mx  > logBMM :: MonadRandom m => Double -> Double -> Int -> Int -> m [Double] > logBMM initRho sigma n m = > iterateM (oneStepLogBM (recip$ fromIntegral n) sigma) (return initRho) (n * m)

> logBM :: Double -> Double -> Int -> Int -> Int -> [Double]
> logBM initRho sigma n m seed =
>   evalState (logBMM initRho sigma n m) (pureMT $fromIntegral seed)  We can see the further we go into the future the less certain we are about the value of the parameter. Using this we can simulate the whole dynamical system which is now a stochastic process. > f1, f2 :: Double -> Double -> Double -> > Double -> Double -> > Double > f1 a k1 b p z = a * p * (1 - p / k1) - b * p * z > f2 d k2 c p z = -d * z * (1 + z / k2) + c * p * z  > oneStepEuler :: MonadRandom m => > Double -> > Double -> > Double -> Double -> > Double -> Double -> Double -> > (Double, Double, Double) -> > m (Double, Double, Double) > oneStepEuler deltaT sigma k1 b d k2 c (rho1Prev, pPrev, zPrev) = do > let pNew = pPrev + deltaT * f1 (exp rho1Prev) k1 b pPrev zPrev > let zNew = zPrev + deltaT * f2 d k2 c pPrev zPrev > rho1New <- oneStepLogBM deltaT sigma rho1Prev > return (rho1New, pNew, zNew)  > euler :: MonadRandom m => > (Double, Double, Double) -> > Double -> > Double -> Double -> > Double -> Double -> Double -> > Int -> Int -> > m [(Double, Double, Double)] > euler stateInit sigma k1 b d k2 c n m = > iterateM (oneStepEuler (recip$ fromIntegral n) sigma k1 b d k2 c)
>            (return stateInit)
>            (n * m)

> eulerEx :: (Double, Double, Double) ->
>            Double -> Int -> Int -> Int ->
>            [(Double, Double, Double)]
> eulerEx stateInit sigma n m seed =
>   evalState (euler stateInit sigma k1 b d k2 c n m) (pureMT $fromIntegral seed)  We see that the populations become noisier the further into the future we go. Notice that the second order effects of the system are now to some extent captured by the fact that the growth rate of Hares can drift. In our simulation, this is demonstrated by our decreasing lack of knowledge the further we look into the future. # Inference Now let us infer the growth rate using PMMH. Here’s the model expressed in LibBi. // Infer growth rate for hares model PP { const h = 0.1; // time step const delta_abs = 1.0e-3; // absolute error tolerance const delta_rel = 1.0e-6; // relative error tolerance const a = 5.0e-1 // Hare growth rate - superfluous for inference // but a reminder of what we should expect const k1 = 2.0e2 // Hare carrying capacity const b = 2.0e-2 // Hare death rate per lynx const d = 4.0e-1 // Lynx death rate const k2 = 2.0e1 // Lynx carrying capacity const c = 4.0e-3 // Lynx birth rate per hare state P, Z // Hares and lynxes state ln_alpha // Hare growth rate - we express it in log form for // consistency with the inference model obs P_obs // Observations of hares param mu, sigma // Mean and standard deviation of hare growth rate noise w // Noise sub parameter { mu ~ uniform(0.0, 1.0) sigma ~ uniform(0.0, 0.5) } sub proposal_parameter { mu ~ truncated_gaussian(mu, 0.02, 0.0, 1.0); sigma ~ truncated_gaussian(sigma, 0.01, 0.0, 0.5); } sub initial { P ~ log_normal(log(100.0), 0.2) Z ~ log_normal(log(50.0), 0.1) ln_alpha ~ gaussian(log(mu), sigma) } sub transition(delta = h) { w ~ normal(0.0, sqrt(h)); ode(h = h, atoler = delta_abs, rtoler = delta_rel, alg = 'RK4(3)') { dP/dt = exp(ln_alpha) * P * (1 - P / k1) - b * P * Z dZ/dt = -d * Z * (1 + Z / k2) + c * P * Z dln_alpha/dt = -sigma * sigma / 2 - sigma * w / h } } sub observation { P_obs ~ log_normal(log(P), 0.1) } } Let’s look at the posteriors of the hyper-parameters for the Hare growth parameter. The estimate for $\mu$ is pretty decent. For our generated data, $\sigma =0$ and given our observations are quite noisy maybe the estimate for this is not too bad also. # Appendix: The R Driving Code All code including the R below can be downloaded from github but make sure you use the straight-libbi branch and not master. install.packages("devtools") library(devtools) install_github("sbfnk/RBi",ref="master") install_github("sbfnk/RBi.helpers",ref="master") rm(list = ls(all.names=TRUE)) unlink(".RData") library('RBi') try(detach(package:RBi, unload = TRUE), silent = TRUE) library(RBi, quietly = TRUE) library('RBi.helpers') library('ggplot2', quietly = TRUE) library('gridExtra', quietly = TRUE) endTime <- 50 PP <- bi_model("PP.bi") synthetic_dataset_PP <- bi_generate_dataset(endtime=endTime, model=PP, seed="42", verbose=TRUE, add_options = list( noutputs=500)) rdata_PP <- bi_read(synthetic_dataset_PP) df <- data.frame(rdata_PP$P$nr, rdata_PP$P$value, rdata_PP$Z$value, rdata_PP$P_obs$value) ggplot(df, aes(rdata_PP$P$nr, y = Population, color = variable), size = 0.1) + geom_line(aes(y = rdata_PP$P$value, col = "Hare"), size = 0.1) + geom_line(aes(y = rdata_PP$Z$value, col = "Lynx"), size = 0.1) + geom_point(aes(y = rdata_PP$P_obs$value, col = "Observations"), size = 0.1) + theme(legend.position="none") + ggtitle("Example Data") + xlab("Days") + theme(axis.text=element_text(size=4), axis.title=element_text(size=6,face="bold")) + theme(plot.title = element_text(size=10)) ggsave(filename="diagrams/LVdata.png",width=4,height=3) synthetic_dataset_PP1 <- bi_generate_dataset(endtime=endTime, model=PP, init = list(P = 100, Z=50), seed="42", verbose=TRUE, add_options = list( noutputs=500)) rdata_PP1 <- bi_read(synthetic_dataset_PP1) synthetic_dataset_PP2 <- bi_generate_dataset(endtime=endTime, model=PP, init = list(P = 150, Z=25), seed="42", verbose=TRUE, add_options = list( noutputs=500)) rdata_PP2 <- bi_read(synthetic_dataset_PP2) df1 <- data.frame(hare = rdata_PP$P$value, lynx = rdata_PP$Z$value, hare1 = rdata_PP1$P$value, lynx1 = rdata_PP1$Z$value, hare2 = rdata_PP2$P$value, lynx2 = rdata_PP2$Z$value) ggplot(df1) + geom_path(aes(x=df1$hare,  y=df1$lynx, col = "0"), size = 0.1) + geom_path(aes(x=df1$hare1, y=df1$lynx1, col = "1"), size = 0.1) + geom_path(aes(x=df1$hare2, y=df1$lynx2, col = "2"), size = 0.1) + theme(legend.position="none") + ggtitle("Phase Space") + xlab("Hare") + ylab("Lynx") + theme(axis.text=element_text(size=4), axis.title=element_text(size=6,face="bold")) + theme(plot.title = element_text(size=10)) ggsave(filename="diagrams/PPviaLibBi.png",width=4,height=3) PPInfer <- bi_model("PPInfer.bi") bi_object_PP <- libbi(client="sample", model=PPInfer, obs = synthetic_dataset_PP) bi_object_PP$run(add_options = list(
"end-time" = endTime,
noutputs = endTime,
nsamples = 4000,
nparticles = 128,
seed=42,
nthreads = 1),
## verbose = TRUE,
stdoutput_file_name = tempfile(pattern="pmmhoutput", fileext=".txt"))

bi_file_summary(bi_object_PP$result$output_file_name)

mu <- bi_read(bi_object_PP, "mu")$value g1 <- qplot(x = mu[2001:4000], y = ..density.., geom = "histogram") + xlab(expression(mu)) sigma <- bi_read(bi_object_PP, "sigma")$value
g2 <- qplot(x = sigma[2001:4000], y = ..density.., geom = "histogram") + xlab(expression(sigma))
g3 <- grid.arrange(g1, g2)
ggsave(plot=g3,filename="diagrams/LvPosterior.png",width=4,height=3)

df2 <- data.frame(hareActs = rdata_PP$P$value,
hareObs  = rdata_PP$P_obs$value)

ggplot(df, aes(rdata_PP$P$nr, y = value, color = variable)) +
geom_line(aes(y = rdata_PP$P$value, col = "Phyto")) +
geom_line(aes(y = rdata_PP$Z$value, col = "Zoo")) +
geom_point(aes(y = rdata_PP$P_obs$value, col = "Phyto Obs"))

ln_alpha <- bi_read(bi_object_PP, "ln_alpha")$value P <- matrix(bi_read(bi_object_PP, "P")$value,nrow=51,byrow=TRUE)
Z <- matrix(bi_read(bi_object_PP, "Z")$value,nrow=51,byrow=TRUE) data50 <- bi_generate_dataset(endtime=endTime, model=PP, seed="42", verbose=TRUE, add_options = list( noutputs=50)) rdata50 <- bi_read(data50) df3 <- data.frame(days = c(1:51), hares = rowMeans(P), lynxes = rowMeans(Z), actHs = rdata50$P$value, actLs = rdata50$Zvalue) ggplot(df3) + geom_line(aes(x = days, y = hares, col = "Est Phyto")) + geom_line(aes(x = days, y = lynxes, col = "Est Zoo")) + geom_line(aes(x = days, y = actHs, col = "Act Phyto")) + geom_line(aes(x = days, y = actLs, col = "Act Zoo")) # Bibliography Andrieu, Christophe, Arnaud Doucet, and Roman Holenstein. 2010. “Particle Markov chain Monte Carlo methods.” Journal of the Royal Statistical Society. Series B: Statistical Methodology 72 (3): 269–342. doi:10.1111/j.1467-9868.2009.00736.x. Dureau, Joseph, Konstantinos Kalogeropoulos, and Marc Baguelin. 2013. “Capturing the time-varying drivers of an epidemic using stochastic dynamical systems.” Biostatistics (Oxford, England) 14 (3): 541–55. doi:10.1093/biostatistics/kxs052. Lotka, Alfred J. 1909. “Contribution to the Theory of Periodic Reactions.” The Journal of Physical Chemistry 14 (3): 271–74. doi:10.1021/j150111a004. Murray, Lawrence M. n.d. “Bayesian State-Space Modelling on High-Performance Hardware Using LibBi.” Volterra, Vito. 1926. “Variazioni e fluttuazioni del numero d’individui in specie animali conviventi.” Memorie Della R. Accademia Dei Lincei 6 (2): 31–113. http://www.liberliber.it/biblioteca/v/volterra/variazioni{\_}e{\_}fluttuazioni/pdf/volterra{\_}variazioni{\_}e{\_}fluttuazioni.pdf. # Fun with LibBi and Influenza # Introduction This is a bit different from my usual posts (well apart from my write up of hacking at Odessa) in that it is a log of how I managed to get LibBi (Library for Bayesian Inference) to run on my MacBook and then not totally satisfactorily (as you will see if you read on). The intention is to try a few more approaches to the same problem, for example, Stan, monad-bayes and hand-crafted. Kermack and McKendrick (1927) give a simple model of the spread of an infectious disease. Individuals move from being susceptible ($S$) to infected ($I$) to recovered ($R$). \displaystyle \begin{aligned} \frac{dS}{dt} & = & - \delta S(t) I(t) & \\ \frac{dI}{dt} & = & \delta S(t) I(t) - & \gamma I(t) \\ \frac{dR}{dt} & = & & \gamma I(t) \end{aligned} In 1978, anonymous authors sent a note to the British Medical Journal reporting an influenza outbreak in a boarding school in the north of England (“Influenza in a boarding school” 1978). The chart below shows the solution of the SIR (Susceptible, Infected, Record) model with parameters which give roughly the results observed in the school. # LibBi ## Step 1 ~/LibBi-stable/SIR-master ./init.sh
error: 'ncread' undefined near line 6 column 7

The README says this is optional so we can skip over it. Still it would be nice to fit the bridge weight function as described in Moral and Murray (2015).

The README does say that GPML is required but since we don’t (yet) need to do this step, let’s move on.

~/LibBi-stable/SIR-master $./run.sh ./run.sh Error: ./configure failed with return code 77. See .SIR/build_openmp_cuda_single/configure.log and .SIR/build_openmp_cuda_single/config.log for details It seems the example is configured to run on CUDA and it is highly likely that my installation of LibBI was not set up to allow this. We can change config.conf from --disable-assert --enable-single --enable-cuda --nthreads 2 to --nthreads 4 --enable-sse --disable-assert On to the next issue. ~/LibBi-stable/SIR-master$ ./run.sh
./run.sh
Error: ./configure failed with return code 1. required QRUpdate
library not found. See .SIR/build_sse/configure.log and
.SIR/build_sse/config.log for details

But QRUpdate is installed!

~/LibBi-stable/SIR-master $brew info QRUpdate brew info QRUpdate homebrew/science/qrupdate: stable 1.1.2 (bottled) http://sourceforge.net/projects/qrupdate/ /usr/local/Cellar/qrupdate/1.1.2 (3 files, 302.6K) /usr/local/Cellar/qrupdate/1.1.2_2 (6 files, 336.3K) Poured from bottle /usr/local/Cellar/qrupdate/1.1.2_3 (6 files, 337.3K) * Poured from bottle From: https://github.com/Homebrew/homebrew-science/blob/master/qrupdate.rb ==> Dependencies Required: veclibfort ✔ Optional: openblas ✔ ==> Options --with-openblas Build with openblas support --without-check Skip build-time tests (not recommended) Let’s look in the log as advised. So it seems that a certain symbol cannot be found. checking for dch1dn_ in -lqrupdate Let’s try ourselves. nm -g /usr/local/Cellar/qrupdate/1.1.2_3/lib/libqrupdate.a | grep dch1dn_ 0000000000000000 T _dch1dn_ So the symbol is there! What gives? Let’s try setting one of the environment variables. export LDFLAGS='-L/usr/local/lib' Now we get further. ./run.sh Error: ./configure failed with return code 1. required NetCDF header not found. See .SIR/build_sse/configure.log and .SIR/build_sse/config.log for details So we just need to set another environment variable. export CPPFLAGS='-I/usr/local/include/' This is more mysterious. ./run.sh Error: ./configure failed with return code 1. required Boost header not found. See .SIR/build_sse/configure.log and .SIR/build_sse/config.log for details ~/LibBi-stable/SIR-master Let’s see what we have. brew list | grep -i boost Nothing! I recall having some problems with boost when trying to use a completely different package. So let’s install boost. brew install boost Now we get a different error. ./run.sh Error: make failed with return code 2, see .SIR/build_sse/make.log for details Fortunately at some time in the past sbfnk took pity on me and advised me here to use boost155, a step that should not be lightly undertaken. /usr/local/Cellar/boost155/1.55.0_1: 10,036 files, 451.6M, built in 15 minutes 9 seconds Even then I had to say brew link --force boost155 Finally it runs. ./run.sh 2> out.txt And produces a lot of output wc -l out.txt 49999 out.txt ls -ltrh results/posterior.nc 1.7G Apr 30 19:57 results/posterior.nc Rather worringly, out.txt has all lines of the form 1: -51.9191 -23.2045 nan beats -inf -inf -inf accept=0.5 nan beating -inf does not sound good. Now we are in a position to analyse the results. octave --path oct/ --eval "plot_and_print" error: 'bi_plot_quantiles' undefined near line 23 column 5 I previously found an Octave package(?) called OctBi so let’s create an .octaverc file which adds this to the path. We’ll also need to load the netcdf package which we previously installed. addpath ("../OctBi-stable/inst") pkg load netcdf ~/LibBi-stable/SIR-master$ octave --path oct/ --eval "plot_and_print"
octave --path oct/ --eval "plot_and_print"
warning: division by zero
warning: called from
mean at line 117 column 7
read_hist_simulator at line 47 column 11
bi_read_hist at line 85 column 12
bi_hist at line 63 column 12
plot_and_print at line 56 column 5
warning: division by zero
warning: division by zero
warning: division by zero
warning: division by zero
warning: division by zero
warning: print.m: fig2dev binary is not available.
Some output formats are not available.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
sh: pdfcrop: command not found

I actually get a chart from this so some kind of success.

This does not look like the chart in the Moral and Murray (2015), the fitted number of infected patients looks a lot smoother and the “rates” parameters also vary in a much smoother manner. For reasons I haven’t yet investigated, it looks like over-fitting. Here’s the charts in the paper.

# Bibliography

“Influenza in a boarding school.” 1978. British Medical Journal, March, 587.

Kermack, W. O., and A. G. McKendrick. 1927. “A Contribution to the Mathematical Theory of Epidemics.” Proceedings of the Royal Society of London Series A 115 (August): 700–721. doi:10.1098/rspa.1927.0118.

Moral, Pierre Del, and Lawrence M Murray. 2015. “Sequential Monte Carlo with Highly Informative Observations.”

# Introduction

In their paper Betancourt et al. (2014), the authors give a corollary which starts with the phrase “Because the manifold is paracompact”. It wasn’t immediately clear why the manifold was paracompact or indeed what paracompactness meant although it was clearly something like compactness which means that every cover has a finite sub-cover.

It turns out that every manifold is paracompact and that this is intimately related to partitions of unity.

Most of what I have written below is taken from some hand-written anonymous lecture notes I found by chance in the DPMMS library in Cambridge University. To whomever wrote them: thank you very much.

# Limbering Up

Let $\{U_i : i \in {\mathcal{I}}\}$ be an open cover of a smooth manifold $M$. A partition of unity on M, subordinate to the cover $\{U_i : i \in {\mathcal{I}}\}$ is a finite collection of smooth functions

$\displaystyle X_j : M^n \longrightarrow \mathbb{R}_+$

where $j = 1, 2, \ldots N$ for some $N$ such that

$\displaystyle \sum_{j = 0}^N X_j(x) = 1 \,\mathrm{for all}\, x \in M$

and for each $j$ there exists $i(j) \in {\mathcal{I}}$ such that

$\displaystyle {\mathrm{supp}}{X_j} \subset U_{i(j)}$

We don’t yet know partitions of unity exist.

First define

$\displaystyle f(t) \triangleq \begin{cases} 0 & \text{if } t \leq 0 \\ \exp{(-1/t)} & \text{if } t > 0 \\ \end{cases}$

Techniques of classical analysis easily show that $f$ is smooth ($t=0$ is the only point that might be in doubt and it can be checked from first principles that $f^{(n)}(0) = 0$ for all $n$).

Next define

\displaystyle \begin{aligned} g(t) &\triangleq \frac{f(t)}{f(t) + f(1 - t)} \\ h(t) &\triangleq g(t + 2)g(2 - t) \end{aligned}

Finally we can define $F: \mathbb{R}^n \rightarrow \mathbb{R}$ by $F(x) = h(\|x\|)$. This has the properties

• $F(x) = 1$ if $\|x\| \leq 1$
• $0 \leq F(x) \leq 1$
• $F(x) = 0$ if $\|x\| > 2$

Now take a point $p \in M$ centred in a chart $(U_p, \phi_p)$ so that, without loss of generality, $B(0,3) \subseteq \phi_p(U_p)$ (we can always choose $r_p$ so that the open ball $B(0,3r_p) \subseteq \phi'_p(U_p)$ and then define another chart $(U_p, \phi_p)$ with $\phi_p(x) = \phi'_p(x)/\|x\|$).

Define the images of the open and closed balls of radius $1$ and $2$ respectively

\displaystyle \begin{aligned} V_p &= \phi_p^{-1}(B(0,1)) \\ W_p &= \phi_p^{-1}\big(\overline{B(0,2)}\big) \\ \end{aligned}

and further define bump functions

$\displaystyle \psi_p(y) \triangleq \begin{cases} F(\phi_p(y)) & \text{if } y \in U_p\\ 0 & \text{otherwise} \\ \end{cases}$

Then $\psi_p$ is smooth and its support lies in $W_p \subset U_p$.

By compactness, the open cover $\{V_p : p \in M\}$ has a finite subcover $\{V_{p_1},\ldots,V_{p_K}\}$. Now define

$\displaystyle X_j : M^n \longrightarrow \mathbb{R}_+$

by

$\displaystyle X_j(y) = \frac{\psi_{p_j}(y)}{\sum_{i=1}^K \psi_{p_i}(y)}$

Then $X_j$ is smooth, ${\mathrm{supp}}{X_j} = {\mathrm{supp}}{\psi_{p_j}} \subset U_{p_j}$ and $\sum_{j=1}^K X_j(y) = 1$. Thus $\{X_j\}$ is the required partition of unity.

# Paracompactness

Because $M$ is a manifold, it has a countable basis $\{A_i\}_{i \in \mathbb{N}}$ and for any point $p$, there must exist $A_i \subset V_p$ with $p \in A_i$. Choose one of these and call it $V_{p_i}$. This gives a countable cover of $M$ by such sets.

Now define

$\displaystyle L_1 = W_{p_1} \subset V_{p_1} \cup V_{p_2} \cup \ldots \cup V_{p_{i(2)}}$

where, since $L_1$ is compact, $V_{p_2}, \ldots, V_{p_{i(2)}}$ is a finite subcover.

And further define

$\displaystyle L_n = W_{p_1} \cup W_{p_2} \cup \ldots \cup W_{p_{i(n)}} \subset V_{p_1} \cup V_{p_2} \cup \ldots \cup V_{p_{i(n+1)}}$

where again, since $L_n$ is compact, $V_{p_{i(n)+1}}, \ldots, V_{p_{i(n+1)}}$ is a finite subcover.

Now define

\displaystyle \begin{aligned} K_n &= L_n \setminus {\mathrm{int}}{L_{n-1}} \\ U_n &= {\mathrm{int}}(L_{n+1}) \setminus L_n \end{aligned}

Then $K_n$ is compact, $U_n$ is open and $K_n \subset U_n$. Furthermore, $\bigcup_{n \in \mathbb{N}} K_n = M$ and $U_n$ only intersects with $U_{n-1}$ and $U_{n+1}$.

Given any open cover ${\mathcal{O}}$ of $M$, each $K_n$ can be covered by a finite number of open sets in $U_n$ contained in some member of ${\mathcal{O}}$. Thus every point in $K_n$ can be covered by at most a finite number of sets from $U_{n-1}, U_n$ and $U_{n+1}$ and which are contained in some member of ${\mathcal{O}}$. This is a locally finite refinement of ${\mathcal{O}}$ and which is precisely the definition of paracompactness.

To produce a partition of unity we define bump functions $\psi_j$ as above on this locally finite cover and note that locally finite implies that $\sum_j \psi_j$ is well defined. Again, as above, define

$\displaystyle X_j(y) = \frac{\psi_{j}(y)}{\sum_{i=1} \psi_{i}(y)}$

to get the required result.

# Bibliography

Betancourt, M. J., Simon Byrne, Samuel Livingstone, and Mark Girolami. 2014. “The Geometric Foundations of Hamiltonian Monte Carlo,” October, 45. http://arxiv.org/abs/1410.5110.

# Introduction

In proposition 58 Chapter 1 in the excellent book O’Neill (1983), the author demonstrates that the Lie derivative of one vector field with respect to another is the same as the Lie bracket (of the two vector fields) although he calls the Lie bracket just bracket and does not define the Lie derivative preferring just to use its definition with giving it a name. The proof relies on a prior result where he shows a co-ordinate system at a point $p$ can be given to a vector field $X$ for which $X_p \neq 0$ so that $X = \frac{\partial}{\partial x_1}$.

Here’s a proof seems clearer (to me at any rate) and avoids having to distinguish the case wehere the vector field is zero or non-zero. These notes give a similar proof but, strangely for undergraduate level, elide some of the details.

# A Few Definitions

Let $\phi: M \longrightarrow N$ be a smooth mapping and let $A$ be a $0,s$ tensor with $s \geq 0$ then define the pullback of $A$ by $\phi$ to be

$\displaystyle \phi^*A(v_1,\ldots,v_s) = A(\mathrm{d}\phi v_1,\ldots,\mathrm{d}\phi v_s)$

For a $(0,0)$ tensor $f \in {\mathscr{F}}(N)$ the pullback is defined to be $\phi^*(f) = f \circ \phi \in {\mathscr{F}}(M)$.

Standard manipulations show that $\phi^*A$ is a smooth (covariant) tensor field and that $\phi^*$ is $\mathbb{R}$-linear and that $\phi^*(A\otimes B) = \phi^*A \otimes \phi^*B$.

Let $F : M \longrightarrow N$ be a diffeomorphism and $Y$ a vector field on $N$ we define the pullback of this field to be

$\displaystyle (F^*{Y})_x = D(F^{-1})_{F(x)}(Y_{F(x)})$

Note that the pullback of a vector field only exists in the case where $F$ is a diffeomorphism; in contradistinction, in the case of pullbacks of purely covariant tensors, the pullback always exists.

For the proof below, we only need the pullback of functions and vector fields; the pullback for $(0,s)$ tensors with $s \geq 1$ is purely to give a bit of context.

From O’Neill (1983) Chapter 1 Definition 20, let $\phi: M \rightarrow N$ be a smooth mapping. Vector fields $X$ on $M$ and $Y$ on $N$ are $F$related written $X \underset{F}{\sim} Y$ if and only if $dF({X}_p) = Y_{Fp}$.

# The Alternative Proof

By Lemma 21 Chapter 1 of O’Neill (1983), $X$ and $Y$ are $F$-related if and only if $X(f \circ F) = Yf \circ F$.

Recalling that $dF(X_p)(f) = X_p(F \circ f)$ and since

$\displaystyle dF_x d(F^{-1})_{Fx}(X_{Fx}) = X_{Fx}$

we see that the fields $F^*{Y}$ and $Y$ are $F$-related: $F^*{Y}_x \underset{F}{\sim} Y_{Fx}$. Thus we can apply the Lemma.

$\displaystyle (F^*{Y})(f \circ F) = (F^*{Y})(F^*{f}) = Yf \circ F = F^*(Yf)$

Although we don’t need this, we can express the immediately above equivalence in a way similar to the rule for covariant tensors

$\displaystyle (F^*{Y})(f \otimes F) = (F^*{Y})\otimes(F^*{f})$

First let’s calculate the Lie derivative of a function $f$ with respect to a vector field $X$ where $\phi_t$ is its flow

\displaystyle \begin{aligned} L_X f &\triangleq \lim_{t \rightarrow 0} \frac{\phi_t^*(f) - f}{t} \\ &= \lim_{t \rightarrow 0} \frac{f \circ \phi_t - f \circ \phi_0}{t} \\ &= \lim_{t \rightarrow 0} \frac{f \circ \phi (t,x) - f \circ \phi (0, x)}{t} \\ &= (\phi_x)'_0(f) \\ &= X_x(f) \\ &= (Xf)_x \end{aligned}

Analogously defining the Lie derivative of $Y$ with respect to $X$

$\displaystyle (L_X Y) \triangleq \frac{(\phi_t^*{Y}) - Y}{t}f$

we have

\displaystyle \begin{aligned} L_X(Yf) &= \lim_{t \rightarrow 0} \frac{\phi_t^*(Yf) - Yf}{t} \\ &= \lim_{t \rightarrow 0} \frac{(\phi_t^*{Y})(\phi_t^*{f}) - Yf}{t} \\ &= \lim_{t \rightarrow 0} \frac{(\phi_t^*{Y})(\phi_t^*{f}) - (\phi_t^*{Y})f + (\phi_t^*{Y})f - Yf}{t} \\ &= \lim_{t \rightarrow 0} \Bigg( (\phi_t^*{Y})\frac{\phi_t^*{f} - f}{t} + \frac{(\phi_t^*{Y}) - Y}{t}f \Bigg) \\ &= Y(L_X f) + (L_X Y)f \end{aligned}

Since $L_X f = Xf$ we have

$\displaystyle X(Yf) = Y(Xf) + (L_X Y)f$

Thus

$\displaystyle (L_X Y)f = Y(Xf) - X(Yf) = [X,Y]f$

as required.

# Bibliography

O’Neill, B. 1983. Semi-Riemannian Geometry with Applications to Relativity, 103. Pure and Applied Mathematics. Elsevier Science. https://books.google.com.au/books?id=CGk1eRSjFIIC.

# Introduction

The equation of motion for a pendulum of unit length subject to Gaussian white noise is

$\displaystyle \frac{\mathrm{d}^2\alpha}{\mathrm{d}t^2} = -g\sin\alpha + w(t)$

We can discretize this via the usual Euler method

$\displaystyle \begin{bmatrix} x_{1,i} \\ x_{2,i} \end{bmatrix} = \begin{bmatrix} x_{1,i-1} + x_{2,i-1}\Delta t \\ x_{2,i-1} - g\sin x_{1,i-1}\Delta t \end{bmatrix} + \mathbf{q}_{i-1}$

where $q_i \sim {\mathcal{N}}(0,Q)$ and

$\displaystyle Q = \begin{bmatrix} \frac{q^c \Delta t^3}{3} & \frac{q^c \Delta t^2}{2} \\ \frac{q^c \Delta t^2}{2} & {q^c \Delta t} \end{bmatrix}$

The explanation of the precise form of the covariance matrix will be the subject of another blog post; for the purpose of exposition of forward filtering / backward smoothing, this detail is not important.

Assume that we can only measure the horizontal position of the pendulum and further that this measurement is subject to error so that

$\displaystyle y_i = \sin x_i + r_k$

where $r_i \sim {\mathcal{N}}(0,R)$.

Particle Filtering can give us an estimate of where the pendulum is and its velocity using all the observations up to that point in time. But now suppose we have observed the pendulum for a fixed period of time then at times earlier than the time at which we stop our observations we now have observations in the future as well as in the past. If we can somehow take account of these future observations then we should be able to improve our estimate of where the pendulum was at any given point in time (as well as its velocity). Forward Filtering / Backward Smoothing is a technique for doing this.

## Haskell Preamble

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing  #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans         #-}

> {-# LANGUAGE MultiParamTypeClasses        #-}
> {-# LANGUAGE TypeFamilies                 #-}
> {-# LANGUAGE ScopedTypeVariables          #-}
> {-# LANGUAGE ExplicitForAll               #-}
> {-# LANGUAGE DataKinds                    #-}

> {-# LANGUAGE FlexibleInstances            #-}
> {-# LANGUAGE MultiParamTypeClasses        #-}
> {-# LANGUAGE FlexibleContexts             #-}
> {-# LANGUAGE TypeFamilies                 #-}
> {-# LANGUAGE BangPatterns                 #-}
> {-# LANGUAGE GeneralizedNewtypeDeriving   #-}
> {-# LANGUAGE TemplateHaskell              #-}
> {-# LANGUAGE DataKinds                    #-}
> {-# LANGUAGE DeriveGeneric                #-}

> module PendulumSamples ( pendulumSamples
>                        , pendulumSamples'
>                        , testFiltering
>                        , testSmoothing
>                        , testFilteringG
>                        , testSmoothingG
>                        ) where

> import           Data.Random hiding ( StdNormal, Normal )
> import           Data.Random.Source.PureMT ( pureMT )
> import           Control.Monad.State ( evalState, replicateM )
> import qualified Control.Monad.Loops as ML
> import           Control.Monad.Writer ( tell, WriterT, lift,
>                                         runWriterT
>                                       )
> import           Numeric.LinearAlgebra.Static
>                  ( R, vector, Sym,
>                    headTail, matrix, sym,
>                    diag
>                  )
> import           GHC.TypeLits ( KnownNat )
> import           MultivariateNormal ( MultivariateNormal(..) )
> import qualified Data.Vector as V
> import           Data.Bits ( shiftR )
> import           Data.List ( transpose )
> import           Control.Parallel.Strategies
> import           GHC.Generics (Generic)


# Simulation

Let’s first plot some typical trajectories of the pendulum.

> deltaT, g :: Double
> deltaT = 0.01
> g  = 9.81

> type PendulumState = R 2
> type PendulumObs = R 1

> pendulumSample :: MonadRandom m =>
>                   Sym 2 ->
>                   Sym 1 ->
>                   PendulumState ->
>                   m (Maybe ((PendulumState, PendulumObs), PendulumState))
> pendulumSample bigQ bigR xPrev = do
>   let x1Prev = fst $headTail xPrev > x2Prev = fst$ headTail $snd$ headTail xPrev
>   eta <- sample $rvar (MultivariateNormal 0.0 bigQ) > let x1= x1Prev + x2Prev * deltaT > x2 = x2Prev - g * (sin x1Prev) * deltaT > xNew = vector [x1, x2] + eta > x1New = fst$ headTail xNew
>   epsilon <-  sample $rvar (MultivariateNormal 0.0 bigR) > let yNew = vector [sin x1New] + epsilon > return$ Just ((xNew, yNew), xNew)


Let’s try plotting some samples when we are in the linear region with which we are familiar from school $\sin\alpha \approx \alpha$.

$\displaystyle \frac{\mathrm{d}^2\alpha}{\mathrm{d}t^2} = -g\alpha + w(t)$

In this case we expect the horizontal displacement to be approximately equal to the angle of displacement and thus the observations to be symmetric about the actuals.

> bigQ :: Sym 2
> bigQ = sym $matrix bigQl  > qc1 :: Double > qc1 = 0.0001  > bigQl :: [Double] > bigQl = [ qc1 * deltaT^3 / 3, qc1 * deltaT^2 / 2, > qc1 * deltaT^2 / 2, qc1 * deltaT > ]  > bigR :: Sym 1 > bigR = sym$ matrix [0.0001]

> m0 :: PendulumState
> m0 = vector [0.01, 0]

> pendulumSamples :: [(PendulumState, PendulumObs)]
> pendulumSamples = evalState (ML.unfoldrM (pendulumSample bigQ bigR) m0) (pureMT 17)


But if we work in a region in which linearity breaks down then the observations are no longer symmetrical about the actuals.

> bigQ' :: Sym 2
> bigQ' = sym $matrix bigQl'  > qc1' :: Double > qc1' = 0.01  > bigQl' :: [Double] > bigQl' = [ qc1' * deltaT^3 / 3, qc1' * deltaT^2 / 2, > qc1' * deltaT^2 / 2, qc1' * deltaT > ]  > bigR' :: Sym 1 > bigR' = sym$ matrix [0.1]

> m0' :: PendulumState
> m0' = vector [1.6, 0]

> pendulumSamples' :: [(PendulumState, PendulumObs)]
> pendulumSamples' = evalState (ML.unfoldrM (pendulumSample bigQ' bigR') m0') (pureMT 17)


# Filtering

We do not give the theory behind particle filtering. The interested reader can either consult Särkkä (2013) or wait for a future blog post on the subject.

> nParticles :: Int
> nParticles = 500


The usual Bayesian update step.

> type Particles a = V.Vector a

> oneFilteringStep ::
>   MonadRandom m =>
>   (Particles a -> m (Particles a)) ->
>   (Particles a -> Particles b) ->
>   (b -> b -> Double) ->
>   Particles a ->
>   b ->
>   WriterT [Particles a] m (Particles a)
> oneFilteringStep stateUpdate obsUpdate weight statePrevs obs = do
>   tell [statePrevs]
>   stateNews <- lift $stateUpdate statePrevs > let obsNews = obsUpdate stateNews > let weights = V.map (weight obs) obsNews > cumSumWeights = V.tail$ V.scanl (+) 0 weights
>       totWeight     = V.last cumSumWeights
>   vs <- lift $V.replicateM nParticles (sample$ uniform 0.0 totWeight)
>   let js = indices cumSumWeights vs
>       stateTildes = V.map (stateNews V.!) js
>   return stateTildes


The system state and observable.

> data SystemState a = SystemState { x1  :: a, x2  :: a }
>   deriving (Show, Generic)

> instance NFData a => NFData (SystemState a)

> newtype SystemObs a = SystemObs { y1  :: a }
>   deriving Show


To make the system state update a bit more readable, let’s introduce some lifted arithmetic operators.

> (.+), (.*), (.-) :: (Num a) => V.Vector a -> V.Vector a -> V.Vector a
> (.+) = V.zipWith (+)
> (.*) = V.zipWith (*)
> (.-) = V.zipWith (-)


The state update itself

> stateUpdate :: Particles (SystemState Double) ->
>                 Particles (SystemState Double)
> stateUpdate xPrevs = V.zipWith SystemState x1s x2s
>   where
>     ix = V.length xPrevs
>
>     x1Prevs = V.map x1 xPrevs
>     x2Prevs = V.map x2 xPrevs
>
>     deltaTs = V.replicate ix deltaT
>     gs = V.replicate ix g
>     x1s = x1Prevs .+ (x2Prevs .* deltaTs)
>     x2s = x2Prevs .- (gs .* (V.map sin x1Prevs) .* deltaTs)


and its noisy version.

> stateUpdateNoisy :: MonadRandom m =>
>                     Sym 2 ->
>                     Particles (SystemState Double) ->
>                     m (Particles (SystemState Double))
> stateUpdateNoisy bigQ xPrevs = do
>   let xs = stateUpdate xPrevs
>
>       x1s = V.map x1 xs
>       x2s = V.map x2 xs
>
>   let ix = V.length xPrevs
>   etas <- replicateM ix $sample$ rvar (MultivariateNormal 0.0 bigQ)
>
>   let eta1s, eta2s :: V.Vector Double
>       eta1s = V.fromList $map (fst . headTail) etas > eta2s = V.fromList$ map (fst . headTail . snd . headTail) etas
>
>   return (V.zipWith SystemState (x1s .+ eta1s) (x2s .+ eta2s))


The function which maps the state to the observable.

> obsUpdate :: Particles (SystemState Double) ->
>              Particles (SystemObs Double)
> obsUpdate xs = V.map (SystemObs . sin . x1) xs


And finally a function to calculate the weight of each particle given an observation.

> weight :: forall a n . KnownNat n =>
>           (a -> R n) ->
>           Sym n ->
>           a -> a -> Double
> weight f bigR obs obsNew = pdf (MultivariateNormal (f obsNew) bigR) (f obs)


The variance of the prior.

> bigP :: Sym 2
> bigP = sym $diag 0.1  Generate our ensemble of particles chosen from the prior, > initParticles :: MonadRandom m => > m (Particles (SystemState Double)) > initParticles = V.replicateM nParticles$ do
>   r <- sample $rvar (MultivariateNormal m0' bigP) > let x1 = fst$ headTail r
>       x2 = fst $headTail$ snd $headTail r > return$ SystemState { x1 = x1, x2 = x2}


run the particle filter,

> runFilter :: Int -> [Particles (SystemState Double)]
> runFilter nTimeSteps = snd $evalState action (pureMT 19) > where > action = runWriterT$ do
>       xs <- lift $initParticles > V.foldM > (oneFilteringStep (stateUpdateNoisy bigQ') obsUpdate (weight f bigR')) > xs > (V.fromList$ map (SystemObs . fst . headTail . snd)
>                           (take nTimeSteps pendulumSamples'))


and extract the estimated position from the filter.

> testFiltering :: Int -> [Double]
> testFiltering nTimeSteps = map ((/ (fromIntegral nParticles)). sum . V.map x1)
>                                (runFilter nTimeSteps)


# Smoothing

If we could calculate the marginal smoothing distributions $\{p(x_t \,|\, y_{1:T})\}_{i=1}^T$ then we might be able to sample from them. Using the Markov assumption of our model that $x_i$ is independent of $y_{i+1:N}$ given $x_{i+1}$, we have

\displaystyle \begin{aligned} \overbrace{p(x_i \,|\, y_{1:N})}^{\mathrm{smoother}\,\mathrm{at}\, i} &= \int p(x_i, x_{i+1} \,|\, y_{1:N}) \,\mathrm{d}x_{i+1} & \text{marginal distribution} \\ &= \int p(x_{i+1} \,|\, y_{1:N}) \,p(x_{i} \,|\, y_{1:N}, x_{i+1}) \,\mathrm{d}x_{i+1} & \text{conditional density} \\ &= \int p(x_{i+1} \,|\, y_{1:N}) \,p(x_{i} \,|\, y_{1:i}, x_{i+1}) \,\mathrm{d}x_{i+1} & \text{Markov model} \\ &= \int p(x_{i+1} \,|\, y_{1:N}) \, \frac{p(x_{i}, x_{i+1} \,|\, y_{1:i})}{p(x_{i+1} \,|\, y_{1:i})} \,\mathrm{d}x_{i+1} & \text{conditional density} \\ &= \int p(x_{i+1} \,|\, y_{1:N}) \, \frac{p(x_{i+1} \,|\, x_{i}, y_{1:i}) \,p(x_{i} \,|\, y_{1:i})}{p(x_{i+1} \,|\, y_{1:i})} \,\mathrm{d}x_{i+1} & \text{conditional density} \\ &= \int \overbrace{p(x_{i+1} \,|\, y_{1:N})}^{\text{smoother at }i+1} \, \underbrace{ \overbrace{p(x_{i} \,|\, y_{1:i})}^{\text{filter at }i} \frac{p(x_{i+1} \,|\, x_{i})} {p(x_{i+1} \,|\, y_{1:i})} } _{\text{backward transition }p(x_{i} \,|\, y_{1:i},\,x_{i+1})} \,\mathrm{d}x_{i+1} & \text{Markov model} \end{aligned}

We observe that this is a (continuous state space) Markov process with a non-homogeneous transition function albeit one which goes backwards in time. Apparently for conditionally Gaussian linear state-space models, this is known as RTS, or Rauch-Tung-Striebel smoothing (Rauch, Striebel, and Tung (1965)).

According to Cappé (2008),

• It appears to be efficient and stable in the long term (although no proof was available at the time the slides were presented).

• It is not sequential (in particular, one needs to store all particle positions and weights).

• It has numerical complexity proportional $O(n^2)$ where $N$ is the number of particles.

We can use this to sample paths starting at time $i = N$ and working backwards. From above we have

$\displaystyle p(x_i \,|\, X_{i+1}, Y_{1:N}) = \frac{p(X_{i+1} \,|\, x_{i}) \,p(x_{i} \,|\, Y_{1:i})}{p(X_{i+1} \,|\, Y_{1:i})} = Z \,p(X_{i+1} \,|\, x_{i}) \,p(x_{i} \,|\, Y_{1:i})$

where $Z$ is some normalisation constant (Z for “Zustandssumme” – sum over states).

From particle filtering we know that

$\displaystyle {p}(x_i \,|\, y_{1:i}) \approx \hat{p}(x_i \,|\, y_{1:i}) \triangleq \sum_{j=1}^M w_i^{(j)}\delta(x_i - x_i^{(j)})$

Thus

$\displaystyle \hat{p}(x_i \,|\, X_{i+1}, Y_{1:i}) = Z \,p(X_{i+1} \,|\, x_{i}) \,\hat{p}(x_{i} \,|\, Y_{1:i}) = \sum_{j=1}^M w_i^{(j)}\delta(x_i - x_i^{(j)}) \,p(X_{i+1} \,|\, x_{i})$

and we can sample $x_i$ from $\{x_i^{(j)}\}_{1 \leq j \leq M}$ with probability

$\displaystyle \frac{w_k^{(i)} \,p(X_{i+1} \,|\, x_{i})} {\sum_{i=1}^N w_k^{(i)} \,p(X_{i+1} \,|\, x_{i})}$

Recalling that we have re-sampled the particles in the filtering algorithm so that their weights are all $1/M$ and abstracting the state update and state density function, we can encode this update step in Haskell as

> oneSmoothingStep :: MonadRandom m =>
>          (Particles a -> V.Vector a) ->
>          (a -> a -> Double) ->
>          a ->
>          Particles a ->
>          WriterT (Particles a) m a
> oneSmoothingStep stateUpdate
>                  stateDensity
>                  smoothingSampleAtiPlus1
>                  filterSamplesAti = do it
>   where
>     it = do
>       let mus = stateUpdate filterSamplesAti
>           weights =  V.map (stateDensity smoothingSampleAtiPlus1) mus
>           cumSumWeights = V.tail $V.scanl (+) 0 weights > totWeight = V.last cumSumWeights > v <- lift$ sample $uniform 0.0 totWeight > let ix = binarySearch cumSumWeights v > xnNew = filterSamplesAti V.! ix > tell$ V.singleton xnNew
>       return $xnNew  To sample a complete path we start with a sample from the filtering distribution at at time $i = N$ (which is also the smoothing distribution) > oneSmoothingPath :: MonadRandom m => > (Int -> V.Vector (Particles a)) -> > (a -> Particles a -> WriterT (Particles a) m a) -> > Int -> m (a, V.Vector a) > oneSmoothingPath filterEstss oneSmoothingStep nTimeSteps = do > let ys = filterEstss nTimeSteps > ix <- sample$ uniform 0 (nParticles - 1)
>   let xn = (V.head ys) V.! ix
>   runWriterT $V.foldM oneSmoothingStep xn (V.tail ys)  > oneSmoothingPath' :: (MonadRandom m, Show a) => > (Int -> V.Vector (Particles a)) -> > (a -> Particles a -> WriterT (Particles a) m a) -> > Int -> WriterT (Particles a) m a > oneSmoothingPath' filterEstss oneSmoothingStep nTimeSteps = do > let ys = filterEstss nTimeSteps > ix <- lift$ sample $uniform 0 (nParticles - 1) > let xn = (V.head ys) V.! ix > V.foldM oneSmoothingStep xn (V.tail ys)  Of course we need to run through the filtering distributions starting at $i = N$ > filterEstss :: Int -> V.Vector (Particles (SystemState Double)) > filterEstss n = V.reverse$ V.fromList $runFilter n  > testSmoothing :: Int -> Int -> [Double] > testSmoothing m n = V.toList$ evalState action (pureMT 23)
>   where
>     action = do
>       xss <- V.replicateM n $> snd <$> (oneSmoothingPath filterEstss (oneSmoothingStep stateUpdate (weight h bigQ')) m)
>       let yss = V.fromList $map V.fromList$
>                 transpose $> V.toList$ V.map (V.toList) $> xss > return$ V.map (/ (fromIntegral n)) $V.map V.sum$ V.map (V.map x1) yss


By eye we can see we get a better fit

and calculating the mean square error for filtering gives $1.87\times10^{-2}$ against the mean square error for smoothing of $9.52\times10^{-3}$; this confirms the fit really is better as one would hope.

# Unknown Gravity

Let us continue with the same example but now assume that $g$ is unknown and that we wish to estimate it. Let us also assume that our apparatus is not subject to noise.

Again we have

$\displaystyle \frac{\mathrm{d}^2\alpha}{\mathrm{d}t^2} = -g\sin\alpha + w(t)$

But now when we discretize it we include a third variable

$\displaystyle \begin{bmatrix} x_{1,i} \\ x_{2,i} \\ x_{3,i} \end{bmatrix} = \begin{bmatrix} x_{1,i-1} + x_{2,i-1}\Delta t \\ x_{2,i-1} - x_{3,i-1}\sin x_{1,i-1}\Delta t \\ x_{3,i-1} \end{bmatrix} + \mathbf{q}_{i-1}$

where $q_i \sim {\mathcal{N}}(0,Q)$

$\displaystyle Q = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \sigma^2_g \end{bmatrix}$

Again we assume that we can only measure the horizontal position of the pendulum so that

$\displaystyle y_i = \sin x_i + r_k$

where $r_i \sim {\mathcal{N}}(0,R)$.

> type PendulumStateG = R 3

> pendulumSampleG :: MonadRandom m =>
>                   Sym 3 ->
>                   Sym 1 ->
>                   PendulumStateG ->
>                   m (Maybe ((PendulumStateG, PendulumObs), PendulumStateG))
> pendulumSampleG bigQ bigR xPrev = do
>   let x1Prev = fst $headTail xPrev > x2Prev = fst$ headTail $snd$ headTail xPrev
>       x3Prev = fst $headTail$ snd $headTail$ snd $headTail xPrev > eta <- sample$ rvar (MultivariateNormal 0.0 bigQ)
>   let x1= x1Prev + x2Prev * deltaT
>       x2 = x2Prev - g * (sin x1Prev) * deltaT
>       x3 = x3Prev
>       xNew = vector [x1, x2, x3] + eta
>       x1New = fst $headTail xNew > epsilon <- sample$ rvar (MultivariateNormal 0.0 bigR)
>   let yNew = vector [sin x1New] + epsilon
>   return $Just ((xNew, yNew), xNew)  > pendulumSampleGs :: [(PendulumStateG, PendulumObs)] > pendulumSampleGs = evalState (ML.unfoldrM (pendulumSampleG bigQg bigRg) mG) (pureMT 29)  > data SystemStateG a = SystemStateG { gx1 :: a, gx2 :: a, gx3 :: a } > deriving Show  The state update itself > stateUpdateG :: Particles (SystemStateG Double) -> > Particles (SystemStateG Double) > stateUpdateG xPrevs = V.zipWith3 SystemStateG x1s x2s x3s > where > ix = V.length xPrevs > > x1Prevs = V.map gx1 xPrevs > x2Prevs = V.map gx2 xPrevs > x3Prevs = V.map gx3 xPrevs > > deltaTs = V.replicate ix deltaT > x1s = x1Prevs .+ (x2Prevs .* deltaTs) > x2s = x2Prevs .- (x3Prevs .* (V.map sin x1Prevs) .* deltaTs) > x3s = x3Prevs  and its noisy version. > stateUpdateNoisyG :: MonadRandom m => > Sym 3 -> > Particles (SystemStateG Double) -> > m (Particles (SystemStateG Double)) > stateUpdateNoisyG bigQ xPrevs = do > let ix = V.length xPrevs > > let xs = stateUpdateG xPrevs > > x1s = V.map gx1 xs > x2s = V.map gx2 xs > x3s = V.map gx3 xs > > etas <- replicateM ix$ sample $rvar (MultivariateNormal 0.0 bigQ) > let eta1s, eta2s, eta3s :: V.Vector Double > eta1s = V.fromList$ map (fst . headTail) etas
>       eta2s = V.fromList $map (fst . headTail . snd . headTail) etas > eta3s = V.fromList$ map (fst . headTail . snd . headTail . snd . headTail) etas
>
>   return (V.zipWith3 SystemStateG (x1s .+ eta1s) (x2s .+ eta2s) (x3s .+ eta3s))


The function which maps the state to the observable.

> obsUpdateG :: Particles (SystemStateG Double) ->
>              Particles (SystemObs Double)
> obsUpdateG xs = V.map (SystemObs . sin . gx1) xs


The mean and variance of the prior.

> mG :: R 3
> mG = vector [1.6, 0.0, 8.00]

> bigPg :: Sym 3
> bigPg = sym $matrix [ > 1e-6, 0.0, 0.0 > , 0.0, 1e-6, 0.0 > , 0.0, 0.0, 1e-2 > ]  Parameters for the state update; note that the variance is not exactly the same as in the formulation above. > bigQg :: Sym 3 > bigQg = sym$ matrix bigQgl

> qc1G :: Double
> qc1G = 0.0001

> sigmaG :: Double
> sigmaG = 1.0e-2

> bigQgl :: [Double]
> bigQgl = [ qc1G * deltaT^3 / 3, qc1G * deltaT^2 / 2, 0.0,
>            qc1G * deltaT^2 / 2,       qc1G * deltaT, 0.0,
>                            0.0,                 0.0, sigmaG
>          ]


The noise of the measurement.

> bigRg :: Sym 1
> bigRg  = sym $matrix [0.1]  Generate the ensemble of particles from the prior, > initParticlesG :: MonadRandom m => > m (Particles (SystemStateG Double)) > initParticlesG = V.replicateM nParticles$ do
>   r <- sample $rvar (MultivariateNormal mG bigPg) > let x1 = fst$ headTail r
>       x2 = fst $headTail$ snd $headTail r > x3 = fst$ headTail $snd$ headTail $snd$ headTail r
>   return $SystemStateG { gx1 = x1, gx2 = x2, gx3 = x3}  run the particle filter, > runFilterG :: Int -> [Particles (SystemStateG Double)] > runFilterG n = snd$ evalState action (pureMT 19)
>   where
>     action = runWriterT $do > xs <- lift$ initParticlesG
>       V.foldM
>         (oneFilteringStep (stateUpdateNoisyG bigQg) obsUpdateG (weight f bigRg))
>         xs
>         (V.fromList $map (SystemObs . fst . headTail . snd) (take n pendulumSampleGs))  and extract the estimated parameter from the filter. > testFilteringG :: Int -> [Double] > testFilteringG n = map ((/ (fromIntegral nParticles)). sum . V.map gx3) (runFilterG n)  Again we need to run through the filtering distributions starting at $i = N$ > filterGEstss :: Int -> V.Vector (Particles (SystemStateG Double)) > filterGEstss n = V.reverse$ V.fromList $runFilterG n  > testSmoothingG :: Int -> Int -> ([Double], [Double], [Double]) > testSmoothingG m n = (\(x, y, z) -> (V.toList x, V.toList y, V.toList z))$
>                      mkMeans $> chunks > where > > chunks = > V.fromList$ map V.fromList $> transpose$
>       V.toList $V.map (V.toList)$
>       chunksOf m $> snd$ evalState (runWriterT action) (pureMT 23)
>
>     mkMeans yss = (
>       V.map (/ (fromIntegral n)) $V.map V.sum$ V.map (V.map gx1) yss,
>       V.map (/ (fromIntegral n)) $V.map V.sum$ V.map (V.map gx2) yss,
>       V.map (/ (fromIntegral n)) $V.map V.sum$ V.map (V.map gx3) yss
>       )
>
>     action =
>       V.replicateM n \$
>       oneSmoothingPath' filterGEstss
>                         (oneSmoothingStep stateUpdateG (weight hG bigQg))
>                         m


Again by eye we get a better fit but note that we are using the samples in which the state update is noisy as well as the observation so we don’t expect to get a very good fit.

# Notes

## Helpers for Converting Types

> f :: SystemObs Double -> R 1
> f = vector . pure . y1

> h :: SystemState Double -> R 2
> h u = vector [x1 u , x2 u]

> hG :: SystemStateG Double -> R 3
> hG u = vector [gx1 u , gx2 u, gx3 u]


## Helpers for the Inverse CDF

That these are helpers for the inverse CDF is delayed to another blog post.

> indices :: V.Vector Double -> V.Vector Double -> V.Vector Int
> indices bs xs = V.map (binarySearch bs) xs

> binarySearch :: (Ord a) =>
>                 V.Vector a -> a -> Int
> binarySearch vec x = loop 0 (V.length vec - 1)
>   where
>     loop !l !u
>       | u <= l    = l
>       | otherwise = let e = vec V.! k in if x <= e then loop l k else loop (k+1) u
>       where k = l + (u - l) shiftR 1


## Vector Helpers

> chunksOf :: Int -> V.Vector a -> V.Vector (V.Vector a)
> chunksOf n xs = ys
>   where
>     l = V.length xs
>     m  = 1 + (l - 1) div n
>     ys = V.unfoldrN m (\us -> Just (V.take n us, V.drop n us)) xs


# Bibliography

Cappé, Olivier. 2008. “An Introduction to Sequential Monte Carlo for Filtering and Smoothing.” http://www-irma.u-strasbg.fr/~guillou/meeting/cappe.pdf.

Rauch, H. E., C. T. Striebel, and F. Tung. 1965. “Maximum Likelihood Estimates of Linear Dynamic Systems.” Journal of the American Institute of Aeronautics and Astronautics 3 (8): 1445–50.

Särkkä, Simo. 2013. Bayesian Filtering and Smoothing. New York, NY, USA: Cambridge University Press.