Blog post on my new site.
Statistics
Cartography in Haskell
Introduction
This blog started off life as a blog post on how I use nix but somehow transformed itself into a “how I do data visualisation” blog post. The nix is still here though quietly doing its work in the background.
Suppose you want to analyze your local election results and visualize them using a choropleth but you’d like to use Haskell. You could try using the shapefile package but you will have to do a lot of the heavy lifting yourself (see here for an extended example). But you know folks using R can produce such things in a few lines of code and you also know from experience that unless you are very disciplined large R scripts can quickly become unmaintainable. Can you get the best of both worlds? It seems you can if you use inliner. But now you have a bit of configuration problem: how do you make sure that all the Haskell packages that you need and all the R packages that you need are available and how can you make sure that your results are reproducible? My answer to use nix.
Acknowledgements
I’d like to thank all my colleagues at Tweag who have had the patience to indulge my questions especially zimbatm and also all folks on the #nixos irc channel.
Downloading and Building
The sources are on github.
You’ll have to build my Include
pandoc filter somehow and then
pandoc s HowIUseNix.lhs filter=./Include t markdown+lhs > HowIUseNixExpanded.lhs
BlogLiterately HowIUseNixExpanded.lhs > HowIUseNixExpanded.html
Analysing the Data
Before looking at the .nix file I use, let’s actually analyse the data. First some preamble
> {# OPTIONS_GHC Wall #}
> {# OPTIONS_GHC fnowarnunuseddobind #}
> {# LANGUAGE TemplateHaskell #}
> {# LANGUAGE QuasiQuotes #}
> {# LANGUAGE DataKinds #}
> {# LANGUAGE FlexibleContexts #}
> {# LANGUAGE TypeOperators #}
> {# LANGUAGE OverloadedStrings #}
> import Prelude as P
> import qualified Language.R as R
> import Language.R.QQ
> import System.Directory
> import System.FilePath
> import Control.Monad
> import Data.List (stripPrefix, nub)
> import qualified Data.Foldable as Foldable
> import Frames hiding ( (:&) )
> import Control.Lens
> import qualified Data.FuzzySet as Fuzzy
> import Data.Text (pack, unpack)
> import qualified Control.Foldl as Foldl
> import Pipes.Prelude (fold)
We can scrape the data for the Kington on Thames local elections and put it in a CSV file. Using the Haskell equivalent of data frames / pandas, Frames, we can “read” in the data. I say “read” in quotes because if this were a big data set (it’s not) then it might not fit in memory and we’d have to process it as a stream. I’ve included the sheet in the github repo so you can try reproducing my results.
> tableTypes "ElectionResults" "data/Election Results 2018 v1.csv"
> loadElectionResults :: IO (Frame ElectionResults)
> loadElectionResults = inCoreAoS (readTable "data/Election Results 2018 v1.csv")
We could analyse the data set in order to find the ward (aka electoral district) names but I happen to know them.
> wardNames :: [Text]
> wardNames = [ "Berrylands"
> , "Norbiton"
> , "Old Malden"
> , "Coombe Hill"
> , "Beverley"
> , "Chessington North & Hook"
> , "Tolworth and Hook Rise"
> , "Coombe Vale"
> , "St James"
> , "Tudor"
> , "Alexandra"
> , "Canbury"
> , "Surbiton Hill"
> , "Chessington South"
> , "Grove"
> , "St Marks"
> ]
We need a query to get both the total number of votes cast in the ward and the number of votes cast for what is now the party which controls the borough. We can create two filters to run in parallel, summing as we go.
> getPartyWard :: Text > Text > Foldl.Fold ElectionResults (Int, Int)
> getPartyWard p w =
> (,) <$> (Foldl.handles (filtered (\x > x ^. ward == w)) .
> Foldl.handles votes) Foldl.sum
> <*> (Foldl.handles (filtered (\x > x ^. ward == w &&
> x ^. party == p)) .
> Foldl.handles votes) Foldl.sum
We can combine all the filters and sums into one big query where they all run in parallel.
> getVotes :: Foldl.Fold ElectionResults [(Int, Int)]
> getVotes = sequenceA $ map (getPartyWard "Liberal Democrat") wardNames
I downloaded the geographical data from here but I imagine there are many other sources.
First we read all the shapefiles in the directory we downloaded.
> root :: FilePath
> root = "data/KingstonuponThames"
> main :: IO ()
> main = do
> ns < runSafeT $ Foldl.purely fold getVotes (readTable "data/Election Results 2018 v1.csv")
> let ps :: [Double]
> ps = map (\(x, y) > 100.0 * fromIntegral y / fromIntegral x) ns
> let percentLD = zip (map unpack wardNames) ps
>
> ds < getDirectoryContents root
> let es = filter (\x > takeExtension x == ".shp") ds
> let fs = case xs of
> Nothing > error "Ward prefix does not match"
> Just ys > ys
> where
> xs = sequence $
> map (fmap dropExtension) $
> map (stripPrefix "KingstonuponThames_") es
Sadly, when we get the ward names from the file names they don’t match the ward names in the electoral data sheet. Fortunately, fuzzy matching resolves this problem.
> rs < loadElectionResults
> let wardSet :: Fuzzy.FuzzySet
> wardSet = Fuzzy.fromList . nub . Foldable.toList . fmap (^. ward) $ rs
>
> let gs = case sequence $ map (Fuzzy.getOne wardSet) $ map pack fs of
> Nothing > error "Ward name and ward filename do not match"
> Just xs > xs
Rather than making the colour of each ward vary continuously depending on the percentage share of the vote, let’s create 4 buckets to make it easier to see at a glance how the election results played out.
> let bucket :: Double > String
> bucket x  x < 40 = "whitesmoke"
>  x < 50 = "yellow1"
>  x < 60 = "yellow2"
>  otherwise = "yellow3"
>
> let cs :: [String]
> cs = case sequence $ map (\w > lookup w percentLD) (map unpack gs) of
> Nothing > error "Wrong name in table"
> Just xs > map bucket xs
Now we move into the world of R, courtesy of inliner. First we load up the required libraries. Judging by urls like this, it is not always easy to install the R binding to GDAL (Geospatial Data Abstraction Library). But with nix, we don’t even have to use install.packages(...)
.
> R.runRegion $ do
> [r library(rgdal) ]
> [r library(ggplot2) ]
> [r library(dplyr) ]
Next we create an empty map (I have no idea what this would look like if it were plotted).
> map0 < [r ggplot() ]
And now we can fold over the shape data for each ward and its colour to build the basic map we want.
> mapN < foldM
> (\m (f, c) > do
> let wward = root </> f
> shapefileN < [r readOGR(wward_hs) ]
> shapefileN_df < [r fortify(shapefileN_hs) ]
> withColours < [r mutate(shapefileN_df_hs, colour=rep(c_hs, nrow(shapefileN_df_hs))) ]
> [r m_hs +
> geom_polygon(data = withColours_hs,
> aes(x = long, y = lat, group = group, fill=colour),
> color = 'gray', size = .2) ]) map0 (zip es cs)
Now we can annotate the map with all the details needed to make it intelligible.
> map_projected < [r mapN_hs + coord_map() +
> ggtitle("Kingston on Thames 2018 Local Election\nLib Dem Percentage Vote by Ward") +
> theme(legend.position="bottom", plot.title = element_text(lineheight=.8, face="bold", hjust=0.5)) +
> scale_fill_manual(values=c("whitesmoke", "yellow1", "yellow2", "yellow3"),
> name="Vote (%)",
> labels=c("0%40% Lib Dem", "40%50% Lib Dem", "50%60% Lib Dem", "60%100% Lib Dem")) ]
And finally save it in a file.
> [r jpeg(filename="diagrams/kingston.jpg") ]
> [r print(map_projected_hs) ]
> [r dev.off() ]
> return ()
How I Really Use nix
I haven’t found a way of interspersing text with nix code so here are some comments.

I use neither all the Haskell packages listed nor all the R packages. I tend to put a shell.nix file in the directory where I am working and then
nixshell shell.nix I nixpkgs=~/nixpkgs
as at the moment I seem to need packages which were recently added to nixpkgs. 
I couldn’t get the tests for
inliner
to run on MACos so I addeddontCheck
. 
Some Haskell packages in nixpkgs have their bounds set too strictly so I have used
doJailbreak
.
{ pkgs ? import {}, compiler ? "ghc822", doBenchmark ? false }:
let
f = { mkDerivation, haskell, base, foldl, Frames, fuzzyset
, inliner, integration, lens
, pandoctypes, plots
, diagramsrasterific
, diagrams
, diagramssvg
, diagramscontrib
, R, random, stdenv
, templatehaskell, temporary }:
mkDerivation {
pname = "mrp";
version = "1.0.0";
src = ./.;
isLibrary = false;
isExecutable = true;
executableHaskellDepends = [
base
diagrams
diagramsrasterific
diagramssvg
diagramscontrib
(haskell.lib.dontCheck inliner)
foldl
Frames
fuzzyset
integration
lens
pandoctypes
plots
random
templatehaskell
temporary
];
executableSystemDepends = [
R
pkgs.rPackages.dplyr
pkgs.rPackages.ggmap
pkgs.rPackages.ggplot2
pkgs.rPackages.knitr
pkgs.rPackages.maptools
pkgs.rPackages.reshape2
pkgs.rPackages.rgeos
pkgs.rPackages.rgdal
pkgs.rPackages.rstan
pkgs.rPackages.zoo];
license = stdenv.lib.licenses.bsd3;
};
haskellPackages = if compiler == "default"
then pkgs.haskellPackages
else pkgs.haskell.packages.${compiler};
myHaskellPackages = haskellPackages.override {
overrides = self: super: with pkgs.haskell.lib; {
diagramsrasterific = doJailbreak super.diagramsrasterific;
diagramssvg = doJailbreak super.diagramssvg;
diagramscontrib = doJailbreak super.diagramscontrib;
diagrams = doJailbreak super.diagrams;
inliner = dontCheck super.inliner;
pipesgroup = doJailbreak super.pipesgroup;
};
};
variant = if doBenchmark then pkgs.haskell.lib.doBenchmark else pkgs.lib.id;
drv = variant (myHaskellPackages.callPackage f {});
in
if pkgs.lib.inNixShell then drv.env else drv
Reproducibility and Old Faithful
Introduction
For the blog post still being written on variatonal methods, I referred to the still excellent Bishop (2006) who uses as his example data, the data available in R for the geyser in Yellowstone National Park called “Old Faithful”.
While explaining this to another statistician, they started to ask about the dataset. Since I couldn’t immediately answer their questions (when was it collected? over how long? how was it collected?), I started to look at the dataset more closely. The more I dug into where the data came from, the less certain I became that the dataset actually reflected how Old Faithful actually behaves.
Here’s what I found. On the way, I use Haskell, R embedded in Haskell, data frames in Haskell and nix.
The Investigation
First some imports and language extensions.
> {# OPTIONS_GHC Wall #}
> {# OPTIONS_GHC fnowarnunusedtopbinds #}
>
> {# LANGUAGE QuasiQuotes #}
> {# LANGUAGE ScopedTypeVariables #}
> {# LANGUAGE DataKinds #}
> {# LANGUAGE FlexibleContexts #}
> {# LANGUAGE TemplateHaskell #}
> {# LANGUAGE TypeOperators #}
> {# LANGUAGE TypeSynonymInstances #}
> {# LANGUAGE FlexibleInstances #}
> {# LANGUAGE QuasiQuotes #}
> {# LANGUAGE UndecidableInstances #}
> {# LANGUAGE ExplicitForAll #}
> {# LANGUAGE ScopedTypeVariables #}
> {# LANGUAGE OverloadedStrings #}
> {# LANGUAGE GADTs #}
> {# LANGUAGE TypeApplications #}
>
>
> module Main (main) where
>
> import Prelude as P
>
> import Control.Lens
>
> import Data.Foldable
> import Frames hiding ( (:&) )
>
> import Data.Vinyl (Rec(..))
>
> import qualified Language.R as R
> import Language.R (R)
> import Language.R.QQ
> import Data.Int
>
> import Plots as P
> import qualified Diagrams.Prelude as D
> import Diagrams.Backend.Rasterific
>
> import Data.Time.Clock(UTCTime, NominalDiffTime, diffUTCTime)
> import Data.Time.Clock.POSIX(posixSecondsToUTCTime)
>
> import Data.Attoparsec.Text
> import Control.Applicative
R provides many datasets presumably for ease of use. We can access the “Old Faithful” data set in Haskell using some very simple inline R.
> eRs :: R s [Double]
> eRs = R.dynSEXP <$> [r faithful$eruptions ]
>
> wRs :: R s [Int32]
> wRs = R.dynSEXP <$> [r faithful$waiting ]
And then plot the dataset using
> kSaxis :: [(Double, Double)] > P.Axis B D.V2 Double
> kSaxis xs = P.r2Axis &~ do
> P.scatterPlot' xs
>
> plotOF :: IO ()
> plotOF = do
> es' < R.runRegion $ do eRs
> ws' < R.runRegion $ do wRs
> renderRasterific "diagrams/Test.png"
> (D.dims2D 500.0 500.0)
> (renderAxis $ kSaxis $ P.zip es' (P.map fromIntegral ws'))
In the documentation for this dataset, the source is given as Härdle (2012). In here the data are given as a table in Table 3 on page 201. It contains 270 observations not 272 as the R dataset does! Note that there seems to be a correlation between intereruption time and duration of eruption and eruptions seem to fall into one of two groups.
The documentation also says “See Also geyser in package MASS for the Azzalini–Bowman version”. So let’s look at this and plot it.
> eAltRs :: R s [Double]
> eAltRs = R.dynSEXP <$> [r geyser$duration ]
>
> wAltRs :: R s [Int32]
> wAltRs = R.dynSEXP <$> [r geyser$waiting ]
>
> plotOF' :: IO ()
> plotOF' = do
> es < R.runRegion $ do _ < [r library(MASS) ]
> eAltRs
> ws < R.runRegion $ do wAltRs
> renderRasterific "diagrams/TestR.png"
> (D.dims2D 500.0 500.0)
> (renderAxis $ kSaxis $ P.zip es (P.map fromIntegral ws))
A quick look at the first few entries suggests this is the data in Table 1 of Azzalini and Bowman (1990) but NB I didn’t check each item.
It looks quite different to the faithful dataset. But to answer my fellow statistician’s questions: “This analysis deals with data which were collected continuously from August 1st until August 15th, 1985”.
It turns out there is a website which collects data for a large number of geysers and has historical datasets for them including old faithful. The data is in a slightly awkward format and it is not clear (to me at any rate) what some of the fields mean.
Let’s examine the first 1000 observations in the data set using the Haskell package Frames.
First we ask Frames to automatically type the data and read it into a frame.
> tableTypes "OldFaithFul" "/Users/dom/Dropbox/Tidy/NumMethHaskell/variational/Old_Faithful_eruptions_1000.csv"
>
> loadOldFaithful :: IO (Frame OldFaithFul)
> loadOldFaithful = inCoreAoS (readTable "/Users/dom/Dropbox/Tidy/NumMethHaskell/variational/Old_Faithful_eruptions_1000.csv")
Declare some columns and helper functions to manipulate the times.
> declareColumn "eruptionTimeNew" ''UTCTime
> declareColumn "eruptionTimeOld" ''UTCTime
> declareColumn "eruptionTimeDiff" ''NominalDiffTime
> declareColumn "eruptionTimeRat" ''Rational
> declareColumn "eruptionTimeDoub" ''Double
> declareColumn "parsedDuration" ''Double
>
> epochToUTC :: Integral a => a > UTCTime
> epochToUTC = posixSecondsToUTCTime . fromIntegral
>
> getEruptionTimeNew :: OldFaithFul > Record '[EruptionTimeNew]
> getEruptionTimeNew x = pure (Col ((epochToUTC (x ^. eruptionTimeEpoch)))) :& Nil
>
> getEruptionTimeOld :: OldFaithFul > Record '[EruptionTimeOld]
> getEruptionTimeOld x = pure (Col ((epochToUTC (x ^. eruptionTimeEpoch)))) :& Nil
>
> getEruptionTimeDiff :: Record '[EruptionTimeOld, EruptionTimeNew] > Record '[EruptionTimeDoub]
> getEruptionTimeDiff x = pure (Col ((/60.0) $ fromRational $ toRational $ diffUTCTime (x ^. eruptionTimeNew) (x ^. eruptionTimeOld))) :& Nil
The duration is expressed as a string mainly as e.g. “3.5m” and “3.5min” but occasionally in other formats. Using NaNs to represent data we can’t parse is poor practice but Frames seems to object to types such as Maybe Double.
> getDuration :: OldFaithFul > Record '["duration" :> Text]
> getDuration = (rcast @'[Duration])
>
> getParsedDuration :: Record '[Duration, EruptionTimeDoub] > Record '[ParsedDuration]
> getParsedDuration x = pure (Col ((f $ (parseOnly parseDuration) (x ^. duration)))) :& Nil
> where
> f (Left _) = 0.0 / 0.0
> f (Right y) = y
>
> parseDuration :: Parser Double
> parseDuration =
> do d < double
> _ < char 'm'
> return d
> <>
> do d < double
> _ < string "min"
> return d
To get the times between eruptions we need to zip the frame with its tail so we define this helper function.
> dropFrame :: Int > Frame r > Frame r
> dropFrame n s = Frame ((frameLength s)  1) (\i > frameRow s (i + n))
> main :: IO ()
> main = do
> x < loadOldFaithful
Get the current eruption times and the previous eruption times and put them in a frame.
> let tn = dropFrame 1 $ fmap (\b > (getEruptionTimeNew b)) x
> let tp = fmap (\b > (getEruptionTimeOld b)) x
> let tpns = zipFrames tp tn
Get the durations of the eruptions and exclude any gaps when no observations were taken; arbitrarily if the gap is greate than 1.5 hours.
> let ds = fmap getDuration x
> let tds = zipFrames ds (fmap getEruptionTimeDiff tpns)
> let b = filterFrame (\u > (u ^. eruptionTimeDoub) <= 150) tds
Parse some of the durations and put the durations and the intereruption gap into a single frame.
> let c = filterFrame (\u > not $ isNaN $ u ^. parsedDuration) $
> fmap getParsedDuration b
> let d = zipFrames b c
> let g = fmap (\u > (u ^. parsedDuration, u ^. eruptionTimeDoub)) d
Finally we can yet another data set of observations of old faithful.
> renderRasterific "diagrams/TestH1000.png"
> (D.dims2D 500.0 500.0)
> (renderAxis $ kSaxis $ toList g)
> R.withEmbeddedR R.defaultConfig $ do
> plotOF
> plotOF'
> putStrLn "Finished"
To me this doesn’t look like either of the other two datasets. Perhaps the R faithful data set should be validated, possibly retired and maybe replaced by something with firmer foundations?
Nix
I use nix on MACos for package management both for Haskell and R – no more install.packages problems. Here’s my shell.nix file also available here: here.
{ nixpkgs ? import {}, compiler ? "ghc822", doBenchmark ? false }:
let
inherit (nixpkgs) pkgs;
f = { mkDerivation, array, base, bytestring, cassava, containers
, datasets, diagramslib, diagramsrasterific, foldl, Frames, ghcprim, hmatrix, hmatrixgsl
, inliner, lens, mtl, pipes, plots, randomfu, R, randomsource
, stdenv, templatehaskell, text, typelitswitnesses, vector, vinyl }:
mkDerivation {
pname = "variational";
version = "0.1.0.0";
src = ./.;
isLibrary = false;
isExecutable = true;
executableHaskellDepends = [
array
base
bytestring
cassava
containers
datasets
diagramslib
diagramsrasterific
foldl
Frames
ghcprim
hmatrix
inliner
lens
mtl
pipes
plots
randomfu
randomsource
templatehaskell
text
typelitswitnesses
vector
vinyl
];
executableSystemDepends = [
R
pkgs.rPackages.anytime
pkgs.rPackages.ggplot2
pkgs.rPackages.maptools
pkgs.rPackages.reshape2
pkgs.rPackages.rgeos
pkgs.rPackages.rgdal
pkgs.rPackages.rstan ];
license = stdenv.lib.licenses.bsd3;
};
haskellPackages = if compiler == "default"
then pkgs.haskellPackages
else pkgs.haskell.packages.${compiler};
variant = if doBenchmark then pkgs.haskell.lib.doBenchmark else pkgs.lib.id;
drv = variant (haskellPackages.callPackage f {});
in
if pkgs.lib.inNixShell then drv.env else drv
References
Azzalini, A, and Adrian Bowman. 1990. “Bowman, A.w.: a Look at Some Data on the Old Faithful Geyser. Appl. Stat. 39, 357365” 39 (January).
Bishop, C.M. 2006. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer. https://books.google.co.uk/books?id=kTNoQgAACAAJ.
Härdle, W. 2012. Smoothing Techniques: With Implementation in S. Springer Series in Statistics. Springer New York. https://books.google.co.uk/books?id=sijUBwAAQBAJ.
Modelling an Ecosystem via Hamiltonian Monte Carlo
Introduction
Recall from the previous post that the Hare growth parameter undergoes Brownian motion so that the further into the future we go, the less certain we are about it. In order to ensure that this parameter remains positive, let’s model the log of it to be Brownian motion.
where the final equation is a stochastic differential equation with being a Wiener process.
By Itô we have
Again, we see that the populations become noisier the further into the future we go.
Inference
Now let us infer the growth rate using Hamiltonian Monte Carlo and the domain specific probabilistic language Stan (Stan Development Team (2015b), Stan Development Team (2015a), Hoffman and Gelman (2014), Carpenter (2015)). Here’s the model expressed in Stan.
functions {
real f1 (real a, real k1, real b, real p, real z) {
real q;
q = a * p * (1  p / k1)  b * p * z;
return q;
}
real f2 (real d, real k2, real c, real p, real z) {
real q;
q = d * z * (1 + z / k2) + c * p * z;
return q;
}
}
data {
int<lower=1> T; // Number of observations
real y[T]; // Observed hares
real k1; // Hare carrying capacity
real b; // Hare death rate per lynx
real d; // Lynx death rate
real k2; // Lynx carrying capacity
real c; // Lynx birth rate per hare
real deltaT; // Time step
}
parameters {
real<lower=0> mu;
real<lower=0> sigma;
real<lower=0> p0;
real<lower=0> z0;
real<lower=0> rho0;
real w[T];
}
transformed parameters {
real<lower=0> p[T];
real<lower=0> z[T];
real<lower=0> rho[T];
p[1] = p0;
z[1] = z0;
rho[1] = rho0;
for (t in 1:(T1)){
p[t+1] = p[t] + deltaT * f1 (exp(rho[t]), k1, b, p[t], z[t]);
z[t+1] = z[t] + deltaT * f2 (d, k2, c, p[t], z[t]);
rho[t+1] = rho[t] * exp(sigma * sqrt(deltaT) * w[t]  0.5 * sigma * sigma * deltaT);
}
}
model {
mu ~ uniform(0.0,1.0);
sigma ~ uniform(0.0, 0.5);
p0 ~ lognormal(log(100.0), 0.2);
z0 ~ lognormal(log(50.0), 0.1);
rho0 ~ normal(log(mu), sigma);
w ~ normal(0.0,1.0);
for (t in 1:T) {
y[t] ~ lognormal(log(p[t]),0.1);
}
}
Let’s look at the posteriors of the hyperparameters for the Hare growth parameter.
Again, the estimate for is pretty decent. For our generated data, and given our observations are quite noisy maybe the estimate for this is not too bad also.
Appendix: The R Driving Code
All code including the R below can be downloaded from github.
install.packages("devtools")
library(devtools)
install_github("libbi/RBi",ref="master")
install_github("sbfnk/RBi.helpers",ref="master")
rm(list = ls(all.names=TRUE))
unlink(".RData")
library('RBi')
try(detach(package:RBi, unload = TRUE), silent = TRUE)
library(RBi, quietly = TRUE)
library('RBi.helpers')
library('ggplot2', quietly = TRUE)
library('gridExtra', quietly = TRUE)
endTime < 50
PP < bi_model("PP.bi")
synthetic_dataset_PP < bi_generate_dataset(endtime=endTime,
model=PP,
seed="42",
verbose=TRUE,
add_options = list(
noutputs=500))
rdata_PP < bi_read(synthetic_dataset_PP)
df < data.frame(rdata_PP$P$nr,
rdata_PP$P$value,
rdata_PP$Z$value,
rdata_PP$P_obs$value)
ggplot(df, aes(rdata_PP$P$nr, y = Population, color = variable), size = 0.1) +
geom_line(aes(y = rdata_PP$P$value, col = "Hare"), size = 0.1) +
geom_line(aes(y = rdata_PP$Z$value, col = "Lynx"), size = 0.1) +
geom_point(aes(y = rdata_PP$P_obs$value, col = "Observations"), size = 0.1) +
theme(legend.position="none") +
ggtitle("Example Data") +
xlab("Days") +
theme(axis.text=element_text(size=4),
axis.title=element_text(size=6,face="bold")) +
theme(plot.title = element_text(size=10))
ggsave(filename="diagrams/LVdata.png",width=4,height=3)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
lvStanModel < stan_model(file = "SHO.stan",verbose=TRUE)
lvFit < sampling(lvStanModel,
seed=42,
data=list(T = length(rdata_PP$P_obs$value),
y = rdata_PP$P_obs$value,
k1 = 2.0e2,
b = 2.0e2,
d = 4.0e1,
k2 = 2.0e1,
c = 4.0e3,
deltaT = rdata_PP$P_obs$time[2]  rdata_PP$P_obs$time[1]
),
chains=1)
samples < extract(lvFit)
gs1 < qplot(x = samples$mu, y = ..density.., geom = "histogram") + xlab(expression(\mu))
gs2 < qplot(x = samples$sigma, y = ..density.., geom = "histogram") + xlab(expression(samples$sigma))
gs3 < grid.arrange(gs1, gs2)
ggsave(plot=gs3,filename="diagrams/LvPosteriorStan.png",width=4,height=3)
synthetic_dataset_PP1 < bi_generate_dataset(endtime=endTime,
model=PP,
init = list(P = 100, Z=50),
seed="42",
verbose=TRUE,
add_options = list(
noutputs=500))
rdata_PP1 < bi_read(synthetic_dataset_PP1)
synthetic_dataset_PP2 < bi_generate_dataset(endtime=endTime,
model=PP,
init = list(P = 150, Z=25),
seed="42",
verbose=TRUE,
add_options = list(
noutputs=500))
rdata_PP2 < bi_read(synthetic_dataset_PP2)
df1 < data.frame(hare = rdata_PP$P$value,
lynx = rdata_PP$Z$value,
hare1 = rdata_PP1$P$value,
lynx1 = rdata_PP1$Z$value,
hare2 = rdata_PP2$P$value,
lynx2 = rdata_PP2$Z$value)
ggplot(df1) +
geom_path(aes(x=df1$hare, y=df1$lynx, col = "0"), size = 0.1) +
geom_path(aes(x=df1$hare1, y=df1$lynx1, col = "1"), size = 0.1) +
geom_path(aes(x=df1$hare2, y=df1$lynx2, col = "2"), size = 0.1) +
theme(legend.position="none") +
ggtitle("Phase Space") +
xlab("Hare") +
ylab("Lynx") +
theme(axis.text=element_text(size=4),
axis.title=element_text(size=6,face="bold")) +
theme(plot.title = element_text(size=10))
ggsave(filename="diagrams/PPviaLibBi.png",width=4,height=3)
PPInfer < bi_model("PPInfer.bi")
bi_object_PP < libbi(client="sample", model=PPInfer, obs = synthetic_dataset_PP)
bi_object_PP$run(add_options = list(
"endtime" = endTime,
noutputs = endTime,
nsamples = 2000,
nparticles = 128,
seed=42,
nthreads = 1),
verbose = TRUE,
stdoutput_file_name = tempfile(pattern="pmmhoutput", fileext=".txt"))
bi_file_summary(bi_object_PP$result$output_file_name)
mu < bi_read(bi_object_PP, "mu")$value
g1 < qplot(x = mu[2001:4000], y = ..density.., geom = "histogram") + xlab(expression(mu))
sigma < bi_read(bi_object_PP, "sigma")$value
g2 < qplot(x = sigma[2001:4000], y = ..density.., geom = "histogram") + xlab(expression(sigma))
g3 < grid.arrange(g1, g2)
ggsave(plot=g3,filename="diagrams/LvPosterior.png",width=4,height=3)
df2 < data.frame(hareActs = rdata_PP$P$value,
hareObs = rdata_PP$P_obs$value)
ggplot(df, aes(rdata_PP$P$nr, y = value, color = variable)) +
geom_line(aes(y = rdata_PP$P$value, col = "Phyto")) +
geom_line(aes(y = rdata_PP$Z$value, col = "Zoo")) +
geom_point(aes(y = rdata_PP$P_obs$value, col = "Phyto Obs"))
ln_alpha < bi_read(bi_object_PP, "ln_alpha")$value
P < matrix(bi_read(bi_object_PP, "P")$value,nrow=51,byrow=TRUE)
Z < matrix(bi_read(bi_object_PP, "Z")$value,nrow=51,byrow=TRUE)
data50 < bi_generate_dataset(endtime=endTime,
model=PP,
seed="42",
verbose=TRUE,
add_options = list(
noutputs=50))
rdata50 < bi_read(data50)
df3 < data.frame(days = c(1:51), hares = rowMeans(P), lynxes = rowMeans(Z),
actHs = rdata50$P$value, actLs = rdata50$Z$value)
ggplot(df3) +
geom_line(aes(x = days, y = hares, col = "Est Phyto")) +
geom_line(aes(x = days, y = lynxes, col = "Est Zoo")) +
geom_line(aes(x = days, y = actHs, col = "Act Phyto")) +
geom_line(aes(x = days, y = actLs, col = "Act Zoo"))
Bibliography
Carpenter, Bob. 2015. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software.
Hoffman, Matthew D., and Andrew Gelman. 2014. “The NoUTurn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research.
Stan Development Team. 2015a. Stan Modeling Language User’s Guide and Reference Manual, Version 2.10.0. http://mcstan.org/.
———. 2015b. “Stan: A C++ Library for Probability and Sampling, Version 2.10.0.” http://mcstan.org/.
Fun with LibBi and Influenza
Introduction
This is a bit different from my usual posts (well apart from my write up of hacking at Odessa) in that it is a log of how I managed to get LibBi (Library for Bayesian Inference) to run on my MacBook and then not totally satisfactorily (as you will see if you read on).
The intention is to try a few more approaches to the same problem, for example, Stan, monadbayes and handcrafted.
Kermack and McKendrick (1927) give a simple model of the spread of an infectious disease. Individuals move from being susceptible () to infected () to recovered ().
In 1978, anonymous authors sent a note to the British Medical Journal reporting an influenza outbreak in a boarding school in the north of England (“Influenza in a boarding school” 1978). The chart below shows the solution of the SIR (Susceptible, Infected, Record) model with parameters which give roughly the results observed in the school.
LibBi
Step 1
~/LibBistable/SIRmaster $ ./init.sh
error: 'ncread' undefined near line 6 column 7
The README says this is optional so we can skip over it. Still it would be nice to fit the bridge weight function as described in Moral and Murray (2015).
The README does say that GPML is required but since we don’t (yet) need to do this step, let’s move on.
~/LibBistable/SIRmaster $ ./run.sh
./run.sh
Error: ./configure failed with return code 77. See
.SIR/build_openmp_cuda_single/configure.log and
.SIR/build_openmp_cuda_single/config.log for details
It seems the example is configured to run on CUDA and it is highly likely that my installation of LibBI was not set up to allow this. We can change config.conf
from
disableassert
enablesingle
enablecuda
nthreads 2
to
nthreads 4
enablesse
disableassert
On to the next issue.
~/LibBistable/SIRmaster $ ./run.sh
./run.sh
Error: ./configure failed with return code 1. required QRUpdate
library not found. See .SIR/build_sse/configure.log and
.SIR/build_sse/config.log for details
But QRUpdate is installed!
~/LibBistable/SIRmaster $ brew info QRUpdate
brew info QRUpdate
homebrew/science/qrupdate: stable 1.1.2 (bottled)
http://sourceforge.net/projects/qrupdate/
/usr/local/Cellar/qrupdate/1.1.2 (3 files, 302.6K)
/usr/local/Cellar/qrupdate/1.1.2_2 (6 files, 336.3K)
Poured from bottle
/usr/local/Cellar/qrupdate/1.1.2_3 (6 files, 337.3K) *
Poured from bottle
From: https://github.com/Homebrew/homebrewscience/blob/master/qrupdate.rb
==> Dependencies
Required: veclibfort ✔
Optional: openblas ✔
==> Options
withopenblas
Build with openblas support
withoutcheck
Skip buildtime tests (not recommended)
Let’s look in the log as advised. So it seems that a certain symbol cannot be found.
checking for dch1dn_ in lqrupdate
Let’s try ourselves.
nm g /usr/local/Cellar/qrupdate/1.1.2_3/lib/libqrupdate.a  grep dch1dn_
0000000000000000 T _dch1dn_
So the symbol is there! What gives? Let’s try setting one of the environment variables.
export LDFLAGS='L/usr/local/lib'
Now we get further.
./run.sh
Error: ./configure failed with return code 1. required NetCDF header
not found. See .SIR/build_sse/configure.log and
.SIR/build_sse/config.log for details
So we just need to set another environment variable.
export CPPFLAGS='I/usr/local/include/'
This is more mysterious.
./run.sh
Error: ./configure failed with return code 1. required Boost header
not found. See .SIR/build_sse/configure.log and
.SIR/build_sse/config.log for details ~/LibBistable/SIRmaster
Let’s see what we have.
brew list  grep i boost
Nothing! I recall having some problems with boost
when trying to use a completely different package. So let’s install boost
.
brew install boost
Now we get a different error.
./run.sh
Error: make failed with return code 2, see .SIR/build_sse/make.log for details
Fortunately at some time in the past sbfnk took pity on me and advised me here to use boost155
, a step that should not be lightly undertaken.
/usr/local/Cellar/boost155/1.55.0_1: 10,036 files, 451.6M, built in 15 minutes 9 seconds
Even then I had to say
brew link force boost155
Finally it runs.
./run.sh 2> out.txt
And produces a lot of output
wc l out.txt
49999 out.txt
ls ltrh results/posterior.nc
1.7G Apr 30 19:57 results/posterior.nc
Rather worringly, out.txt
has all lines of the form
1: 51.9191 23.2045 nan beats inf inf inf accept=0.5
nan
beating inf
does not sound good.
Now we are in a position to analyse the results.
octave path oct/ eval "plot_and_print"
error: 'bi_plot_quantiles' undefined near line 23 column 5
I previously found an Octave package(?) called OctBi
so let’s create an .octaverc
file which adds this to the path. We’ll also need to load the netcdf
package which we previously installed.
addpath ("../OctBistable/inst")
pkg load netcdf
~/LibBistable/SIRmaster $ octave path oct/ eval "plot_and_print"
octave path oct/ eval "plot_and_print"
warning: division by zero
warning: called from
mean at line 117 column 7
read_hist_simulator at line 47 column 11
bi_read_hist at line 85 column 12
bi_hist at line 63 column 12
plot_and_print at line 56 column 5
warning: division by zero
warning: division by zero
warning: division by zero
warning: division by zero
warning: division by zero
warning: print.m: fig2dev binary is not available.
Some output formats are not available.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
sh: pdfcrop: command not found
I actually get a chart from this so some kind of success.
This does not look like the chart in the Moral and Murray (2015), the fitted number of infected patients looks a lot smoother and the “rates” parameters also vary in a much smoother manner. For reasons I haven’t yet investigated, it looks like overfitting. Here’s the charts in the paper.
Bibliography
“Influenza in a boarding school.” 1978. British Medical Journal, March, 587.
Kermack, W. O., and A. G. McKendrick. 1927. “A Contribution to the Mathematical Theory of Epidemics.” Proceedings of the Royal Society of London Series A 115 (August): 700–721. doi:10.1098/rspa.1927.0118.
Moral, Pierre Del, and Lawrence M Murray. 2015. “Sequential Monte Carlo with Highly Informative Observations.”
Every Manifold is Paracompact
Introduction
In their paper Betancourt et al. (2014), the authors give a corollary which starts with the phrase “Because the manifold is paracompact”. It wasn’t immediately clear why the manifold was paracompact or indeed what paracompactness meant although it was clearly something like compactness which means that every cover has a finite subcover.
It turns out that every manifold is paracompact and that this is intimately related to partitions of unity.
Most of what I have written below is taken from some handwritten anonymous lecture notes I found by chance in the DPMMS library in Cambridge University. To whomever wrote them: thank you very much.
Limbering Up
Let be an open cover of a smooth manifold . A partition of unity on M, subordinate to the cover is a finite collection of smooth functions
where for some such that
and for each there exists such that
We don’t yet know partitions of unity exist.
First define
Techniques of classical analysis easily show that is smooth ( is the only point that might be in doubt and it can be checked from first principles that for all ).
Next define
Finally we can define by . This has the properties
 if
 if
Now take a point centred in a chart so that, without loss of generality, (we can always choose so that the open ball and then define another chart with ).
Define the images of the open and closed balls of radius and respectively
and further define bump functions
Then is smooth and its support lies in .
By compactness, the open cover has a finite subcover . Now define
by
Then is smooth, and . Thus is the required partition of unity.
Paracompactness
Because is a manifold, it has a countable basis and for any point , there must exist with . Choose one of these and call it . This gives a countable cover of by such sets.
Now define
where, since is compact, is a finite subcover.
And further define
where again, since is compact, is a finite subcover.
Now define
Then is compact, is open and . Furthermore, and only intersects with and .
Given any open cover of , each can be covered by a finite number of open sets in contained in some member of . Thus every point in can be covered by at most a finite number of sets from and and which are contained in some member of . This is a locally finite refinement of and which is precisely the definition of paracompactness.
To produce a partition of unity we define bump functions as above on this locally finite cover and note that locally finite implies that is well defined. Again, as above, define
to get the required result.
Bibliography
Betancourt, M. J., Simon Byrne, Samuel Livingstone, and Mark Girolami. 2014. “The Geometric Foundations of Hamiltonian Monte Carlo,” October, 45. http://arxiv.org/abs/1410.5110.
Particle Smoothing
Introduction
The equation of motion for a pendulum of unit length subject to Gaussian white noise is
We can discretize this via the usual Euler method
where and
The explanation of the precise form of the covariance matrix will be the subject of another blog post; for the purpose of exposition of forward filtering / backward smoothing, this detail is not important.
Assume that we can only measure the horizontal position of the pendulum and further that this measurement is subject to error so that
where .
Particle Filtering can give us an estimate of where the pendulum is and its velocity using all the observations up to that point in time. But now suppose we have observed the pendulum for a fixed period of time then at times earlier than the time at which we stop our observations we now have observations in the future as well as in the past. If we can somehow take account of these future observations then we should be able to improve our estimate of where the pendulum was at any given point in time (as well as its velocity). Forward Filtering / Backward Smoothing is a technique for doing this.
Haskell Preamble
> {# OPTIONS_GHC Wall #}
> {# OPTIONS_GHC fnowarnnameshadowing #}
> {# OPTIONS_GHC fnowarntypedefaults #}
> {# OPTIONS_GHC fnowarnunuseddobind #}
> {# OPTIONS_GHC fnowarnmissingmethods #}
> {# OPTIONS_GHC fnowarnorphans #}
> {# LANGUAGE MultiParamTypeClasses #}
> {# LANGUAGE TypeFamilies #}
> {# LANGUAGE ScopedTypeVariables #}
> {# LANGUAGE ExplicitForAll #}
> {# LANGUAGE DataKinds #}
> {# LANGUAGE FlexibleInstances #}
> {# LANGUAGE MultiParamTypeClasses #}
> {# LANGUAGE FlexibleContexts #}
> {# LANGUAGE TypeFamilies #}
> {# LANGUAGE BangPatterns #}
> {# LANGUAGE GeneralizedNewtypeDeriving #}
> {# LANGUAGE TemplateHaskell #}
> {# LANGUAGE DataKinds #}
> {# LANGUAGE DeriveGeneric #}
> module PendulumSamples ( pendulumSamples
> , pendulumSamples'
> , testFiltering
> , testSmoothing
> , testFilteringG
> , testSmoothingG
> ) where
> import Data.Random hiding ( StdNormal, Normal )
> import Data.Random.Source.PureMT ( pureMT )
> import Control.Monad.State ( evalState, replicateM )
> import qualified Control.Monad.Loops as ML
> import Control.Monad.Writer ( tell, WriterT, lift,
> runWriterT
> )
> import Numeric.LinearAlgebra.Static
> ( R, vector, Sym,
> headTail, matrix, sym,
> diag
> )
> import GHC.TypeLits ( KnownNat )
> import MultivariateNormal ( MultivariateNormal(..) )
> import qualified Data.Vector as V
> import Data.Bits ( shiftR )
> import Data.List ( transpose )
> import Control.Parallel.Strategies
> import GHC.Generics (Generic)
Simulation
Let’s first plot some typical trajectories of the pendulum.
> deltaT, g :: Double
> deltaT = 0.01
> g = 9.81
> type PendulumState = R 2
> type PendulumObs = R 1
> pendulumSample :: MonadRandom m =>
> Sym 2 >
> Sym 1 >
> PendulumState >
> m (Maybe ((PendulumState, PendulumObs), PendulumState))
> pendulumSample bigQ bigR xPrev = do
> let x1Prev = fst $ headTail xPrev
> x2Prev = fst $ headTail $ snd $ headTail xPrev
> eta < sample $ rvar (MultivariateNormal 0.0 bigQ)
> let x1= x1Prev + x2Prev * deltaT
> x2 = x2Prev  g * (sin x1Prev) * deltaT
> xNew = vector [x1, x2] + eta
> x1New = fst $ headTail xNew
> epsilon < sample $ rvar (MultivariateNormal 0.0 bigR)
> let yNew = vector [sin x1New] + epsilon
> return $ Just ((xNew, yNew), xNew)
Let’s try plotting some samples when we are in the linear region with which we are familiar from school .
In this case we expect the horizontal displacement to be approximately equal to the angle of displacement and thus the observations to be symmetric about the actuals.
> bigQ :: Sym 2
> bigQ = sym $ matrix bigQl
> qc1 :: Double
> qc1 = 0.0001
> bigQl :: [Double]
> bigQl = [ qc1 * deltaT^3 / 3, qc1 * deltaT^2 / 2,
> qc1 * deltaT^2 / 2, qc1 * deltaT
> ]
> bigR :: Sym 1
> bigR = sym $ matrix [0.0001]
> m0 :: PendulumState
> m0 = vector [0.01, 0]
> pendulumSamples :: [(PendulumState, PendulumObs)]
> pendulumSamples = evalState (ML.unfoldrM (pendulumSample bigQ bigR) m0) (pureMT 17)
But if we work in a region in which linearity breaks down then the observations are no longer symmetrical about the actuals.
> bigQ' :: Sym 2
> bigQ' = sym $ matrix bigQl'
> qc1' :: Double
> qc1' = 0.01
> bigQl' :: [Double]
> bigQl' = [ qc1' * deltaT^3 / 3, qc1' * deltaT^2 / 2,
> qc1' * deltaT^2 / 2, qc1' * deltaT
> ]
> bigR' :: Sym 1
> bigR' = sym $ matrix [0.1]
> m0' :: PendulumState
> m0' = vector [1.6, 0]
> pendulumSamples' :: [(PendulumState, PendulumObs)]
> pendulumSamples' = evalState (ML.unfoldrM (pendulumSample bigQ' bigR') m0') (pureMT 17)
Filtering
We do not give the theory behind particle filtering. The interested reader can either consult Särkkä (2013) or wait for a future blog post on the subject.
> nParticles :: Int
> nParticles = 500
The usual Bayesian update step.
> type Particles a = V.Vector a
> oneFilteringStep ::
> MonadRandom m =>
> (Particles a > m (Particles a)) >
> (Particles a > Particles b) >
> (b > b > Double) >
> Particles a >
> b >
> WriterT [Particles a] m (Particles a)
> oneFilteringStep stateUpdate obsUpdate weight statePrevs obs = do
> tell [statePrevs]
> stateNews < lift $ stateUpdate statePrevs
> let obsNews = obsUpdate stateNews
> let weights = V.map (weight obs) obsNews
> cumSumWeights = V.tail $ V.scanl (+) 0 weights
> totWeight = V.last cumSumWeights
> vs < lift $ V.replicateM nParticles (sample $ uniform 0.0 totWeight)
> let js = indices cumSumWeights vs
> stateTildes = V.map (stateNews V.!) js
> return stateTildes
The system state and observable.
> data SystemState a = SystemState { x1 :: a, x2 :: a }
> deriving (Show, Generic)
> instance NFData a => NFData (SystemState a)
> newtype SystemObs a = SystemObs { y1 :: a }
> deriving Show
To make the system state update a bit more readable, let’s introduce some lifted arithmetic operators.
> (.+), (.*), (.) :: (Num a) => V.Vector a > V.Vector a > V.Vector a
> (.+) = V.zipWith (+)
> (.*) = V.zipWith (*)
> (.) = V.zipWith ()
The state update itself
> stateUpdate :: Particles (SystemState Double) >
> Particles (SystemState Double)
> stateUpdate xPrevs = V.zipWith SystemState x1s x2s
> where
> ix = V.length xPrevs
>
> x1Prevs = V.map x1 xPrevs
> x2Prevs = V.map x2 xPrevs
>
> deltaTs = V.replicate ix deltaT
> gs = V.replicate ix g
> x1s = x1Prevs .+ (x2Prevs .* deltaTs)
> x2s = x2Prevs . (gs .* (V.map sin x1Prevs) .* deltaTs)
and its noisy version.
> stateUpdateNoisy :: MonadRandom m =>
> Sym 2 >
> Particles (SystemState Double) >
> m (Particles (SystemState Double))
> stateUpdateNoisy bigQ xPrevs = do
> let xs = stateUpdate xPrevs
>
> x1s = V.map x1 xs
> x2s = V.map x2 xs
>
> let ix = V.length xPrevs
> etas < replicateM ix $ sample $ rvar (MultivariateNormal 0.0 bigQ)
>
> let eta1s, eta2s :: V.Vector Double
> eta1s = V.fromList $ map (fst . headTail) etas
> eta2s = V.fromList $ map (fst . headTail . snd . headTail) etas
>
> return (V.zipWith SystemState (x1s .+ eta1s) (x2s .+ eta2s))
The function which maps the state to the observable.
> obsUpdate :: Particles (SystemState Double) >
> Particles (SystemObs Double)
> obsUpdate xs = V.map (SystemObs . sin . x1) xs
And finally a function to calculate the weight of each particle given an observation.
> weight :: forall a n . KnownNat n =>
> (a > R n) >
> Sym n >
> a > a > Double
> weight f bigR obs obsNew = pdf (MultivariateNormal (f obsNew) bigR) (f obs)
The variance of the prior.
> bigP :: Sym 2
> bigP = sym $ diag 0.1
Generate our ensemble of particles chosen from the prior,
> initParticles :: MonadRandom m =>
> m (Particles (SystemState Double))
> initParticles = V.replicateM nParticles $ do
> r < sample $ rvar (MultivariateNormal m0' bigP)
> let x1 = fst $ headTail r
> x2 = fst $ headTail $ snd $ headTail r
> return $ SystemState { x1 = x1, x2 = x2}
run the particle filter,
> runFilter :: Int > [Particles (SystemState Double)]
> runFilter nTimeSteps = snd $ evalState action (pureMT 19)
> where
> action = runWriterT $ do
> xs < lift $ initParticles
> V.foldM
> (oneFilteringStep (stateUpdateNoisy bigQ') obsUpdate (weight f bigR'))
> xs
> (V.fromList $ map (SystemObs . fst . headTail . snd)
> (take nTimeSteps pendulumSamples'))
and extract the estimated position from the filter.
> testFiltering :: Int > [Double]
> testFiltering nTimeSteps = map ((/ (fromIntegral nParticles)). sum . V.map x1)
> (runFilter nTimeSteps)
Smoothing
If we could calculate the marginal smoothing distributions then we might be able to sample from them. Using the Markov assumption of our model that is independent of given , we have
We observe that this is a (continuous state space) Markov process with a nonhomogeneous transition function albeit one which goes backwards in time. Apparently for conditionally Gaussian linear statespace models, this is known as RTS, or RauchTungStriebel smoothing (Rauch, Striebel, and Tung (1965)).
According to Cappé (2008),

It appears to be efficient and stable in the long term (although no proof was available at the time the slides were presented).

It is not sequential (in particular, one needs to store all particle positions and weights).

It has numerical complexity proportional where is the number of particles.
We can use this to sample paths starting at time and working backwards. From above we have
where is some normalisation constant (Z for “Zustandssumme” – sum over states).
From particle filtering we know that
Thus
and we can sample from with probability
Recalling that we have resampled the particles in the filtering algorithm so that their weights are all and abstracting the state update and state density function, we can encode this update step in Haskell as
> oneSmoothingStep :: MonadRandom m =>
> (Particles a > V.Vector a) >
> (a > a > Double) >
> a >
> Particles a >
> WriterT (Particles a) m a
> oneSmoothingStep stateUpdate
> stateDensity
> smoothingSampleAtiPlus1
> filterSamplesAti = do it
> where
> it = do
> let mus = stateUpdate filterSamplesAti
> weights = V.map (stateDensity smoothingSampleAtiPlus1) mus
> cumSumWeights = V.tail $ V.scanl (+) 0 weights
> totWeight = V.last cumSumWeights
> v < lift $ sample $ uniform 0.0 totWeight
> let ix = binarySearch cumSumWeights v
> xnNew = filterSamplesAti V.! ix
> tell $ V.singleton xnNew
> return $ xnNew
To sample a complete path we start with a sample from the filtering distribution at at time (which is also the smoothing distribution)
> oneSmoothingPath :: MonadRandom m =>
> (Int > V.Vector (Particles a)) >
> (a > Particles a > WriterT (Particles a) m a) >
> Int > m (a, V.Vector a)
> oneSmoothingPath filterEstss oneSmoothingStep nTimeSteps = do
> let ys = filterEstss nTimeSteps
> ix < sample $ uniform 0 (nParticles  1)
> let xn = (V.head ys) V.! ix
> runWriterT $ V.foldM oneSmoothingStep xn (V.tail ys)
> oneSmoothingPath' :: (MonadRandom m, Show a) =>
> (Int > V.Vector (Particles a)) >
> (a > Particles a > WriterT (Particles a) m a) >
> Int > WriterT (Particles a) m a
> oneSmoothingPath' filterEstss oneSmoothingStep nTimeSteps = do
> let ys = filterEstss nTimeSteps
> ix < lift $ sample $ uniform 0 (nParticles  1)
> let xn = (V.head ys) V.! ix
> V.foldM oneSmoothingStep xn (V.tail ys)
Of course we need to run through the filtering distributions starting at
> filterEstss :: Int > V.Vector (Particles (SystemState Double))
> filterEstss n = V.reverse $ V.fromList $ runFilter n
> testSmoothing :: Int > Int > [Double]
> testSmoothing m n = V.toList $ evalState action (pureMT 23)
> where
> action = do
> xss < V.replicateM n $
> snd <$> (oneSmoothingPath filterEstss (oneSmoothingStep stateUpdate (weight h bigQ')) m)
> let yss = V.fromList $ map V.fromList $
> transpose $
> V.toList $ V.map (V.toList) $
> xss
> return $ V.map (/ (fromIntegral n)) $ V.map V.sum $ V.map (V.map x1) yss
By eye we can see we get a better fit
and calculating the mean square error for filtering gives against the mean square error for smoothing of ; this confirms the fit really is better as one would hope.
Unknown Gravity
Let us continue with the same example but now assume that is unknown and that we wish to estimate it. Let us also assume that our apparatus is not subject to noise.
Again we have
But now when we discretize it we include a third variable
where
Again we assume that we can only measure the horizontal position of the pendulum so that
where .
> type PendulumStateG = R 3
> pendulumSampleG :: MonadRandom m =>
> Sym 3 >
> Sym 1 >
> PendulumStateG >
> m (Maybe ((PendulumStateG, PendulumObs), PendulumStateG))
> pendulumSampleG bigQ bigR xPrev = do
> let x1Prev = fst $ headTail xPrev
> x2Prev = fst $ headTail $ snd $ headTail xPrev
> x3Prev = fst $ headTail $ snd $ headTail $ snd $ headTail xPrev
> eta < sample $ rvar (MultivariateNormal 0.0 bigQ)
> let x1= x1Prev + x2Prev * deltaT
> x2 = x2Prev  g * (sin x1Prev) * deltaT
> x3 = x3Prev
> xNew = vector [x1, x2, x3] + eta
> x1New = fst $ headTail xNew
> epsilon < sample $ rvar (MultivariateNormal 0.0 bigR)
> let yNew = vector [sin x1New] + epsilon
> return $ Just ((xNew, yNew), xNew)
> pendulumSampleGs :: [(PendulumStateG, PendulumObs)]
> pendulumSampleGs = evalState (ML.unfoldrM (pendulumSampleG bigQg bigRg) mG) (pureMT 29)
> data SystemStateG a = SystemStateG { gx1 :: a, gx2 :: a, gx3 :: a }
> deriving Show
The state update itself
> stateUpdateG :: Particles (SystemStateG Double) >
> Particles (SystemStateG Double)
> stateUpdateG xPrevs = V.zipWith3 SystemStateG x1s x2s x3s
> where
> ix = V.length xPrevs
>
> x1Prevs = V.map gx1 xPrevs
> x2Prevs = V.map gx2 xPrevs
> x3Prevs = V.map gx3 xPrevs
>
> deltaTs = V.replicate ix deltaT
> x1s = x1Prevs .+ (x2Prevs .* deltaTs)
> x2s = x2Prevs . (x3Prevs .* (V.map sin x1Prevs) .* deltaTs)
> x3s = x3Prevs
and its noisy version.
> stateUpdateNoisyG :: MonadRandom m =>
> Sym 3 >
> Particles (SystemStateG Double) >
> m (Particles (SystemStateG Double))
> stateUpdateNoisyG bigQ xPrevs = do
> let ix = V.length xPrevs
>
> let xs = stateUpdateG xPrevs
>
> x1s = V.map gx1 xs
> x2s = V.map gx2 xs
> x3s = V.map gx3 xs
>
> etas < replicateM ix $ sample $ rvar (MultivariateNormal 0.0 bigQ)
> let eta1s, eta2s, eta3s :: V.Vector Double
> eta1s = V.fromList $ map (fst . headTail) etas
> eta2s = V.fromList $ map (fst . headTail . snd . headTail) etas
> eta3s = V.fromList $ map (fst . headTail . snd . headTail . snd . headTail) etas
>
> return (V.zipWith3 SystemStateG (x1s .+ eta1s) (x2s .+ eta2s) (x3s .+ eta3s))
The function which maps the state to the observable.
> obsUpdateG :: Particles (SystemStateG Double) >
> Particles (SystemObs Double)
> obsUpdateG xs = V.map (SystemObs . sin . gx1) xs
The mean and variance of the prior.
> mG :: R 3
> mG = vector [1.6, 0.0, 8.00]
> bigPg :: Sym 3
> bigPg = sym $ matrix [
> 1e6, 0.0, 0.0
> , 0.0, 1e6, 0.0
> , 0.0, 0.0, 1e2
> ]
Parameters for the state update; note that the variance is not exactly the same as in the formulation above.
> bigQg :: Sym 3
> bigQg = sym $ matrix bigQgl
> qc1G :: Double
> qc1G = 0.0001
> sigmaG :: Double
> sigmaG = 1.0e2
> bigQgl :: [Double]
> bigQgl = [ qc1G * deltaT^3 / 3, qc1G * deltaT^2 / 2, 0.0,
> qc1G * deltaT^2 / 2, qc1G * deltaT, 0.0,
> 0.0, 0.0, sigmaG
> ]
The noise of the measurement.
> bigRg :: Sym 1
> bigRg = sym $ matrix [0.1]
Generate the ensemble of particles from the prior,
> initParticlesG :: MonadRandom m =>
> m (Particles (SystemStateG Double))
> initParticlesG = V.replicateM nParticles $ do
> r < sample $ rvar (MultivariateNormal mG bigPg)
> let x1 = fst $ headTail r
> x2 = fst $ headTail $ snd $ headTail r
> x3 = fst $ headTail $ snd $ headTail $ snd $ headTail r
> return $ SystemStateG { gx1 = x1, gx2 = x2, gx3 = x3}
run the particle filter,
> runFilterG :: Int > [Particles (SystemStateG Double)]
> runFilterG n = snd $ evalState action (pureMT 19)
> where
> action = runWriterT $ do
> xs < lift $ initParticlesG
> V.foldM
> (oneFilteringStep (stateUpdateNoisyG bigQg) obsUpdateG (weight f bigRg))
> xs
> (V.fromList $ map (SystemObs . fst . headTail . snd) (take n pendulumSampleGs))
and extract the estimated parameter from the filter.
> testFilteringG :: Int > [Double]
> testFilteringG n = map ((/ (fromIntegral nParticles)). sum . V.map gx3) (runFilterG n)
Again we need to run through the filtering distributions starting at
> filterGEstss :: Int > V.Vector (Particles (SystemStateG Double))
> filterGEstss n = V.reverse $ V.fromList $ runFilterG n
> testSmoothingG :: Int > Int > ([Double], [Double], [Double])
> testSmoothingG m n = (\(x, y, z) > (V.toList x, V.toList y, V.toList z)) $
> mkMeans $
> chunks
> where
>
> chunks =
> V.fromList $ map V.fromList $
> transpose $
> V.toList $ V.map (V.toList) $
> chunksOf m $
> snd $ evalState (runWriterT action) (pureMT 23)
>
> mkMeans yss = (
> V.map (/ (fromIntegral n)) $ V.map V.sum $ V.map (V.map gx1) yss,
> V.map (/ (fromIntegral n)) $ V.map V.sum $ V.map (V.map gx2) yss,
> V.map (/ (fromIntegral n)) $ V.map V.sum $ V.map (V.map gx3) yss
> )
>
> action =
> V.replicateM n $
> oneSmoothingPath' filterGEstss
> (oneSmoothingStep stateUpdateG (weight hG bigQg))
> m
Again by eye we get a better fit but note that we are using the samples in which the state update is noisy as well as the observation so we don’t expect to get a very good fit.
Notes
Helpers for Converting Types
> f :: SystemObs Double > R 1
> f = vector . pure . y1
> h :: SystemState Double > R 2
> h u = vector [x1 u , x2 u]
> hG :: SystemStateG Double > R 3
> hG u = vector [gx1 u , gx2 u, gx3 u]
Helpers for the Inverse CDF
That these are helpers for the inverse CDF is delayed to another blog post.
> indices :: V.Vector Double > V.Vector Double > V.Vector Int
> indices bs xs = V.map (binarySearch bs) xs
> binarySearch :: (Ord a) =>
> V.Vector a > a > Int
> binarySearch vec x = loop 0 (V.length vec  1)
> where
> loop !l !u
>  u <= l = l
>  otherwise = let e = vec V.! k in if x <= e then loop l k else loop (k+1) u
> where k = l + (u  l) `shiftR` 1
Vector Helpers
> chunksOf :: Int > V.Vector a > V.Vector (V.Vector a)
> chunksOf n xs = ys
> where
> l = V.length xs
> m = 1 + (l  1) `div` n
> ys = V.unfoldrN m (\us > Just (V.take n us, V.drop n us)) xs
Bibliography
Cappé, Olivier. 2008. “An Introduction to Sequential Monte Carlo for Filtering and Smoothing.” http://wwwirma.ustrasbg.fr/~guillou/meeting/cappe.pdf.
Rauch, H. E., C. T. Striebel, and F. Tung. 1965. “Maximum Likelihood Estimates of Linear Dynamic Systems.” Journal of the American Institute of Aeronautics and Astronautics 3 (8): 1445–50.
Särkkä, Simo. 2013. Bayesian Filtering and Smoothing. New York, NY, USA: Cambridge University Press.
Inferring Parameters for ODEs using Stan
Introduction
The equation of motion for a pendulum of unit length subject to Gaussian white noise is
We can discretize this via the usual Euler method
where and
The explanation of the precise form of the covariance matrix will be the subject of another blog post; for the purpose of exposition of using Stan and, in particular, Stan’s ability to handle ODEs, this detail is not important.
Instead of assuming that we know let us take it to be unknown and that we wish to infer its value using the pendulum as our experimental apparatus.
Stan is a probabilistic programming language which should be welll suited to perform such an inference. We use its interface via the R package rstan.
A Stan and R Implementation
Let’s generate some samples using Stan but rather than generating exactly the model we have given above, instead let’s solve the differential equation and then add some noise. Of course this won’t quite give us samples from the model the parameters of which we wish to estimate but it will allow us to use Stan’s ODE solving capability.
Here’s the Stan
functions {
real[] pendulum(real t,
real[] y,
real[] theta,
real[] x_r,
int[] x_i) {
real dydt[2];
dydt[1] < y[2];
dydt[2] <  theta[1] * sin(y[1]);
return dydt;
}
}
data {
int<lower=1> T;
real y0[2];
real t0;
real ts[T];
real theta[1];
real sigma[2];
}
transformed data {
real x_r[0];
int x_i[0];
}
model {
}
generated quantities {
real y_hat[T,2];
y_hat < integrate_ode(pendulum, y0, t0, ts, theta, x_r, x_i);
for (t in 1:T) {
y_hat[t,1] < y_hat[t,1] + normal_rng(0,sigma[1]);
y_hat[t,2] < y_hat[t,2] + normal_rng(0,sigma[2]);
}
}
And here’s the R to invoke it
library(rstan)
library(mvtnorm)
qc1 = 0.0001
deltaT = 0.01
nSamples = 100
m0 = c(1.6, 0)
g = 9.81
t0 = 0.0
ts = seq(deltaT,nSamples * deltaT,deltaT)
bigQ = matrix(c(qc1 * deltaT^3 / 3, qc1 * deltaT^2 / 2,
qc1 * deltaT^2 / 2, qc1 * deltaT
),
nrow = 2,
ncol = 2,
byrow = TRUE
)
samples < stan(file = 'Pendulum.stan',
data = list (
T = nSamples,
y0 = m0,
t0 = t0,
ts = ts,
theta = array(g, dim = 1),
sigma = c(bigQ[1,1], bigQ[2,2]),
refresh = 1
),
algorithm="Fixed_param",
seed = 42,
chains = 1,
iter =1
)
We can plot the angle the pendulum subtends to the vertical over time. Note that this is not very noisy.
s < extract(samples,permuted=FALSE)
plot(s[1,1,1:100])
Now let us suppose that we don’t know the value of and we can only observe the horizontal displacement.
zStan < sin(s[1,1,1:nSamples])
Now we can use Stan to infer the value of .
functions {
real[] pendulum(real t,
real[] y,
real[] theta,
real[] x_r,
int[] x_i) {
real dydt[2];
dydt[1] < y[2];
dydt[2] <  theta[1] * sin(y[1]);
return dydt;
}
}
data {
int<lower=1> T;
real y0[2];
real z[T];
real t0;
real ts[T];
}
transformed data {
real x_r[0];
int x_i[0];
}
parameters {
real theta[1];
vector<lower=0>[1] sigma;
}
model {
real y_hat[T,2];
real z_hat[T];
theta ~ normal(0,1);
sigma ~ cauchy(0,2.5);
y_hat < integrate_ode(pendulum, y0, t0, ts, theta, x_r, x_i);
for (t in 1:T) {
z_hat[t] < sin(y_hat[t,1]);
z[t] ~ normal(z_hat[t], sigma);
}
}
Here’s the R to invoke it.
estimates < stan(file = 'PendulumInfer.stan',
data = list (
T = nSamples,
y0 = m0,
z = zStan,
t0 = t0,
ts = ts
),
seed = 42,
chains = 1,
iter = 1000,
warmup = 500,
refresh = 1
)
e < extract(estimates,pars=c("theta[1]","sigma[1]","lp__"),permuted=TRUE)
This gives an estiamted valeu for of 9.809999 which is what we would hope.
Now let’s try adding some noise to our observations.
set.seed(42)
epsilons < rmvnorm(n=nSamples,mean=c(0.0),sigma=bigR)
zStanNoisy < sin(s[1,1,1:nSamples] + epsilons[,1])
estimatesNoisy < stan(file = 'PendulumInfer.stan',
data = list (T = nSamples,
y0 = m0,
z = zStanNoisy,
t0 = t0,
ts = ts
),
seed = 42,
chains = 1,
iter = 1000,
warmup = 500,
refresh = 1
)
eNoisy < extract(estimatesNoisy,pars=c("theta[1]","sigma[1]","lp__"),permuted=TRUE)
This gives an estiamted value for of 8.5871024 which is ok but not great.
Postamble
To build this page, download the relevant files from github and run this in R:
library(knitr)
knit('Pendulum.Rmd')
And this from command line:
pandoc s Pendulum.md filter=./Include > PendulumExpanded.html
Naive Particle Smoothing is Degenerate
Introduction
Let be a (hidden) Markov process. By hidden, we mean that we are not able to observe it.
And let be an observable Markov process such that
That is the observations are conditionally independent given the state of the hidden process.
As an example let us take the one given in Särkkä (2013) where the movement of a car is given by Newton’s laws of motion and the acceleration is modelled as white noise.
Although we do not do so here, and can be derived from the dynamics. For the purpose of this blog post, we note that they are given by
and
We wish to determine the position and velocity of the car given noisy observations of the position. In general we need the distribution of the hidden path given the observable path. We use the notation to mean the path of starting a and finishing at .
Haskell Preamble
> {# OPTIONS_GHC Wall #}
> {# OPTIONS_GHC fnowarnnameshadowing #}
> {# OPTIONS_GHC fnowarntypedefaults #}
> {# OPTIONS_GHC fnowarnunuseddobind #}
> {# OPTIONS_GHC fnowarnmissingmethods #}
> {# OPTIONS_GHC fnowarnorphans #}
> {# LANGUAGE FlexibleInstances #}
> {# LANGUAGE MultiParamTypeClasses #}
> {# LANGUAGE FlexibleContexts #}
> {# LANGUAGE TypeFamilies #}
> {# LANGUAGE BangPatterns #}
> {# LANGUAGE GeneralizedNewtypeDeriving #}
> {# LANGUAGE ScopedTypeVariables #}
> {# LANGUAGE TemplateHaskell #}
> module ParticleSmoothing
> ( simpleSamples
> , carSamples
> , testCar
> , testSimple
> ) where
> import Data.Random.Source.PureMT
> import Data.Random hiding ( StdNormal, Normal )
> import qualified Data.Random as R
> import Control.Monad.State
> import Control.Monad.Writer hiding ( Any, All )
> import qualified Numeric.LinearAlgebra.HMatrix as H
> import Foreign.Storable ( Storable )
> import Data.Maybe ( fromJust )
> import Data.Bits ( shiftR )
> import qualified Data.Vector as V
> import qualified Data.Vector.Unboxed as U
> import Control.Monad.ST
> import System.Random.MWC
> import Data.Array.Repa ( Z(..), (:.)(..), Any(..), computeP,
> extent, DIM1, DIM2, slice, All(..)
> )
> import qualified Data.Array.Repa as Repa
> import qualified Control.Monad.Loops as ML
> import PrettyPrint ()
> import Text.PrettyPrint.HughesPJClass ( Pretty, pPrint )
> import Data.Vector.Unboxed.Deriving
Some Theory
If we could sample then we could approximate the posterior as
If we wish to, we can create marginal estimates
When , this is the filtering estimate.
Standard Bayesian Recursion
Prediction
Update
where by definition
Path Space Recursion
We have
where by definition
Prediction
Update
Algorithm
The idea is to simulate paths using the recursion we derived above.
At time we have an approximating distribution
Sample and set . We then have an approximation of the prediction step
Substituting
and again
where and .
Now sample
A Haskell Implementation
Let’s specify some values for the example of the car moving in two dimensions.
> deltaT, sigma1, sigma2, qc1, qc2 :: Double
> deltaT = 0.1
> sigma1 = 1/2
> sigma2 = 1/2
> qc1 = 1
> qc2 = 1
> bigA :: H.Matrix Double
> bigA = (4 H.>< 4) bigAl
> bigAl :: [Double]
> bigAl = [1, 0 , deltaT, 0,
> 0, 1, 0, deltaT,
> 0, 0, 1, 0,
> 0, 0, 0, 1]
> bigQ :: H.Herm Double
> bigQ = H.trustSym $ (4 H.>< 4) bigQl
> bigQl :: [Double]
> bigQl = [qc1 * deltaT^3 / 3, 0, qc1 * deltaT^2 / 2, 0,
> 0, qc2 * deltaT^3 / 3, 0, qc2 * deltaT^2 / 2,
> qc1 * deltaT^2 / 2, 0, qc1 * deltaT, 0,
> 0, qc2 * deltaT^2 / 2, 0, qc2 * deltaT]
> bigH :: H.Matrix Double
> bigH = (2 H.>< 4) [1, 0, 0, 0,
> 0, 1, 0, 0]
> bigR :: H.Herm Double
> bigR = H.trustSym $ (2 H.>< 2) [sigma1^2, 0,
> 0, sigma2^2]
> m0 :: H.Vector Double
> m0 = H.fromList [0, 0, 1, 1]
> bigP0 :: H.Herm Double
> bigP0 = H.trustSym $ H.ident 4
> n :: Int
> n = 23
With these we generate hidden and observable sample path.
> carSample :: MonadRandom m =>
> H.Vector Double >
> m (Maybe ((H.Vector Double, H.Vector Double), H.Vector Double))
> carSample xPrev = do
> xNew < sample $ rvar (Normal (bigA H.#> xPrev) bigQ)
> yNew < sample $ rvar (Normal (bigH H.#> xNew) bigR)
> return $ Just ((xNew, yNew), xNew)
> carSamples :: [(H.Vector Double, H.Vector Double)]
> carSamples = evalState (ML.unfoldrM carSample m0) (pureMT 17)
We can plot an example trajectory for the car and the noisy observations that are available to the smoother / filter.
Sadly there is no equivalent to numpy in Haskell. Random number packages generate vectors, for multirank arrays there is repa and for fast matrix manipulation there is hmtatrix. Thus for our single step path update function, we have to pass in functions to perform type conversion. Clearly with all the copying inherent in this approach, performance is not going to be great.
The type synonym ArraySmoothing is used to denote the cloud of particles at each time step.
> type ArraySmoothing = Repa.Array Repa.U DIM2
> singleStep :: forall a . U.Unbox a =>
> (a > H.Vector Double) >
> (H.Vector Double > a) >
> H.Matrix Double >
> H.Herm Double >
> H.Matrix Double >
> H.Herm Double >
> ArraySmoothing a > H.Vector Double >
> WriterT [ArraySmoothing a] (StateT PureMT IO) (ArraySmoothing a)
> singleStep f g bigA bigQ bigH bigR x y = do
> tell[x]
> let (Z :. ix :. jx) = extent x
>
> xHatR < lift $ computeP $ Repa.slice x (Any :. jx  1)
> let xHatH = map f $ Repa.toList (xHatR :: Repa.Array Repa.U DIM1 a)
> xTildeNextH < lift $ mapM (\x > sample $ rvar (Normal (bigA H.#> x) bigQ)) xHatH
>
> let xTildeNextR = Repa.fromListUnboxed (Z :. ix :. (1 :: Int)) $
> map g xTildeNextH
> xTilde = Repa.append x xTildeNextR
>
> weights = map (normalPdf y bigR) $
> map (bigH H.#>) xTildeNextH
> vs = runST (create >>= (asGenST $ \gen > uniformVector gen n))
> cumSumWeights = V.scanl (+) 0 (V.fromList weights)
> totWeight = sum weights
> js = indices (V.map (/ totWeight) $ V.tail cumSumWeights) vs
> xNewV = V.map (\j > Repa.transpose $
> Repa.reshape (Z :. (1 :: Int) :. jx + 1) $
> slice xTilde (Any :. j :. All)) js
> xNewR = Repa.transpose $ V.foldr Repa.append (xNewV V.! 0) (V.tail xNewV)
> computeP xNewR
The state for the car is a 4tuple.
> data SystemState a = SystemState { xPos :: a
> , yPos :: a
> , _xSpd :: a
> , _ySpd :: a
> }
We initialize the smoother from some prior distribution.
> initCar :: StateT PureMT IO (ArraySmoothing (SystemState Double))
> initCar = do
> xTilde1 < replicateM n $ sample $ rvar (Normal m0 bigP0)
> let weights = map (normalPdf (snd $ head carSamples) bigR) $
> map (bigH H.#>) xTilde1
> vs = runST (create >>= (asGenST $ \gen > uniformVector gen n))
> cumSumWeights = V.scanl (+) 0 (V.fromList weights)
> js = indices (V.tail cumSumWeights) vs
> xHat1 = Repa.fromListUnboxed (Z :. n :. (1 :: Int)) $
> map ((\[a,b,c,d] > SystemState a b c d) . H.toList) $
> V.toList $
> V.map ((V.fromList xTilde1) V.!) js
> return xHat1
Now we can run the smoother.
> smootherCar :: StateT PureMT IO
> (ArraySmoothing (SystemState Double)
> , [ArraySmoothing (SystemState Double)])
> smootherCar = runWriterT $ do
> xHat1 < lift initCar
> foldM (singleStep f g bigA bigQ bigH bigR) xHat1 (take 100 $ map snd $ tail carSamples)
> f :: SystemState Double > H.Vector Double
> f (SystemState a b c d) = H.fromList [a, b, c, d]
> g :: H.Vector Double > SystemState Double
> g = (\[a,b,c,d] > (SystemState a b c d)) . H.toList
And create inferred positions for the car which we then plot.
> testCar :: IO ([Double], [Double])
> testCar = do
> states < snd <$> evalStateT smootherCar (pureMT 24)
> let xs :: [Repa.Array Repa.D DIM2 Double]
> xs = map (Repa.map xPos) states
> sumXs :: [Repa.Array Repa.U DIM1 Double] < mapM Repa.sumP (map Repa.transpose xs)
> let ixs = map extent sumXs
> sumLastXs = map (* (recip $ fromIntegral n)) $
> zipWith (Repa.!) sumXs (map (\(Z :. x) > Z :. (x  1)) ixs)
> let ys :: [Repa.Array Repa.D DIM2 Double]
> ys = map (Repa.map yPos) states
> sumYs :: [Repa.Array Repa.U DIM1 Double] < mapM Repa.sumP (map Repa.transpose ys)
> let ixsY = map extent sumYs
> sumLastYs = map (* (recip $ fromIntegral n)) $
> zipWith (Repa.!) sumYs (map (\(Z :. x) > Z :. (x  1)) ixsY)
> return (sumLastXs, sumLastYs)
So it seems our smoother does quite well at inferring the state at the latest observation, that is, when it is working as a filter. But what about estimates for earlier times? We should do better as we have observations in the past and in the future. Let’s try with a simpler example and a smaller number of particles.
First we create some samples for our simple 1 dimensional linear Gaussian model.
> bigA1, bigQ1, bigR1, bigH1 :: Double
> bigA1 = 0.5
> bigQ1 = 0.1
> bigR1 = 0.1
> bigH1 = 1.0
> simpleSample :: MonadRandom m =>
> Double >
> m (Maybe ((Double, Double), Double))
> simpleSample xPrev = do
> xNew < sample $ rvar (R.Normal (bigA1 * xPrev) bigQ1)
> yNew < sample $ rvar (R.Normal (bigH1 * xNew) bigR1)
> return $ Just ((xNew, yNew), xNew)
> simpleSamples :: [(Double, Double)]
> simpleSamples = evalState (ML.unfoldrM simpleSample 0.0) (pureMT 17)
Again create a prior.
> initSimple :: MonadRandom m => m (ArraySmoothing Double)
> initSimple = do
> let y = snd $ head simpleSamples
> xTilde1 < replicateM n $ sample $ rvar $ R.Normal y bigR1
> let weights = map (pdf (R.Normal y bigR1)) $
> map (bigH1 *) xTilde1
> totWeight = sum weights
> vs = runST (create >>= (asGenST $ \gen > uniformVector gen n))
> cumSumWeights = V.scanl (+) 0 (V.fromList $ map (/ totWeight) weights)
> js = indices (V.tail cumSumWeights) vs
> xHat1 = Repa.fromListUnboxed (Z :. n :. (1 :: Int)) $
> V.toList $
> V.map ((V.fromList xTilde1) V.!) js
> return xHat1
Now we can run the smoother.
> smootherSimple :: StateT PureMT IO (ArraySmoothing Double, [ArraySmoothing Double])
> smootherSimple = runWriterT $ do
> xHat1 < lift initSimple
> foldM (singleStep f1 g1 ((1 H.>< 1) [bigA1]) (H.trustSym $ (1 H.>< 1) [bigQ1^2])
> ((1 H.>< 1) [bigH1]) (H.trustSym $ (1 H.>< 1) [bigR1^2]))
> xHat1
> (take 20 $ map H.fromList $ map return . map snd $ tail simpleSamples)
> f1 :: Double > H.Vector Double
> f1 a = H.fromList [a]
> g1 :: H.Vector Double > Double
> g1 = (\[a] > a) . H.toList
And finally we can look at the paths not just the means of the marginal distributions at the latest observation time.
> testSimple :: IO [[Double]]
> testSimple = do
> states < snd <$> evalStateT smootherSimple (pureMT 24)
> let path :: Int > IO (Repa.Array Repa.U DIM1 Double)
> path i = computeP $ Repa.slice (last states) (Any :. i :. All)
> paths < mapM path [0..n  1]
> return $ map Repa.toList paths
We can see that at some point in the past all the current particles have one ancestor. The marginals of the smoothing distribution (at some point in the past) have collapsed on to one particle.
Notes
Helpers for the Inverse CDF
That these are helpers for the inverse CDF is delayed to another blog post.
> indices :: V.Vector Double > V.Vector Double > V.Vector Int
> indices bs xs = V.map (binarySearch bs) xs
> binarySearch :: Ord a =>
> V.Vector a > a > Int
> binarySearch vec x = loop 0 (V.length vec  1)
> where
> loop !l !u
>  u <= l = l
>  otherwise = let e = vec V.! k in if x <= e then loop l k else loop (k+1) u
> where k = l + (u  l) `shiftR` 1
Multivariate Normal
The randomfu package does not contain a sampler or pdf for a multivariate normal so we create our own. This should be added to randomfumultivariate package or something similar.
> normalMultivariate :: H.Vector Double > H.Herm Double > RVarT m (H.Vector Double)
> normalMultivariate mu bigSigma = do
> z < replicateM (H.size mu) (rvarT R.StdNormal)
> return $ mu + bigA H.#> (H.fromList z)
> where
> (vals, bigU) = H.eigSH bigSigma
> lSqrt = H.diag $ H.cmap sqrt vals
> bigA = bigU H.<> lSqrt
> data family Normal k :: *
> data instance Normal (H.Vector Double) = Normal (H.Vector Double) (H.Herm Double)
> instance Distribution Normal (H.Vector Double) where
> rvar (Normal m s) = normalMultivariate m s
> normalPdf :: (H.Numeric a, H.Field a, H.Indexable (H.Vector a) a, Num (H.Vector a)) =>
> H.Vector a > H.Herm a > H.Vector a > a
> normalPdf mu sigma x = exp $ normalLogPdf mu sigma x
> normalLogPdf :: (H.Numeric a, H.Field a, H.Indexable (H.Vector a) a, Num (H.Vector a)) =>
> H.Vector a > H.Herm a > H.Vector a > a
> normalLogPdf mu bigSigma x =  H.sumElements (H.cmap log (diagonals dec))
>  0.5 * (fromIntegral (H.size mu)) * log (2 * pi)
>  0.5 * s
> where
> dec = fromJust $ H.mbChol bigSigma
> t = fromJust $ H.linearSolve (H.tr dec) (H.asColumn $ x  mu)
> u = H.cmap (\x > x * x) t
> s = H.sumElements u
> diagonals :: (Storable a, H.Element t, H.Indexable (H.Vector t) a) =>
> H.Matrix t > H.Vector a
> diagonals m = H.fromList (map (\i > m H.! i H.! i) [0..n1])
> where
> n = max (H.rows m) (H.cols m)
> instance PDF Normal (H.Vector Double) where
> pdf (Normal m s) = normalPdf m s
> logPdf (Normal m s) = normalLogPdf m s
Misellaneous
> derivingUnbox "SystemState"
> [t forall a . (U.Unbox a) => SystemState a > (a, a, a, a) ]
> [ \ (SystemState x y xdot ydot) > (x, y, xdot, ydot) ]
> [ \ (x, y, xdot, ydot) > SystemState x y xdot ydot ]
> instance Pretty a => Pretty (SystemState a) where
> pPrint (SystemState x y xdot ydot ) = pPrint (x, y, xdot, ydot)
Bibliography
Särkkä, Simo. 2013. Bayesian Filtering and Smoothing. New York, NY, USA: Cambridge University Press.
Population Growth Estimation via Hamiltonian Monte Carlo
Here’s the same analysis of estimating population growth using Stan.
data {
int<lower=0> N; // number of observations
vector[N] y; // observed population
}
parameters {
real r;
}
model {
real k;
real p0;
real deltaT;
real sigma;
real mu0;
real sigma0;
vector[N] p;
k < 1.0;
p0 < 0.1;
deltaT < 0.0005;
sigma < 0.01;
mu0 < 5;
sigma0 < 10;
r ~ normal(mu0, sigma0);
for (n in 1:N) {
p[n] < k * p0 * exp((n  1) * r * deltaT) / (k + p0 * (exp((n  1) * r * deltaT)  1));
y[n] ~ normal(p[n], sigma);
}
}
Empirically, by looking at the posterior, this seems to do a better job than either extended Kalman or vanilla Metropolis.