# Introduction

It’s possible to Gibbs sampling in most languages and since I am doing some work in R and some work in Haskell, I thought I’d present a simple example in both languages: estimating the mean from a normal distribution with unknown mean and variance. Although one can do Gibbs sampling directly in R, it is more common to use a specialised language such as JAGS or STAN to do the actual sampling and do pre-processing and post-processing in R. This blog post presents implementations in native R, JAGS and STAN as well as Haskell.

## Preamble

```
> {-# OPTIONS_GHC -Wall #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans #-}
```

```
> {-# LANGUAGE NoMonomorphismRestriction #-}
```

```
> module Gibbs (
> main
> , m
> , Moments(..)
> ) where
>
> import qualified Data.Vector.Unboxed as V
> import qualified Control.Monad.Loops as ML
> import Data.Random.Source.PureMT
> import Data.Random
> import Control.Monad.State
> import Data.Histogram ( asList )
> import Data.Histogram.Fill
> import Data.Histogram.Generic ( Histogram )
> import Data.List
> import qualified Control.Foldl as L
>
> import Diagrams.Backend.Cairo.CmdLine
>
> import LinRegAux
>
> import Diagrams.Backend.CmdLine
> import Diagrams.Prelude hiding ( sample, render )
```

The length of our chain and the burn-in.

```
> nrep, nb :: Int
> nb = 5000
> nrep = 105000
```

Data generated from .

```
> xs :: [Double]
> xs = [
> 11.0765808082301
> , 10.918739177542
> , 15.4302462747137
> , 10.1435649220266
> , 15.2112705014697
> , 10.441327659703
> , 2.95784054883142
> , 10.2761068139607
> , 9.64347295100318
> , 11.8043359297675
> , 10.9419989262713
> , 7.21905367667346
> , 10.4339807638017
> , 6.79485294803006
> , 11.817248658832
> , 6.6126710570584
> , 12.6640920214508
> , 8.36604701073303
> , 12.6048485320333
> , 8.43143879537592
> ]
```

# A Bit of Theory

## Gibbs Sampling

For a multi-parameter situation, Gibbs sampling is a special case of Metropolis-Hastings in which the proposal distributions are the posterior conditional distributions.

Referring back to the explanation of the metropolis algorithm, let us describe the state by its parameters and the conditional posteriors by where then

where we have used the rules of conditional probability and the fact that

Thus we always accept the proposed jump. Note that the chain is not in general reversible as the order in which the updates are done matters.

## Normal Distribution with Unknown Mean and Variance

It is fairly standard to use an improper prior

The likelihood is

re-writing in terms of precision

Thus the posterior is

We can re-write the sum in terms of the sample mean and variance using

Thus the conditional posterior for is

which we recognise as a normal distribution with mean of and a variance of .

The conditional posterior for is

which we recognise as a gamma distribution with a shape of and a scale of

In this particular case, we can calculate the marginal posterior of analytically. Writing we have

Finally we can calculate

This is the non-standardized Student’s t-distribution .

Alternatively the marginal posterior of is

where is the standard t distribution with degrees of freedom.

# The Model in Haskell

Following up on a comment from a previous blog post, let us try using the foldl package to calculate the length, the sum and the sum of squares traversing the list only once. An improvement on creating your own strict record and using *foldl’* but maybe it is not suitable for some methods e.g. calculating the skewness and kurtosis incrementally, see below.

```
> x2Sum, xSum, n :: Double
> (x2Sum, xSum, n) = L.fold stats xs
> where
> stats = (,,) <$>
> (L.premap (\x -> x * x) L.sum) <*>
> L.sum <*>
> L.genericLength
```

And re-writing the sample variance using

we can then calculate the sample mean and variance using the sums we have just calculated.

```
> xBar, varX :: Double
> xBar = xSum / n
> varX = n * (m2Xs - xBar * xBar) / (n - 1)
> where m2Xs = x2Sum / n
```

In random-fu, the Gamma distribution is specified by the rate paratmeter, .

```
> beta, initTau :: Double
> beta = 0.5 * n * varX
> initTau = evalState (sample (Gamma (n / 2) beta)) (pureMT 1)
```

Our sampler takes an old value of and creates new values of and .

```
> gibbsSampler :: MonadRandom m => Double -> m (Maybe ((Double, Double), Double))
> gibbsSampler oldTau = do
> newMu <- sample (Normal xBar (recip (sqrt (n * oldTau))))
> let shape = 0.5 * n
> scale = 0.5 * (x2Sum + n * newMu^2 - 2 * n * newMu * xBar)
> newTau <- sample (Gamma shape (recip scale))
> return $ Just ((newMu, newTau), newTau)
```

From which we can create an infinite stream of samples.

```
> gibbsSamples :: [(Double, Double)]
> gibbsSamples = evalState (ML.unfoldrM gibbsSampler initTau) (pureMT 1)
```

As our chains might be very long, we calculate the mean, variance, skewness and kurtosis using an incremental method.

```
> data Moments = Moments { mN :: !Double
> , m1 :: !Double
> , m2 :: !Double
> , m3 :: !Double
> , m4 :: !Double
> }
> deriving Show
```

```
> moments :: [Double] -> Moments
> moments xs = foldl' f (Moments 0.0 0.0 0.0 0.0 0.0) xs
> where
> f :: Moments -> Double -> Moments
> f m x = Moments n' m1' m2' m3' m4'
> where
> n = mN m
> n' = n + 1
> delta = x - (m1 m)
> delta_n = delta / n'
> delta_n2 = delta_n * delta_n
> term1 = delta * delta_n * n
> m1' = m1 m + delta_n
> m4' = m4 m +
> term1 * delta_n2 * (n'*n' - 3*n' + 3) +
> 6 * delta_n2 * m2 m - 4 * delta_n * m3 m
> m3' = m3 m + term1 * delta_n * (n' - 2) - 3 * delta_n * m2 m
> m2' = m2 m + term1
```

In order to examine the posterior, we create a histogram.

```
> numBins :: Int
> numBins = 400
```

```
> hb :: HBuilder Double (Data.Histogram.Generic.Histogram V.Vector BinD Double)
> hb = forceDouble -<< mkSimple (binD lower numBins upper)
> where
> lower = xBar - 2.0 * sqrt varX
> upper = xBar + 2.0 * sqrt varX
```

And fill it with the specified number of samples preceeded by a burn-in.

```
> hist :: Histogram V.Vector BinD Double
> hist = fillBuilder hb (take (nrep - nb) $ drop nb $ map fst gibbsSamples)
```

Now we can plot this.

And calculate the skewness and kurtosis.

```
> m :: Moments
> m = moments (take (nrep - nb) $ drop nb $ map fst gibbsSamples)
```

```
ghci> import Gibbs
ghci> putStrLn $ show $ (sqrt (mN m)) * (m3 m) / (m2 m)**1.5
8.733959917065126e-4
ghci> putStrLn $ show $ (mN m) * (m4 m) / (m2 m)**2
3.451374739494607
```

We expect a skewness of 0 and a kurtosis of for . Not too bad.

# The Model in JAGS

JAGS is a mature, declarative, domain specific language for building Bayesian statistical models using Gibbs sampling.

Here is our model as expressed in JAGS. Somewhat terse.

```
model {
for (i in 1:N) {
x[i] ~ dnorm(mu, tau)
}
mu ~ dnorm(0, 1.0E-6)
tau <- pow(sigma, -2)
sigma ~ dunif(0, 1000)
}
```

To run it and examine its results, we wrap it up in some R

```
## Import the library that allows R to inter-work with jags.
library(rjags)
## Read the simulated data into a data frame.
fn <- read.table("example1.data", header=FALSE)
jags <- jags.model('example1.bug',
data = list('x' = fn[,1], 'N' = 20),
n.chains = 4,
n.adapt = 100)
## Burnin for 10000 samples
update(jags, 10000);
mcmc_samples <- coda.samples(jags, variable.names=c("mu", "sigma"), n.iter=20000)
png(file="diagrams/jags.png",width=400,height=350)
plot(mcmc_samples)
dev.off()
```

And now we can look at the posterior for .

# The Model in STAN

STAN is a domain specific language for building Bayesian statistical models similar to JAGS but newer and which allows variables to be re-assigned and so cannot really be described as declarative.

Here is our model as expressed in STAN. Again, somewhat terse.

```
data {
int<lower=0> N;
real x[N];
}
parameters {
real mu;
real<lower=0,upper=1000> sigma;
}
model{
x ~ normal(mu, sigma);
mu ~ normal(0, 1000);
}
```

Just as with JAGS, to run it and examine its results, we wrap it up in some R.

```
library(rstan)
## Read the simulated data into a data frame.
fn <- read.table("example1.data", header=FALSE)
## Running the model
fit1 <- stan(file = 'Stan.stan',
data = list('x' = fn[,1], 'N' = 20),
pars=c("mu", "sigma"),
chains=3,
iter=30000,
warmup=10000)
png(file="diagrams/stan.png",width=400,height=350)
plot(fit1)
dev.off()
```

Again we can look at the posterior although we only seem to get medians and 80% intervals.

# PostAmble

Write the histogram produced by the Haskell code to a file.

```
> displayHeader :: FilePath -> Diagram B R2 -> IO ()
> displayHeader fn =
> mainRender ( DiagramOpts (Just 900) (Just 700) fn
> , DiagramLoopOpts False Nothing 0
> )
```

```
> main :: IO ()
> main = do
> displayHeader "diagrams/DataScienceHaskPost.png"
> (barDiag
> (zip (map fst $ asList hist) (map snd $ asList hist)))
```

The code can be downloaded from github.

Oops I forgot to include the native R implementation. Well who needs it when you have Haskell, JAGS and STAN.

I tried to get a composable version that traverses the list only once to work but I didn’t have any success. There are a lot of data dependencies which makes things tricky. I did, however, prompt me to fix a bug in Edward Kmett’s folds package, so that’s something.

Pingback: Gibbs Sampling in Haskell | Idontgetoutmuch’s Weblog

Nice post, Dominic. Your STAN code doesn’t do Gibbs sampling but Hamiltonian Monte Carlo (and more specifically its NUTS extension) does it?

Correct.

I can not understand that when there is no conjugate distribution or not easy to get conditional distribution, how Stan works for that?

Do you mean for Hamiltonian Monte Carlo? I’d recommend reading this tutorial https://arxiv.org/pdf/1206.1901.pdf