# Stochastic Systems, Fokker-Planck and the Heat Equation via Haskell

This is a short example of taking a stochastic system, using Fokker-Planck to convert it to a PDE and then using SUNDIALS to solve the resulting 2D partial differential equation in Haskell via the hmatrix-sundials library. The example is taken from the C examples that come with the SUNDIALS source. Here’s the full blog post.

# Workshop on Numerical Programming in Functional Languages (NPFL)

I’m the chair this year for the first(!) ACM SIGPLAN International
Workshop on Numerical Programming in Functional Languages (NPFL),
which will be co-located with ICFP this September in St. Louis,
Missouri, USA.

Please consider submitting something! All you have to do is submit
between half a page and a page describing your talk. There will be no
proceedings so all you need do is prepare your slides or demo. We do
plan to video the talks for the benefit of the wider world.

We have Eva Darulova giving the keynote.

The submission deadline is Friday 13th July but the Committee and I
would love to see your proposals earlier.

Please get in touch if you have any questions!

# Introduction

These are some very hasty notes on Runge-Kutta methods and IRK2 in particular. I make no apologies for missing lots of details. I may try and put these in a more digestible form but not today.

# Some Uncomprehensive Theory

In general, an implicit Runge-Kutta method is given by

$\displaystyle y_{n+1} = y_n + h \sum_{i=1}^s b_i k_i$

where

$\displaystyle k_i = f\left( t_n + c_i h,\ y_{n} + h \sum_{j=1}^s a_{ij} k_j \right), \quad i = 1, \ldots, s$

and

$\displaystyle \sum_{j=1}^{s} a_{ij} = c_i \text{ for } i=1, \ldots, s$

Traditionally this is written as a Butcher tableau:

$\displaystyle \begin{array}{c|cccc} c_1 & a_{11} & a_{12}& \dots & a_{1s}\\ c_2 & a_{21} & a_{22}& \dots & a_{2s}\\ \vdots & \vdots & \vdots& \ddots& \vdots\\ c_s & a_{s1} & a_{s2}& \dots & a_{ss} \\ \hline & b_1 & b_2 & \dots & b_s\\ & b^*_1 & b^*_2 & \dots & b^*_s\\ \end{array}$

and even more succinctly as

$\displaystyle \begin{array}{c|c} \mathbf{c}& A\\ \hline & \mathbf{b^T} \\ \end{array}$

For a Gauß-Legendre method we set the values of the $c_i$ to the zeros of the shifted Legendre polynomials.

An explicit expression for the shifted Legendre polynomials is given by

$\displaystyle \tilde{P}_n(x) = (-1)^n \sum_{k=0}^n \binom{n}{k} \binom{n+k}{k} (-x)^k$

The first few shifted Legendre polynomials are:

$\displaystyle \begin{array}{r|r} n & \tilde{P}_n(x) \\ \hline 0 & 1 \\ 1 & 2x-1 \\ 2 & 6x^2-6x+1 \\ 3 & 20x^3-30x^2+12x-1 \\ 4 & 70x^4-140x^3+90x^2-20x+1 \\ 5 & 252x^5 -630x^4 +560x^3 - 210 x^2 + 30 x -1 \end{array}$
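As a quick sanity check, the explicit formula can be transcribed directly (a sketch; the `binom` helper and the name `shiftedLegendre` are ours, just for illustration):

```haskell
-- Binomial coefficient via the multiplicative formula.
binom :: Integer -> Integer -> Integer
binom n k = product [n - k + 1 .. n] `div` product [1 .. k]

-- Shifted Legendre polynomial, transcribed from the explicit sum above.
shiftedLegendre :: Int -> Double -> Double
shiftedLegendre n x =
    (-1) ^ n * sum [ fromInteger (binom n' k * binom (n' + k) k) * (-x) ^ k
                   | k <- [0 .. n'] ]
  where
    n' = fromIntegral n :: Integer
```

Evaluating `shiftedLegendre 2` at a few points reproduces the $6x^2-6x+1$ row of the table.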

Setting

$\displaystyle q(t) \triangleq \prod_{j=1}^s (t - c_j) \quad \text{and} \quad q_l \triangleq \frac{q(t)}{t - c_l}, \, l = 1,2, \ldots, s$

then the collocation method gives

$\displaystyle a_{j,i} \triangleq \int_0^{c_j} \frac{q_i(\tau)}{q_i(c_i)}\,{\mathrm d}\tau$
$\displaystyle b_{i} \triangleq \int_0^{1} \frac{q_i(\tau)}{q_i(c_i)}\,{\mathrm d}\tau$

For $s = 1$ we have $2x - 1 = 0$ and thus $c_1 = 1/2$ and the Butcher tableau is

$\displaystyle \begin{array}{c|c} 1/2 & 1/2\\ \hline & 1 \\ \end{array}$

that is, the implicit RK2 method aka the mid-point method.

For $s = 2$ we have $6x^2-6x+1 = 0$ and thus $c_1 = \frac{1}{2} - \frac{1}{2\sqrt{3}}$ and $c_2 = \frac{1}{2} + \frac{1}{2\sqrt{3}}$ and the Butcher tableau is

$\displaystyle \begin{array}{c|cc} \frac{1}{2} - \frac{1}{2\sqrt{3}} & \frac{1}{4} & \frac{1}{4} - \frac{1}{2\sqrt{3}}\\ \frac{1}{2} + \frac{1}{2\sqrt{3}} & \frac{1}{4} + \frac{1}{2\sqrt{3}} & \frac{1}{4}\\ \hline & \frac{1}{2} & \frac{1}{2}\\ \end{array}$

that is, the implicit RK4 method.
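As a sketch of a consistency check on the two-stage tableau (the primed names are ad hoc, not from any package): the nodes should be the roots of $6x^2-6x+1$, each row of $A$ should sum to the corresponding $c_i$, and the weights should sum to one.

```haskell
-- Nodes, matrix and weights of the two-stage Gauss-Legendre method,
-- transcribed from the Butcher tableau above.
c1', c2' :: Double
c1' = 1/2 - 1/(2 * sqrt 3)
c2' = 1/2 + 1/(2 * sqrt 3)

a' :: [[Double]]
a' = [ [ 1/4,                  1/4 - 1/(2 * sqrt 3) ]
     , [ 1/4 + 1/(2 * sqrt 3), 1/4                  ] ]

b' :: [Double]
b' = [1/2, 1/2]

-- The consistency condition: each row of A sums to the corresponding c.
rowSumsMatch :: Bool
rowSumsMatch = and [ abs (sum row - c) < 1e-12 | (row, c) <- zip a' [c1', c2'] ]
```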

# Implicit RK2

Explicitly

$\displaystyle y_{n+1} = y_n + h b_1 k_1$

where

$\displaystyle k_1 = f\left( t_n + c_1 h,\ y_{n} + h a_{11} k_1 \right)$

Substituting in the values from the tableau, we have

$\displaystyle y_{n+1} = y_n + h k_1$

where

$\displaystyle k_1 = f\left( t_n + \frac{1}{2} h,\ y_{n} + h \frac{1}{2} k_1 \right)$

and further inlining and substitution gives

$\displaystyle y_{n+1} = y_n + h f\left(t_n + \frac{h}{2}, \frac{1}{2}(y_n + y_{n+1})\right)$

which can be recognised as the mid-point method.
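As a tiny check of this formula (a sketch; the function names are ours): for the linear test equation $\dot{y} = \lambda y$ the implicit relation can be solved for $y_{n+1}$ in closed form, giving $y_{n+1} = y_n \, \frac{1 + h\lambda/2}{1 - h\lambda/2}$, which we can compare against the exact solution $e^{\lambda t}$.

```haskell
-- Implicit mid-point step for y' = lambda * y. The fixed-point equation
-- y_{n+1} = y_n + h * lambda * (y_n + y_{n+1}) / 2 is linear, so we can
-- solve it exactly instead of iterating.
stepMidpoint :: Double -> Double -> Double -> Double
stepMidpoint lam h y = y * (1 + h * lam / 2) / (1 - h * lam / 2)

-- Take n steps of size h starting from y0.
solveMidpoint :: Double -> Double -> Int -> Double -> Double
solveMidpoint lam h n y0 = iterate (stepMidpoint lam h) y0 !! n
```

With $\lambda = -1$, $h = 0.01$ and 100 steps this lands within about $10^{-5}$ of $e^{-1}$, consistent with the method being second order.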

# Implementation

A common package for solving ODEs is gsl, which Haskell interfaces to via the hmatrix-gsl package. Some of gsl is coded with the conditional compilation flag DEBUG, e.g. in msbdf.c, but sadly not in the simpler methods, maybe because they were made part of the package some years earlier. We can add our own of course; here’s a link for reference.

Let’s see how the Implicit Runge-Kutta Order 2 method does on the following system taken from the gsl documentation

$\displaystyle \frac{{\mathrm d}^2u}{{\mathrm d} t^2} + \mu \frac{{\mathrm d}u}{{\mathrm d} t} (u^2 - 1) + u = 0$

which can be re-written as

$\displaystyle \begin{aligned} \frac{{\mathrm d}y_0}{{\mathrm d} t} &= y_1 \\ \frac{{\mathrm d}y_1}{{\mathrm d} t} &= -y_0 - \mu y_1 (y_0 y_0 - 1) \end{aligned}$

but replacing gsl_odeiv2_step_rk8pd with gsl_odeiv2_step_rk2imp.

Here’s the first few steps

rk2imp_apply: t=0.00000e+00, h=1.00000e-06, y:1.00000e+00 0.00000e+00
-- evaluate jacobian
(  2,  2)[0,0]:                      0
(  2,  2)[0,1]:                      1
(  2,  2)[1,0]:                     -1
(  2,  2)[1,1]:                     -0
YZ:1.00000e+00 -5.00000e-07
-- error estimates: 0.00000e+00 8.35739e-20
rk2imp_apply: t=1.00000e-06, h=5.00000e-06, y:1.00000e+00 -1.00000e-06
-- evaluate jacobian
(  2,  2)[0,0]:                      0
(  2,  2)[0,1]:                      1
(  2,  2)[1,0]:   -0.99997999999999998
(  2,  2)[1,1]: 1.00008890058234101e-11
YZ:1.00000e+00 -3.50000e-06
-- error estimates: 1.48030e-16 1.04162e-17
rk2imp_apply: t=6.00000e-06, h=2.50000e-05, y:1.00000e+00 -6.00000e-06
-- evaluate jacobian
(  2,  2)[0,0]:                      0
(  2,  2)[0,1]:                      1
(  2,  2)[1,0]:  -0.999880000000002878
(  2,  2)[1,1]: 3.59998697518904009e-10
YZ:1.00000e+00 -1.85000e-05
-- error estimates: 1.48030e-16 1.30208e-15
rk2imp_apply: t=3.10000e-05, h=1.25000e-04, y:1.00000e+00 -3.10000e-05
-- evaluate jacobian
(  2,  2)[0,0]:                      0
(  2,  2)[0,1]:                      1
(  2,  2)[1,0]:  -0.999380000000403723
(  2,  2)[1,1]: 9.6099972424212865e-09
YZ:1.00000e+00 -9.35000e-05
-- error estimates: 0.00000e+00 1.62760e-13
rk2imp_apply: t=1.56000e-04, h=6.25000e-04, y:1.00000e+00 -1.56000e-04
-- evaluate jacobian
(  2,  2)[0,0]:                      0
(  2,  2)[0,1]:                      1
(  2,  2)[1,0]:  -0.996880000051409643
(  2,  2)[1,1]: 2.4335999548874554e-07
YZ:1.00000e+00 -4.68500e-04
-- error estimates: 1.55431e-14 2.03450e-11

Let’s see if we can reproduce this in a fast and loose way in Haskell.

To make our code easier to write and read let’s lift some arithmetic operators to act on lists (we should really use the Naperian package).

> module RK2Imp where
> instance Num a => Num [a] where
>   (+) = zipWith (+)
>   (*) = zipWith (*)
>   negate = map negate
>   fromInteger = repeat . fromInteger

The actual algorithm is almost trivial

> rk2Step :: Fractional a => a -> a -> [a] -> [a] -> [a]
> rk2Step h t y_n0 y_n1 = y_n0 + (repeat h) * f (t + 0.5 * h) ((repeat 0.5) * (y_n0 + y_n1))

The example Van der Pol oscillator with the same parameter and initial conditions as in the gsl example.

> f :: Fractional b => a -> [b] -> [b]
> f t [y0, y1] = [y1, -y0 - mu * y1 * (y0 * y0 - 1)]
>   where
>     mu = 10.0
> y_init :: [Double]
> y_init = [1.0, 0.0]

We can use co-recursion to find the fixed point of the Runge-Kutta equation arbitrarily choosing the 10th iteration!

> y_1s, y_2s, y_3s :: [[Double]]
> y_1s = y_init : map (rk2Step 1.0e-6 0.0 y_init) y_1s
> nC :: Int
> nC = 10
> y_1, y_2, y_3 :: [Double]
> y_1 = y_1s !! nC

This gives

ghci> y_1
[0.9999999999995,-9.999999999997499e-7]

which is not too far from the value given by gsl.

y:1.00000e+00 -1.00000e-06

Getting the next time and step size from the gsl DEBUG information we can continue

> y_2s = y_1 : map (rk2Step 5.0e-6 1.0e-6 y_1) y_2s
>
> y_2 = y_2s !! nC

ghci> y_2
[0.999999999982,-5.999999999953503e-6]

gsl gives

y:1.00000e+00 -6.00000e-06

> y_3s = y_2 : map (rk2Step 2.5e-5 6.0e-6 y_2) y_3s
> y_3 = y_3s !! nC
> y_3 = y_3s !! nC

One more time

ghci> y_3
[0.9999999995194999,-3.099999999372456e-5]

And gsl gives

y:1.00000e+00 -3.10000e-05

# Nix

The nix-shell in which to run this is here and the C code can be built using something like

gcc ode.c -lgsl -lgslcblas -o ode

(in the nix shell).

The Haskell code can be used inside a ghci session.

# Introduction

## Summary

Back in January, a colleague pointed out to me that GHC did not produce very efficient code for performing floating point abs. I have yet to produce a write-up of my notes about hacking on GHC: in summary it wasn’t as difficult as I had feared and the #ghc folks were extremely helpful.

But maybe getting GHC to produce high performance numerical code is “swimming uphill”. Below is a comparison of a “state of the art” numerical language, Julia, and an alternative Haskell approach, using a Haskell domain-specific embedded language (DSEL), accelerate. The problem is a real one albeit quite simple from a programming point of view and should therefore be taken with a grain of salt.

## A bit more background

The reason for improving GHC’s (numerical) performance is that I’d like to have a fast way of performing statistical inference either via Hamiltonian Monte Carlo or Sequential Monte Carlo. Stan is my “go to” tool for offline analysis but its inference engine is fixed and, according to Fast Forward Labs, “it’s very awkward to productize”. What one would really like is the ability to choose and compose inference methods as can be done in monad-bayes. Although I haven’t compared monad-bayes and Stan in very much depth, Stan seems faster in my very limited experience.

The view of one of the folks who has worked hard on making Haskell a great tool for doing numerical applications is that trying to get GHC to produce LLVM IR on which tools like Polly can work is “swimming uphill”.

A lot of people who work on numerical problems use MATLAB, but it seems that if you want to put the code into production then you have to re-write it. Julia is an attempt to improve this situation and also to provide higher performance.

## A sort of manifesto

To summarise: what we’d like is type-safe blazingly fast numerical code. Type-safe can give us e.g. static guarantees that matrix multiplication is between compatible matrices and assurances that the units are correct. Here’s an example of static typing helping ensure the correctness of Kalman filter usage and here’s a package that I have used successfully on a medium-sized project to ensure all units are correct.
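For a flavour of what “type-safe” buys us, here is a minimal sketch using phantom types (the `Unit` kind and `Q` wrapper are invented for illustration; the units package alluded to above is far more capable): adding metres to seconds simply fails to compile.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE KindSignatures #-}

-- A unit tracked at the type level; Q tags a Double with its unit.
data Unit = Metre | Second

newtype Q (u :: Unit) = Q Double deriving Show

-- Addition only type-checks when both operands carry the same unit.
addQ :: Q u -> Q u -> Q u
addQ (Q a) (Q b) = Q (a + b)

distance :: Q 'Metre
distance = Q 3.0

duration :: Q 'Second
duration = Q 2.0

-- addQ distance duration   -- rejected by the compiler
```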

# Symplectic Integrators

Let’s take a simple solver, encode it in Haskell and Julia and see how well we can do.

> {-# OPTIONS_GHC -Wall                   #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults #-}
>
> {-# LANGUAGE FlexibleContexts #-}
> {-# LANGUAGE TypeOperators    #-}
> module Symplectic (
>     runSteps
>   , runSteps'
>   , reallyRunSteps'
>   , inits
>   , h
>   , bigH1
>   , hs
>   , nabla1
>   , nabla2
>   , runStepsH98
>   , bigH2BodyH98
>   ) where
>
> import Data.Number.Symbolic
> import Prelude                            as P
> import Data.Array.Accelerate              as A   hiding ((^))
> import Data.Array.Accelerate.LLVM.Native  as CPU
> import Data.Array.Accelerate.Linear              hiding (trace)
> import Data.Array.Accelerate.Control.Lens
> import qualified Linear                   as L

The Störmer-Verlet scheme is an implicit symplectic method of order 2. Being symplectic is important as it (approximately) conserves the energy of the system being solved. If, for example, we were to use RK4 to simulate planetary motion then the planets would either crash into the sun or leave the solar system: not a very accurate representation of reality.

$\displaystyle \begin{aligned} p_{n+1 / 2} &= p_n - \frac{h}{2}\nabla_q H(p_{n+1 / 2}, q_n) \\ q_{n+1} &= q_n + \frac{h}{2}\left(\nabla_p H(p_{n+1 / 2}, q_n) + \nabla_p H(p_{n+1 / 2}, q_{n+1})\right) \\ p_{n+1} &= p_{n+1 / 2} - \frac{h}{2}\nabla_q H(p_{n+1 / 2}, q_{n+1}) \end{aligned}$

Let’s assume that the Hamiltonian is of the form $H(p, q) = T(q) + V(p)$ then this becomes an explicit scheme.

$\displaystyle \begin{aligned} p_{n+1 / 2} &= p_n - \frac{h}{2}\nabla_q T(q_n) \\ q_{n+1} &= q_n + \frac{h}{2}\left(\nabla_p V(p_{n+1 / 2}) + \nabla_p V(p_{n+1 / 2})\right) = q_n + h \nabla_p V(p_{n+1 / 2}) \\ p_{n+1} &= p_{n+1 / 2} - \frac{h}{2}\nabla_q T(q_{n+1}) \end{aligned}$

## Simple Pendulum

Consider the following Hamiltonian for the pendulum:

$\displaystyle H(q,p) = \frac{1}{2}p^2 - \cos q$

> bigH1 :: P.Floating a => a -> a -> a
> bigH1 q p = p^2 / 2 - cos q
> ham1 :: P.Floating a => [a] -> a
> ham1 [qq1, pp1] = 0.5 * pp1^2 - cos qq1
> ham1 _ = error "Hamiltonian defined for 2 generalized co-ordinates only"

Although it is trivial to find the derivatives in this case, let us check using automatic symbolic differentiation

> q1, q2, p1, p2 :: Sym a
> q1 = var "q1"
> q2 = var "q2"
> p1 = var "p1"
> p2 = var "p2"
> nabla1 :: [[Sym Double]]
> nabla1 = jacobian ((\x -> [x]) . ham1) [q1, p1]

ghci> nabla1
[[sin q1,0.5*p1+0.5*p1]]

which after a bit of simplification gives us $\nabla_q T(q) = \sin q$ and $\nabla_p V(p) = p$ or

> nablaQ, nablaP :: P.Floating a => a -> a
> nablaQ = sin
> nablaP = id

One step of the Störmer-Verlet

> oneStep :: P.Floating a => (a -> a) ->
>                                  (a -> a) ->
>                                  a ->
>                                  (a, a) ->
>                                  (a, a)
> oneStep nablaQQ nablaPP hh (qPrev, pPrev) = (qNew, pNew)
>   where
>     h2 = hh / 2
>     pp2 = pPrev - h2 * nablaQQ qPrev
>     qNew = qPrev + hh * nablaPP pp2
>     pNew = pp2 - h2 * nablaQQ qNew
> h :: Double
> h = 0.01
> hs :: (Double, Double) -> [(Double, Double)]
> hs = P.iterate (oneStep nablaQ nablaP h)

We can check that the energy is conserved directly

ghci> P.map (P.uncurry bigH1) $ P.take 5 $ hs (pi/4, 0.0)
[-0.7071067811865476,-0.7071067816284917,-0.7071067829543561,-0.7071067851642338,-0.7071067882582796]

ghci> P.map (P.uncurry bigH1) $ P.take 5 $ P.drop 1000 $ hs (pi/4, 0.0)
[-0.7071069557227465,-0.7071069737863691,-0.7071069927439544,-0.7071070125961187,-0.7071070333434369]

## Two body problem

Newton’s equations of motion for the two body problem are

$\displaystyle \ddot{q}_1 = -\frac{q_1}{(q_1^2 + q_2^2)^{3/2}}, \quad \ddot{q}_2 = -\frac{q_2}{(q_1^2 + q_2^2)^{3/2}}$

And we can re-write those to use Hamilton’s equations with this Hamiltonian

$\displaystyle H(p_1,p_2,q_1,q_2) = \frac{1}{2}(p_1^2 +p_2^2) - \frac{1}{\sqrt{q_1^2 + q_2^2}}$

The advantage of using this small example is that we know the exact solution should we need it.

> ham2 :: P.Floating a => [a] -> a
> ham2 [qq1, qq2, pp1, pp2] = 0.5 * (pp1^2 + pp2^2) - recip (sqrt (qq1^2 + qq2^2))
> ham2 _ = error "Hamiltonian defined for 4 generalized co-ordinates only"

Again we can calculate the derivatives

> nabla2 :: [[Sym Double]]
> nabla2 = jacobian ((\x -> [x]) . ham2) [q1, q2, p1, p2]

ghci> (P.mapM_ . P.mapM_) putStrLn . P.map (P.map show) $ nabla2
q1/(2.0*sqrt (q1*q1+q2*q2))/sqrt (q1*q1+q2*q2)/sqrt (q1*q1+q2*q2)+q1/(2.0*sqrt (q1*q1+q2*q2))/sqrt (q1*q1+q2*q2)/sqrt (q1*q1+q2*q2)
q2/(2.0*sqrt (q1*q1+q2*q2))/sqrt (q1*q1+q2*q2)/sqrt (q1*q1+q2*q2)+q2/(2.0*sqrt (q1*q1+q2*q2))/sqrt (q1*q1+q2*q2)/sqrt (q1*q1+q2*q2)
0.5*p1+0.5*p1
0.5*p2+0.5*p2

which after some simplification becomes

$\displaystyle \begin{matrix} \frac{q_1}{(q_1^2 + q_2^2)^{3/2}} \\ \frac{q_2}{(q_1^2 + q_2^2)^{3/2}} \\ p_1 \\ p_2 \end{matrix}$

Here’s one step of Störmer-Verlet using Haskell 98.

> oneStepH98 :: Double -> V2 (V2 Double) -> V2 (V2 Double)
> oneStepH98 hh prev = V2 qNew pNew
>   where
>     h2 = hh / 2
>     hhs = V2 hh hh
>     hh2s = V2 h2 h2
>     pp2 = psPrev - hh2s * nablaQ' qsPrev
>     qNew = qsPrev + hhs * nablaP' pp2
>     pNew = pp2 - hh2s * nablaQ' qNew
>     qsPrev = prev ^. L._x
>     psPrev = prev ^. L._y
>     nablaQ' qs = V2 (qq1 / r) (qq2 / r)
>       where
>         qq1 = qs ^. L._x
>         qq2 = qs ^. L._y
>         r   = (qq1 ^ 2 + qq2 ^ 2) ** (3/2)
>     nablaP' ps = ps

And here is the same thing using accelerate.

> oneStep2 :: Double -> Exp (V2 (V2 Double)) -> Exp (V2 (V2 Double))
> oneStep2 hh prev = lift $ V2 qNew pNew
>   where
>     h2 = hh / 2
>     hhs = lift $ V2 hh hh
>     hh2s = lift $ V2 h2 h2
>     pp2 = psPrev - hh2s * nablaQ' qsPrev
>     qNew = qsPrev + hhs * nablaP' pp2
>     pNew = pp2 - hh2s * nablaQ' qNew
>     qsPrev :: Exp (V2 Double)
>     qsPrev = prev ^. _x
>     psPrev = prev ^. _y
>     nablaQ' :: Exp (V2 Double) -> Exp (V2 Double)
>     nablaQ' qs = lift (V2 (qq1 / r) (qq2 / r))
>       where
>         qq1 = qs ^. _x
>         qq2 = qs ^. _y
>         r   = (qq1 ^ 2 + qq2 ^ 2) ** (3/2)
>     nablaP' :: Exp (V2 Double) -> Exp (V2 Double)
>     nablaP' ps = ps

With initial values below the solution is an ellipse with eccentricity $e$.

> e, q10, q20, p10, p20 :: Double
> e = 0.6
> q10 = 1 - e
> q20 = 0.0
> p10 = 0.0
> p20 = sqrt ((1 + e) / (1 - e))
> initsH98 :: V2 (V2 Double)
> initsH98 = V2 (V2 q10 q20) (V2 p10 p20)
>
> inits :: Exp (V2 (V2 Double))
> inits = lift initsH98

We can either keep all the steps of the simulation using accelerate and the CPU

> nSteps :: Int
> nSteps = 10000
>
> dummyInputs :: Acc (Array DIM1 (V2 (V2 Double)))
> dummyInputs = A.use $ A.fromList (Z :. nSteps) $
>               P.replicate nSteps (V2 (V2 0.0 0.0) (V2 0.0 0.0))
> runSteps :: Array DIM1 (V2 (V2 Double))
> runSteps = CPU.run $ A.scanl (\s _x -> oneStep2 h s) inits dummyInputs

Or we can do the same in plain Haskell

> dummyInputsH98 :: [V2 (V2 Double)]
> dummyInputsH98 = P.replicate nSteps (V2 (V2 0.0 0.0) (V2 0.0 0.0))
> runStepsH98 :: [V2 (V2 Double)]
> runStepsH98 = P.scanl (\s _x -> oneStepH98 h s) initsH98 dummyInputsH98

The fact that the phase diagram for the two objects is periodic is encouraging.

And again we can check directly that the energy is conserved.

> bigH2BodyH98 :: V2 (V2 Double) -> Double
> bigH2BodyH98 z = ke + pe
>   where
>     q = z ^. L._x
>     p = z ^. L._y
>     pe = let V2 q1' q2' = q in negate $ recip (sqrt (q1'^2 + q2'^2))
>     ke = let V2 p1' p2' = p in 0.5 * (p1'^2 + p2'^2)

ghci> P.maximum $ P.map bigH2BodyH98 $ P.take 100 $ P.drop 1000 runStepsH98
-0.49964056333352014

ghci> P.minimum $ P.map bigH2BodyH98 $ P.take 100 $ P.drop 1000 runStepsH98
-0.49964155784917147

We’d like to measure performance and running the above for many steps might use up all available memory. Let’s confine ourselves to looking at the final result.

> runSteps' :: Int -> Exp (V2 (V2 Double)) -> Exp (V2 (V2 Double))
> runSteps' nSteps = A.iterate (lift nSteps) (oneStep2 h)
> reallyRunSteps' :: Int -> Array DIM1 (V2 (V2 Double))
> reallyRunSteps' nSteps = CPU.run $
>                          A.scanl (\s _x -> runSteps' nSteps s) inits
>                          (A.use $ A.fromList (Z :. 1) [V2 (V2 0.0 0.0) (V2 0.0 0.0)])

# Performance

## Accelerate’s LLVM

Let’s see what accelerate generates with

{-# OPTIONS_GHC -Wall                     #-}
{-# OPTIONS_GHC -fno-warn-name-shadowing  #-}
{-# OPTIONS_GHC -fno-warn-type-defaults   #-}
{-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
{-# OPTIONS_GHC -fno-warn-missing-methods #-}
{-# OPTIONS_GHC -fno-warn-orphans         #-}

import Symplectic

main :: IO ()
main = do
  putStrLn $ show $ reallyRunSteps' 100000000

It’s a bit verbose but we can look at the key “loop”: while5.

[  0.023] llvm: generated 102 instructions in 10 blocks
[  0.023] llvm: generated 127 instructions in 15 blocks
[  0.023] llvm: generated 86 instructions in 7 blocks
[  0.024] llvm: generated 84 instructions in 7 blocks
[  0.024] llvm: generated 18 instructions in 4 blocks
[  0.071] llvm: optimisation did work? True
[  0.072] ; ModuleID = 'scanS'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin15.5.0"

@.memset_pattern = private unnamed_addr constant [2 x double] [double 2.000000e+00, double 2.000000e+00], align 16
@.memset_pattern.1 = private unnamed_addr constant [2 x double] [double 4.000000e-01, double 4.000000e-01], align 16

; Function Attrs: nounwind
define void @scanS(i64 %ix.start, i64 %ix.end, double* noalias nocapture %out.ad3, double* noalias nocapture %out.ad2, double* noalias nocapture %out.ad1, double* noalias nocapture %out.ad0, i64 %out.sh0, double* noalias nocapture readnone %fv0.ad3, double* noalias nocapture readnone %fv0.ad2, double* noalias nocapture readnone %fv0.ad1, double* noalias nocapture readnone %fv0.ad0, i64 %fv0.sh0) local_unnamed_addr #0 {
entry:
  %0 = add i64 %fv0.sh0, 1
  %1 = icmp slt i64 %ix.start, %ix.end
  br i1 %1, label %while1.top.preheader, label %while1.exit

while1.top.preheader:                             ; preds = %entry
  %2 = add i64 %ix.start, 1
  %3 = mul i64 %2, %fv0.sh0
  br label %while1.top

while1.top:                                       ; preds = %while3.exit, %while1.top.preheader
  %indvars.iv = phi i64
[ %3, %while1.top.preheader ], [ %indvars.iv.next, %while3.exit ] %4 = phi i64 [ %ix.start, %while1.top.preheader ], [ %52, %while3.exit ] %5 = mul i64 %4, %fv0.sh0 %6 = mul i64 %4, %0 %7 = getelementptr double, double* %out.ad0, i64 %6 store double 2.000000e+00, double* %7, align 8 %8 = getelementptr double, double* %out.ad1, i64 %6 store double 0.000000e+00, double* %8, align 8 %9 = getelementptr double, double* %out.ad2, i64 %6 store double 0.000000e+00, double* %9, align 8 %10 = getelementptr double, double* %out.ad3, i64 %6 store double 4.000000e-01, double* %10, align 8 %11 = add i64 %5, %fv0.sh0 %12 = icmp slt i64 %5, %11 br i1 %12, label %while3.top.preheader, label %while3.exit while3.top.preheader: ; preds = %while1.top br label %while3.top while3.top: ; preds = %while3.top.preheader, %while5.exit %.in = phi i64 [ %44, %while5.exit ], [ %6, %while3.top.preheader ] %13 = phi i64 [ %51, %while5.exit ], [ %5, %while3.top.preheader ] %14 = phi <2 x double> [ %43, %while5.exit ], [ <double 2.000000e+00, double 0.000000e+00>, %while3.top.preheader ] %15 = phi <2 x double> [ %32, %while5.exit ], [ <double 0.000000e+00, double 4.000000e-01>, %while3.top.preheader ] br label %while5.top while5.top: ; preds = %while5.top, %while3.top %16 = phi i64 [ 0, %while3.top ], [ %19, %while5.top ] %17 = phi <2 x double> [ %14, %while3.top ], [ %43, %while5.top ] %18 = phi <2 x double> [ %15, %while3.top ], [ %32, %while5.top ] %19 = add nuw nsw i64 %16, 1 %20 = extractelement <2 x double> %18, i32 1 %21 = fmul fast double %20, %20 %22 = extractelement <2 x double> %18, i32 0 %23 = fmul fast double %22, %22 %24 = fadd fast double %21, %23 %25 = tail call double @llvm.pow.f64(double %24, double 1.500000e+00) #2 %26 = insertelement <2 x double> undef, double %25, i32 0 %27 = shufflevector <2 x double> %26, <2 x double> undef, <2 x i32> zeroinitializer %28 = fdiv fast <2 x double> %18, %27 %29 = fmul fast <2 x double> %28, <double 5.000000e-03, double 5.000000e-03> %30 = fsub 
fast <2 x double> %17, %29 %31 = fmul fast <2 x double> %30, <double 1.000000e-02, double 1.000000e-02> %32 = fadd fast <2 x double> %31, %18 %33 = extractelement <2 x double> %32, i32 1 %34 = fmul fast double %33, %33 %35 = extractelement <2 x double> %32, i32 0 %36 = fmul fast double %35, %35 %37 = fadd fast double %34, %36 %38 = tail call double @llvm.pow.f64(double %37, double 1.500000e+00) #2 %39 = insertelement <2 x double> undef, double %38, i32 0 %40 = shufflevector <2 x double> %39, <2 x double> undef, <2 x i32> zeroinitializer %41 = fdiv fast <2 x double> %32, %40 %42 = fmul fast <2 x double> %41, <double 5.000000e-03, double 5.000000e-03> %43 = fsub fast <2 x double> %30, %42 %exitcond = icmp eq i64 %19, 100000000 br i1 %exitcond, label %while5.exit, label %while5.top while5.exit: ; preds = %while5.top %44 = add i64 %.in, 1 %45 = getelementptr double, double* %out.ad0, i64 %44 %46 = extractelement <2 x double> %43, i32 0 store double %46, double* %45, align 8 %47 = getelementptr double, double* %out.ad1, i64 %44 %48 = extractelement <2 x double> %43, i32 1 store double %48, double* %47, align 8 %49 = getelementptr double, double* %out.ad2, i64 %44 store double %35, double* %49, align 8 %50 = getelementptr double, double* %out.ad3, i64 %44 store double %33, double* %50, align 8 %51 = add i64 %13, 1 %exitcond7 = icmp eq i64 %51, %indvars.iv br i1 %exitcond7, label %while3.exit.loopexit, label %while3.top while3.exit.loopexit: ; preds = %while5.exit br label %while3.exit while3.exit: ; preds = %while3.exit.loopexit, %while1.top %52 = add nsw i64 %4, 1 %indvars.iv.next = add i64 %indvars.iv, %fv0.sh0 %exitcond8 = icmp eq i64 %52, %ix.end br i1 %exitcond8, label %while1.exit.loopexit, label %while1.top while1.exit.loopexit: ; preds = %while3.exit br label %while1.exit while1.exit: ; preds = %while1.exit.loopexit, %entry ret void } ; Function Attrs: nounwind define void @scanP1(i64 %ix.start, i64 %ix.end, i64 %ix.stride, i64 %ix.steps, double* noalias 
nocapture %out.ad3, double* noalias nocapture %out.ad2, double* noalias nocapture %out.ad1, double* noalias nocapture %out.ad0, i64 %out.sh0, double* noalias nocapture %tmp.ad3, double* noalias nocapture %tmp.ad2, double* noalias nocapture %tmp.ad1, double* noalias nocapture %tmp.ad0, i64 %tmp.sh0, double* noalias nocapture readonly %fv0.ad3, double* noalias nocapture readonly %fv0.ad2, double* noalias nocapture readonly %fv0.ad1, double* noalias nocapture readonly %fv0.ad0, i64 %fv0.sh0) local_unnamed_addr #0 { entry: %0 = mul i64 %ix.stride, %ix.start %1 = add i64 %0, %ix.stride %2 = icmp sle i64 %1, %fv0.sh0 %3 = select i1 %2, i64 %1, i64 %fv0.sh0 %4 = icmp eq i64 %ix.start, 0 %5 = add i64 %0, 1 %6 = select i1 %4, i64 %0, i64 %5 br i1 %4, label %if5.exit, label %if5.else if5.else: ; preds = %entry %7 = getelementptr double, double* %fv0.ad3, i64 %0 %8 = load double, double* %7, align 8 %9 = getelementptr double, double* %fv0.ad2, i64 %0 %10 = load double, double* %9, align 8 %11 = getelementptr double, double* %fv0.ad1, i64 %0 %12 = load double, double* %11, align 8 %13 = getelementptr double, double* %fv0.ad0, i64 %0 %14 = load double, double* %13, align 8 %15 = insertelement <2 x double> undef, double %14, i32 0 %16 = insertelement <2 x double> %15, double %12, i32 1 %17 = insertelement <2 x double> undef, double %10, i32 0 %18 = insertelement <2 x double> %17, double %8, i32 1 br label %if5.exit if5.exit: ; preds = %entry, %if5.else %19 = phi i64 [ %5, %if5.else ], [ %0, %entry ] %20 = phi <2 x double> [ %16, %if5.else ], [ <double 2.000000e+00, double 0.000000e+00>, %entry ] %21 = phi <2 x double> [ %18, %if5.else ], [ <double 0.000000e+00, double 4.000000e-01>, %entry ] %22 = getelementptr double, double* %out.ad0, i64 %6 %23 = extractelement <2 x double> %20, i32 0 store double %23, double* %22, align 8 %24 = getelementptr double, double* %out.ad1, i64 %6 %25 = extractelement <2 x double> %20, i32 1 store double %25, double* %24, align 8 %26 = 
getelementptr double, double* %out.ad2, i64 %6 %27 = extractelement <2 x double> %21, i32 0 store double %27, double* %26, align 8 %28 = getelementptr double, double* %out.ad3, i64 %6 %29 = extractelement <2 x double> %21, i32 1 store double %29, double* %28, align 8 %30 = icmp slt i64 %19, %3 br i1 %30, label %while9.top.preheader, label %while9.exit while9.top.preheader: ; preds = %if5.exit br label %while9.top while9.top: ; preds = %while9.top.preheader, %while11.exit %.in = phi i64 [ %62, %while11.exit ], [ %6, %while9.top.preheader ] %31 = phi i64 [ %69, %while11.exit ], [ %19, %while9.top.preheader ] %32 = phi <2 x double> [ %61, %while11.exit ], [ %20, %while9.top.preheader ] %33 = phi <2 x double> [ %50, %while11.exit ], [ %21, %while9.top.preheader ] br label %while11.top while11.top: ; preds = %while11.top, %while9.top %34 = phi i64 [ 0, %while9.top ], [ %37, %while11.top ] %35 = phi <2 x double> [ %32, %while9.top ], [ %61, %while11.top ] %36 = phi <2 x double> [ %33, %while9.top ], [ %50, %while11.top ] %37 = add nuw nsw i64 %34, 1 %38 = extractelement <2 x double> %36, i32 1 %39 = fmul fast double %38, %38 %40 = extractelement <2 x double> %36, i32 0 %41 = fmul fast double %40, %40 %42 = fadd fast double %39, %41 %43 = tail call double @llvm.pow.f64(double %42, double 1.500000e+00) #2 %44 = insertelement <2 x double> undef, double %43, i32 0 %45 = shufflevector <2 x double> %44, <2 x double> undef, <2 x i32> zeroinitializer %46 = fdiv fast <2 x double> %36, %45 %47 = fmul fast <2 x double> %46, <double 5.000000e-03, double 5.000000e-03> %48 = fsub fast <2 x double> %35, %47 %49 = fmul fast <2 x double> %48, <double 1.000000e-02, double 1.000000e-02> %50 = fadd fast <2 x double> %49, %36 %51 = extractelement <2 x double> %50, i32 1 %52 = fmul fast double %51, %51 %53 = extractelement <2 x double> %50, i32 0 %54 = fmul fast double %53, %53 %55 = fadd fast double %52, %54 %56 = tail call double @llvm.pow.f64(double %55, double 1.500000e+00) #2 %57 = 
insertelement <2 x double> undef, double %56, i32 0 %58 = shufflevector <2 x double> %57, <2 x double> undef, <2 x i32> zeroinitializer %59 = fdiv fast <2 x double> %50, %58 %60 = fmul fast <2 x double> %59, <double 5.000000e-03, double 5.000000e-03> %61 = fsub fast <2 x double> %48, %60 %exitcond = icmp eq i64 %37, 100000000 br i1 %exitcond, label %while11.exit, label %while11.top while11.exit: ; preds = %while11.top %62 = add i64 %.in, 1 %63 = getelementptr double, double* %out.ad0, i64 %62 %64 = extractelement <2 x double> %61, i32 0 store double %64, double* %63, align 8 %65 = getelementptr double, double* %out.ad1, i64 %62 %66 = extractelement <2 x double> %61, i32 1 store double %66, double* %65, align 8 %67 = getelementptr double, double* %out.ad2, i64 %62 store double %53, double* %67, align 8 %68 = getelementptr double, double* %out.ad3, i64 %62 store double %51, double* %68, align 8 %69 = add nsw i64 %31, 1 %70 = icmp slt i64 %69, %3 br i1 %70, label %while9.top, label %while9.exit.loopexit while9.exit.loopexit: ; preds = %while11.exit br label %while9.exit while9.exit: ; preds = %while9.exit.loopexit, %if5.exit %71 = phi <2 x double> [ %20, %if5.exit ], [ %61, %while9.exit.loopexit ] %72 = phi <2 x double> [ %21, %if5.exit ], [ %50, %while9.exit.loopexit ] %73 = getelementptr double, double* %tmp.ad0, i64 %ix.start %74 = extractelement <2 x double> %71, i32 0 store double %74, double* %73, align 8 %75 = getelementptr double, double* %tmp.ad1, i64 %ix.start %76 = extractelement <2 x double> %71, i32 1 store double %76, double* %75, align 8 %77 = getelementptr double, double* %tmp.ad2, i64 %ix.start %78 = extractelement <2 x double> %72, i32 0 store double %78, double* %77, align 8 %79 = getelementptr double, double* %tmp.ad3, i64 %ix.start %80 = extractelement <2 x double> %72, i32 1 store double %80, double* %79, align 8 ret void } ; Function Attrs: nounwind define void @scanP2(i64 %ix.start, i64 %ix.end, double* noalias nocapture %tmp.ad3, double* 
noalias nocapture %tmp.ad2, double* noalias nocapture %tmp.ad1, double* noalias nocapture %tmp.ad0, i64 %tmp.sh0, double* noalias nocapture readnone %fv0.ad3, double* noalias nocapture readnone %fv0.ad2, double* noalias nocapture readnone %fv0.ad1, double* noalias nocapture readnone %fv0.ad0, i64 %fv0.sh0) local_unnamed_addr #0 { entry: %0 = add i64 %ix.start, 1 %1 = icmp slt i64 %0, %ix.end br i1 %1, label %while1.top.preheader, label %while1.exit while1.top.preheader: ; preds = %entry %2 = getelementptr double, double* %tmp.ad3, i64 %ix.start %3 = load double, double* %2, align 8 %4 = getelementptr double, double* %tmp.ad2, i64 %ix.start %5 = load double, double* %4, align 8 %6 = getelementptr double, double* %tmp.ad1, i64 %ix.start %7 = load double, double* %6, align 8 %8 = getelementptr double, double* %tmp.ad0, i64 %ix.start %9 = load double, double* %8, align 8 %10 = insertelement <2 x double> undef, double %5, i32 0 %11 = insertelement <2 x double> %10, double %3, i32 1 %12 = insertelement <2 x double> undef, double %9, i32 0 %13 = insertelement <2 x double> %12, double %7, i32 1 br label %while1.top while1.top: ; preds = %while3.exit, %while1.top.preheader %14 = phi i64 [ %45, %while3.exit ], [ %0, %while1.top.preheader ] %15 = phi <2 x double> [ %44, %while3.exit ], [ %13, %while1.top.preheader ] %16 = phi <2 x double> [ %33, %while3.exit ], [ %11, %while1.top.preheader ] br label %while3.top while3.top: ; preds = %while3.top, %while1.top %17 = phi i64 [ 0, %while1.top ], [ %20, %while3.top ] %18 = phi <2 x double> [ %15, %while1.top ], [ %44, %while3.top ] %19 = phi <2 x double> [ %16, %while1.top ], [ %33, %while3.top ] %20 = add nuw nsw i64 %17, 1 %21 = extractelement <2 x double> %19, i32 1 %22 = fmul fast double %21, %21 %23 = extractelement <2 x double> %19, i32 0 %24 = fmul fast double %23, %23 %25 = fadd fast double %22, %24 %26 = tail call double @llvm.pow.f64(double %25, double 1.500000e+00) #2 %27 = insertelement <2 x double> undef, double %26, 
i32 0 %28 = shufflevector <2 x double> %27, <2 x double> undef, <2 x i32> zeroinitializer %29 = fdiv fast <2 x double> %19, %28 %30 = fmul fast <2 x double> %29, <double 5.000000e-03, double 5.000000e-03> %31 = fsub fast <2 x double> %18, %30 %32 = fmul fast <2 x double> %31, <double 1.000000e-02, double 1.000000e-02> %33 = fadd fast <2 x double> %32, %19 %34 = extractelement <2 x double> %33, i32 1 %35 = fmul fast double %34, %34 %36 = extractelement <2 x double> %33, i32 0 %37 = fmul fast double %36, %36 %38 = fadd fast double %35, %37 %39 = tail call double @llvm.pow.f64(double %38, double 1.500000e+00) #2 %40 = insertelement <2 x double> undef, double %39, i32 0 %41 = shufflevector <2 x double> %40, <2 x double> undef, <2 x i32> zeroinitializer %42 = fdiv fast <2 x double> %33, %41 %43 = fmul fast <2 x double> %42, <double 5.000000e-03, double 5.000000e-03> %44 = fsub fast <2 x double> %31, %43 %exitcond = icmp eq i64 %20, 100000000 br i1 %exitcond, label %while3.exit, label %while3.top while3.exit: ; preds = %while3.top %45 = add i64 %14, 1 %46 = getelementptr double, double* %tmp.ad0, i64 %14 %47 = extractelement <2 x double> %44, i32 0 store double %47, double* %46, align 8 %48 = getelementptr double, double* %tmp.ad1, i64 %14 %49 = extractelement <2 x double> %44, i32 1 store double %49, double* %48, align 8 %50 = getelementptr double, double* %tmp.ad2, i64 %14 store double %36, double* %50, align 8 %51 = getelementptr double, double* %tmp.ad3, i64 %14 store double %34, double* %51, align 8 %exitcond7 = icmp eq i64 %45, %ix.end br i1 %exitcond7, label %while1.exit.loopexit, label %while1.top while1.exit.loopexit: ; preds = %while3.exit br label %while1.exit while1.exit: ; preds = %while1.exit.loopexit, %entry ret void } ; Function Attrs: nounwind define void @scanP3(i64 %ix.start, i64 %ix.end, i64 %ix.stride, double* noalias nocapture %out.ad3, double* noalias nocapture %out.ad2, double* noalias nocapture %out.ad1, double* noalias nocapture %out.ad0, i64 
%out.sh0, double* noalias nocapture readonly %tmp.ad3, double* noalias nocapture readonly %tmp.ad2, double* noalias nocapture readonly %tmp.ad1, double* noalias nocapture readonly %tmp.ad0, i64 %tmp.sh0, double* noalias nocapture readnone %fv0.ad3, double* noalias nocapture readnone %fv0.ad2, double* noalias nocapture readnone %fv0.ad1, double* noalias nocapture readnone %fv0.ad0, i64 %fv0.sh0) local_unnamed_addr #0 { entry: %0 = add i64 %ix.start, 1 %1 = mul i64 %0, %ix.stride %2 = add i64 %1, %ix.stride %3 = icmp sle i64 %2, %out.sh0 %4 = select i1 %3, i64 %2, i64 %out.sh0 %5 = add i64 %1, 1 %6 = add i64 %4, 1 %7 = icmp slt i64 %5, %6 br i1 %7, label %while1.top.preheader, label %while1.exit while1.top.preheader: ; preds = %entry %8 = getelementptr double, double* %tmp.ad0, i64 %ix.start %9 = load double, double* %8, align 8 %10 = getelementptr double, double* %tmp.ad1, i64 %ix.start %11 = load double, double* %10, align 8 %12 = getelementptr double, double* %tmp.ad2, i64 %ix.start %13 = load double, double* %12, align 8 %14 = getelementptr double, double* %tmp.ad3, i64 %ix.start %15 = load double, double* %14, align 8 %16 = insertelement <2 x double> undef, double %9, i32 0 %17 = insertelement <2 x double> %16, double %11, i32 1 %18 = insertelement <2 x double> undef, double %13, i32 0 %19 = insertelement <2 x double> %18, double %15, i32 1 br label %while1.top while1.top: ; preds = %while1.top.preheader, %while3.exit %20 = phi i64 [ %55, %while3.exit ], [ %5, %while1.top.preheader ] br label %while3.top while3.top: ; preds = %while3.top, %while1.top %21 = phi i64 [ 0, %while1.top ], [ %24, %while3.top ] %22 = phi <2 x double> [ %17, %while1.top ], [ %48, %while3.top ] %23 = phi <2 x double> [ %19, %while1.top ], [ %37, %while3.top ] %24 = add nuw nsw i64 %21, 1 %25 = extractelement <2 x double> %23, i32 1 %26 = fmul fast double %25, %25 %27 = extractelement <2 x double> %23, i32 0 %28 = fmul fast double %27, %27 %29 = fadd fast double %26, %28 %30 = tail call 
double @llvm.pow.f64(double %29, double 1.500000e+00) #2 %31 = insertelement <2 x double> undef, double %30, i32 0 %32 = shufflevector <2 x double> %31, <2 x double> undef, <2 x i32> zeroinitializer %33 = fdiv fast <2 x double> %23, %32 %34 = fmul fast <2 x double> %33, <double 5.000000e-03, double 5.000000e-03> %35 = fsub fast <2 x double> %22, %34 %36 = fmul fast <2 x double> %35, <double 1.000000e-02, double 1.000000e-02> %37 = fadd fast <2 x double> %36, %23 %38 = extractelement <2 x double> %37, i32 1 %39 = fmul fast double %38, %38 %40 = extractelement <2 x double> %37, i32 0 %41 = fmul fast double %40, %40 %42 = fadd fast double %39, %41 %43 = tail call double @llvm.pow.f64(double %42, double 1.500000e+00) #2 %44 = insertelement <2 x double> undef, double %43, i32 0 %45 = shufflevector <2 x double> %44, <2 x double> undef, <2 x i32> zeroinitializer %46 = fdiv fast <2 x double> %37, %45 %47 = fmul fast <2 x double> %46, <double 5.000000e-03, double 5.000000e-03> %48 = fsub fast <2 x double> %35, %47 %exitcond = icmp eq i64 %24, 100000000 br i1 %exitcond, label %while3.exit, label %while3.top while3.exit: ; preds = %while3.top %49 = getelementptr double, double* %out.ad0, i64 %20 %50 = extractelement <2 x double> %48, i32 0 store double %50, double* %49, align 8 %51 = getelementptr double, double* %out.ad1, i64 %20 %52 = extractelement <2 x double> %48, i32 1 store double %52, double* %51, align 8 %53 = getelementptr double, double* %out.ad2, i64 %20 store double %40, double* %53, align 8 %54 = getelementptr double, double* %out.ad3, i64 %20 store double %38, double* %54, align 8 %55 = add i64 %20, 1 %56 = icmp slt i64 %55, %6 br i1 %56, label %while1.top, label %while1.exit.loopexit while1.exit.loopexit: ; preds = %while3.exit br label %while1.exit while1.exit: ; preds = %while1.exit.loopexit, %entry ret void } ; Function Attrs: norecurse nounwind define void @generate(i64 %ix.start, i64 %ix.end, double* noalias nocapture %out.ad3, double* noalias nocapture 
%out.ad2, double* noalias nocapture %out.ad1, double* noalias nocapture %out.ad0, i64 %out.sh0, double* noalias nocapture readnone %fv0.ad3, double* noalias nocapture readnone %fv0.ad2, double* noalias nocapture readnone %fv0.ad1, double* noalias nocapture readnone %fv0.ad0, i64 %fv0.sh0) local_unnamed_addr #1 { entry: %0 = icmp sgt i64 %ix.end, %ix.start br i1 %0, label %while1.top.preheader, label %while1.exit while1.top.preheader: ; preds = %entry %scevgep = getelementptr double, double* %out.ad1, i64 %ix.start %scevgep1 = bitcast double* %scevgep to i8* %1 = sub i64 %ix.end, %ix.start %2 = shl i64 %1, 3 call void @llvm.memset.p0i8.i64(i8* %scevgep1, i8 0, i64 %2, i32 8, i1 false) %scevgep2 = getelementptr double, double* %out.ad2, i64 %ix.start %scevgep23 = bitcast double* %scevgep2 to i8* call void @llvm.memset.p0i8.i64(i8* %scevgep23, i8 0, i64 %2, i32 8, i1 false) %scevgep4 = getelementptr double, double* %out.ad0, i64 %ix.start %scevgep45 = bitcast double* %scevgep4 to i8* call void @memset_pattern16(i8* %scevgep45, i8* bitcast ([2 x double]* @.memset_pattern to i8*), i64 %2) #0 %scevgep6 = getelementptr double, double* %out.ad3, i64 %ix.start %scevgep67 = bitcast double* %scevgep6 to i8* call void @memset_pattern16(i8* %scevgep67, i8* bitcast ([2 x double]* @.memset_pattern.1 to i8*), i64 %2) #0 br label %while1.exit while1.exit: ; preds = %while1.top.preheader, %entry ret void } ; Function Attrs: nounwind readonly declare double @llvm.pow.f64(double, double) #2 ; Function Attrs: argmemonly nounwind declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i32, i1) #3 ; Function Attrs: argmemonly declare void @memset_pattern16(i8* nocapture, i8* nocapture readonly, i64) #4 attributes #0 = { nounwind } attributes #1 = { norecurse nounwind } attributes #2 = { nounwind readonly } attributes #3 = { argmemonly nounwind } attributes #4 = { argmemonly } And then we can run it for $10^8$ steps and see how long it takes. 
     9,031,008 bytes allocated in the heap
     1,101,152 bytes copied during GC
       156,640 bytes maximum residency (2 sample(s))
        28,088 bytes maximum slop
             2 MB total memory in use (0 MB lost due to fragmentation)

                                   Tot time (elapsed)  Avg pause  Max pause
  Gen  0        16 colls,     0 par    0.002s   0.006s     0.0004s    0.0041s
  Gen  1         2 colls,     0 par    0.002s   0.099s     0.0496s    0.0990s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time   10.195s  ( 10.501s elapsed)
  GC      time    0.004s  (  0.105s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   10.206s  ( 10.608s elapsed)

  %GC     time       0.0%  (1.0% elapsed)

  Alloc rate    885,824 bytes per MUT second

  Productivity 100.0% of total user, 99.0% of total elapsed

## Julia’s LLVM

Let’s try the same problem in Julia.

using StaticArrays

e = 0.6
q10 = 1 - e
q20 = 0.0
p10 = 0.0
p20 = sqrt((1 + e) / (1 - e))

h = 0.01

x1 = SVector{2,Float64}(q10, q20)
x2 = SVector{2,Float64}(p10, p20)
x3 = SVector{2,SVector{2,Float64}}(x1, x2)

@inline function oneStep(h, prev)
    h2 = h / 2
    @inbounds qsPrev = prev[1]
    @inbounds psPrev = prev[2]
    @inline function nablaQQ(qs)
        @inbounds q1 = qs[1]
        @inbounds q2 = qs[2]
        r = (q1^2 + q2^2) ^ (3/2)
        return SVector{2,Float64}(q1 / r, q2 / r)
    end
    @inline function nablaPP(ps)
        return ps
    end
    p2 = psPrev - h2 * nablaQQ(qsPrev)
    qNew = qsPrev + h * nablaPP(p2)
    pNew = p2 - h2 * nablaQQ(qNew)
    return SVector{2,SVector{2,Float64}}(qNew, pNew)
end

function manyStepsFinal(n, h, prev)
    for i in 1:n
        prev = oneStep(h, prev)
    end
    return prev
end

final = manyStepsFinal(100000000, h, x3)
print(final)

Again we can see how long it takes

       22.20 real        22.18 user         0.32 sys

Surprisingly it takes longer, but I am a Julia novice so it could be some rookie error. Two things occur to me:

1. Let’s look at the LLVM IR and see if we can find an explanation.

2. Let’s analyse execution time versus number of steps to separate the code-generation cost from the execution cost. It may be that Julia takes longer to generate code but has better execution times.
define void @julia_oneStep_71735(%SVector.65* noalias sret, double, %SVector.65*) #0 { top: %3 = fmul double %1, 5.000000e-01 %4 = getelementptr inbounds %SVector.65, %SVector.65* %2, i64 0, i32 0, i64 0, i32 0, i64 0 %5 = load double, double* %4, align 1 %6 = getelementptr inbounds %SVector.65, %SVector.65* %2, i64 0, i32 0, i64 0, i32 0, i64 1 %7 = load double, double* %6, align 1 %8 = getelementptr inbounds %SVector.65, %SVector.65* %2, i64 0, i32 0, i64 1, i32 0, i64 0 %9 = load double, double* %8, align 1 %10 = getelementptr inbounds %SVector.65, %SVector.65* %2, i64 0, i32 0, i64 1, i32 0, i64 1 %11 = load double, double* %10, align 1 %12 = fmul double %5, %5 %13 = fmul double %7, %7 %14 = fadd double %12, %13 %15 = call double @"julia_^_71741"(double %14, double 1.500000e+00) #0 %16 = fdiv double %5, %15 %17 = fdiv double %7, %15 %18 = fmul double %3, %16 %19 = fmul double %3, %17 %20 = fsub double %9, %18 %21 = fsub double %11, %19 %22 = fmul double %20, %1 %23 = fmul double %21, %1 %24 = fadd double %5, %22 %25 = fadd double %7, %23 %26 = fmul double %24, %24 %27 = fmul double %25, %25 %28 = fadd double %26, %27 %29 = call double @"julia_^_71741"(double %28, double 1.500000e+00) #0 %30 = fdiv double %24, %29 %31 = fdiv double %25, %29 %32 = fmul double %3, %30 %33 = fmul double %3, %31 %34 = fsub double %20, %32 %35 = fsub double %21, %33 %36 = getelementptr inbounds %SVector.65, %SVector.65* %0, i64 0, i32 0, i64 0, i32 0, i64 0 store double %24, double* %36, align 8 %37 = getelementptr inbounds %SVector.65, %SVector.65* %0, i64 0, i32 0, i64 0, i32 0, i64 1 store double %25, double* %37, align 8 %38 = getelementptr inbounds %SVector.65, %SVector.65* %0, i64 0, i32 0, i64 1, i32 0, i64 0 store double %34, double* %38, align 8 %39 = getelementptr inbounds %SVector.65, %SVector.65* %0, i64 0, i32 0, i64 1, i32 0, i64 1 store double %35, double* %39, align 8 ret void } nothing We can see two things: 1. Julia doesn’t use SIMD by default. 
We can change this by using -O3. In the event (I don’t reproduce it here), this makes very little difference to performance.

2. Julia generates

%15 = call double @"julia_^_71741"(double %14, double 1.500000e+00) #0

whereas GHC generates

%26 = tail call double @llvm.pow.f64(double %25, double 1.500000e+00) #2

It is entirely possible that this results in the use of different libms: Julia calls openlibm while GHC calls the system libm, which on my machine is the one that comes with Mac OS X and is apparently quite a lot faster. We could try compiling the actual LLVM and replacing the Julia calls with pow, but maybe that is the subject for another blog post.

Just in case “on-the-fly” compilation is obscuring runtime performance, let’s try running both the Haskell and the Julia for 20m, 40m, 80m and 100m steps.

Haskell

linreg([20,40,80,100],[2.0,4.0,7.9,10.1])
(-0.03000000000000025,0.1005)

Julia

linreg([20,40,80,100],[5.7,9.8,18.1,22.2])
(1.5600000000000005,0.2065)

Clearly the negative compilation time for Haskell is wrong, but I think it’s fair to infer that Julia has a higher start-up cost and that Haskell is about twice as quick, although as noted above this may be because of different math libraries.

# Colophon

I used shake to build some of the material “on the fly”. There are still some manual steps to producing the blog post. The code can be downloaded from github.

# Trouble with Tribbles

# Introduction

Tribbles originate from the planet Iota Geminorum IV and, according to Dr. McCoy, are born pregnant. No further details are given but we can follow Gurtin and MacCamy (1974) and perhaps recover some of what happens on the Enterprise.

Of course, age-dependent population models are of more than fictional use and can be applied, for example, to modelling the progression of malaria in infected hosts. We roughly follow some of J. J. Thibodeaux and Schlittenhardt (2011), who themselves reference Belair, Mackey, and Mahaffy (1995).
Of interest to Haskellers are:

• The use of the hmatrix package, which now contains functions to solve tridiagonal systems, used in this post. You will need to use HEAD until a new hackage / stackage release is made. My future plan is to use CUDA via accelerate and compare.

• The use of dimensions in a medium-sized example. It would have been nice to have tried the units package but it seemed harder work to use and, as ever, “Time’s wingèd chariot” was the enemy.

The source for this post can be downloaded from github.

# Age-Dependent Populations

## McKendrick / von Foerster

McKendrick and von Foerster independently derived a model of age-dependent population growth.

Let $n(a,t)$ be the density of females of age $a$ at time $t$. The number of females between ages $a$ and $a + \delta a$ is thus $n(a, t)\delta a$. Assuming individuals are born at age $0$, we have

$\displaystyle \frac{\partial}{\partial t}(n(a, t)\delta a) = J(a, t) - J(a + \delta a, t) - \mu(a, t)n(a, t)\delta a$

where $\mu(a, t)$ is the death rate density and $J(a, t)$ denotes the rate of entry to the cohort of age $a$. Dividing by $\delta a$ we obtain

$\displaystyle \frac{\partial}{\partial t}n(a, t) = - \frac{J(a + \delta a, t) - J(a, t)}{\delta a} - \mu(a, t)n(a, t)$

which in the limit becomes

$\displaystyle \frac{\partial}{\partial t}n(a, t) = - \frac{\partial J(a, t)}{\partial a} - \mu(a, t)n(a, t)$

We can further assume that the rate of entry to a cohort is proportional to the density of individuals times a velocity of aging $v(a, t)$:

$\displaystyle J(a, t) = n(a, t)v(a, t)$

Occasionally there is some reason to assume that aging one year is different to experiencing one year, but here we further assume $v = 1$. We thus obtain

$\displaystyle \frac{\partial n(a, t)}{\partial t} + \frac{\partial n(a, t)}{\partial a} = - \mu(a, t)n(a, t)$

## Gurtin / MacCamy

To solve any PDE we need boundary and initial conditions.
The number of births at time $t$ is

$\displaystyle n(0, t) = \int_0^\infty n(a, t) m(a, N(t))\, \mathrm{d}a$

where $m$ is the natality aka birth-modulus and

$\displaystyle N(t) = \int_0^\infty n(a, t)\, \mathrm{d}a$

and we further assume the initial condition

$\displaystyle n(a, 0) = n_0(a)$

for some given $n_0$.

Gurtin and MacCamy (1974) focus on the situation where

$\displaystyle m(a, N(t)) = \beta(N)e^{-\alpha a}$

and we can also assume that the birth rate of Tribbles decreases exponentially with age and further that Tribbles can live forever. Gurtin and MacCamy (1974) then transform the PDE to obtain a pair of linked ODEs which can then be solved numerically.

Of course, we know what happens on the Enterprise and rather than continue with this example, let us turn our attention to the more serious subject of malaria.

# Malaria

I realise now that I went a bit overboard with references. Hopefully they don’t interrupt the flow too much.

The World Health Organisation (WHO) estimated that in 2015 there were 214 million new cases of malaria resulting in 438,000 deaths (source: Wikipedia). The lifecycle of the plasmodium parasite that causes malaria is extremely ingenious. J. J. Thibodeaux and Schlittenhardt (2011) model the human segment of the plasmodium lifecycle and further propose a way of determining an optimal treatment for an infected individual. Hall et al. (2013) also model the effect of an anti-malarial. Let us content ourselves with reproducing part of the paper by J. J. Thibodeaux and Schlittenhardt (2011).

At one part of its sojourn in humans, plasmodium infects erythrocytes aka red blood cells. These latter contain haemoglobin (aka hemoglobin). The process by which red blood cells are produced, erythropoiesis, is modulated in a feedback loop by erythropoietin. The plasmodium parasite severely disrupts this process. Presumably the resulting loss of haemoglobin is one reason that an infected individual feels ill.
As can be seen in the overview by Torbett and Friedman (2009), the full feedback loop is complex. So as not to lose ourselves in the details and following J. J. Thibodeaux and Schlittenhardt (2011) and Belair, Mackey, and Mahaffy (1995), we consider a model with two compartments.

• Precursors: prototype erythrocytes developing in the bone marrow, with $p(\mu, t)$ being the density of such cells of age $\mu$ at time $t$.

• Erythrocytes: mature red blood cells circulating in the blood, with $m(\nu, t)$ being the density of such cells of age $\nu$ at time $t$.

\displaystyle \begin{aligned} \frac{\partial p(\mu, t)}{\partial t} + g(E(t))\frac{\partial p(\mu, t)}{\partial \mu} &= \sigma(\mu, t, E(t))p(\mu, t) & 0 < \mu < \mu_F & & 0 < t < T \\ \frac{\partial m(\nu, t)}{\partial t} + \phantom{g(E(t))}\frac{\partial m(\nu, t)}{\partial \nu} &= -\gamma(\nu, t, M(t))m(\nu, t) & 0 < \nu < \nu_F & & 0 < t < T \end{aligned}

where $\sigma(\mu, t, E(t))$ is the birth rate of precursors, $\gamma(\nu, t, M(t))$ is the death rate of erythrocytes, $g(E(t))$ is the maturation rate of precursors and where

$\displaystyle M(t) = \int_0^{\nu_F} m(\nu, t) \,\mathrm{d}\nu$

As boundary conditions, we have that the number of precursors maturing must equal the number of erythrocytes being produced

$\displaystyle m(0, t) = g(E(t))p(\mu_F, t)$

and that the production of precursors depends on the level of erythropoietin

$\displaystyle g(E(t))p(0, t) = \phi(t)E(t)$

where $\phi(t)$ is some proportionality function.
As initial conditions, we have

\displaystyle \begin{aligned} p(\mu, 0) &= p_0(\mu) \\ m(\nu, 0) &= m_0(\nu) \end{aligned}

We can further model the erythropoietin dynamics as

$\displaystyle \frac{\mathrm{d}E(t)}{\mathrm{d}t} = f(M(t), t) - a_E(P(t))E(t)$

where $f$ is the feedback function from the kidneys and the decay rate $a_E$ depends on the total precursor population $P(t)$ (Sawyer, Krantz, and Goldwasser (1987)), although it is often taken to be a constant (and I would feel more comfortable with a more recent citation), and where

$\displaystyle P(t) = \int_0^{\mu_F} p(\mu, t) \,\mathrm{d}\mu$

As initial condition we have

$\displaystyle E(0) = E_0$

## A Finite Difference Attempt

Let us try solving the above model using a finite difference scheme, observing that we currently have no basis for whether it has a solution and whether the finite difference scheme approximates such a solution! We follow J. J. Thibodeaux and Schlittenhardt (2011), who give a proof of convergence, presumably with some conditions; any failure of the scheme is entirely mine.

Divide up the age and time ranges, $[0, \mu_F]$, $[0, \nu_F]$ and $[0, T]$ into equal sub-intervals, $[\mu_i, \mu_{i+1}]$, $[\nu_j, \nu_{j+1}]$ and $[t_k, t_{k+1}]$ where

\displaystyle \begin{aligned} \mu_i &= i\Delta\mu & & \mathrm{for} & i = 1 \ldots n_1 \\ \nu_j &= j\Delta\nu & & \mathrm{for} & j = 1 \ldots n_2 \\ t_k &= k\Delta t & & \mathrm{for} & k = 1 \ldots K \end{aligned}

where $\Delta\mu = \mu_F / n_1$, $\Delta\nu = \nu_F / n_2$ and $\Delta t = T / K$.
Denoting $p(\mu_i, t_k) = p_i^k$ and similarly for the other quantities, we obtain

\displaystyle \begin{aligned} \frac{p_i^{k+1} - p_i^k}{\Delta t} + g^k\frac{p_i^{k+1} - p_{i-1}^{k+1}}{\Delta\mu} &= \sigma_i^k p_i^{k+1} \\ \frac{m_j^{k+1} - m_j^k}{\Delta t} + \phantom{g^k}\frac{m_j^{k+1} - m_{j-1}^{k+1}}{\Delta\nu} &= -\gamma_j^k m_j^{k+1} \end{aligned}

and

\displaystyle \begin{aligned} \frac{E^{k+1} - E^k}{\Delta t} &= f^k - a_E^k E^{k+1} \\ g^k p_0^{k+1} &= \phi^k E^k \\ m_0^{k+1} &= g^k p_{n_1}^{k+1} \end{aligned}

Re-arranging we get

\displaystyle \begin{aligned} -g^k\frac{\Delta t}{\Delta \mu}p_{i-1}^{k+1} + \bigg(1 + g^k\frac{\Delta t}{\Delta \mu} - \Delta t \sigma_i^k\bigg)p_i^{k+1} &= p_i^k \\ -\frac{\Delta t}{\Delta \nu}m_{j-1}^{k+1} + \bigg(1 + \frac{\Delta t}{\Delta \nu} + \Delta t \gamma_j^k\bigg)m_j^{k+1} &= m_j^k \end{aligned}

Writing

\displaystyle \begin{aligned} d_{1,i}^k &= 1 + g^k\frac{\Delta t}{\Delta \mu} - \Delta t \sigma_i^k \\ d_{2,j}^k &= 1 + \frac{\Delta t}{\Delta \nu} + \Delta t \gamma_j^k \end{aligned}

we can express the above in matrix form

$\displaystyle \begin{bmatrix} g^k & 0 & 0 & \ldots & 0 & 0 \\ -g^k\frac{\Delta t}{\Delta \mu} & d_{1,1}^k & 0 & \ldots & 0 & 0\\ 0 & -g^k\frac{\Delta t}{\Delta \mu} & d_{1,2}^k & \ldots & 0 & 0 \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & 0 & \ldots & -g^k\frac{\Delta t}{\Delta \mu} & d_{1,n_1}^k \\ \end{bmatrix} \begin{bmatrix} p_0^{k+1} \\ p_1^{k+1} \\ p_2^{k+1} \\ \ldots \\ p_{n_1}^{k+1} \end{bmatrix} = \begin{bmatrix} \phi^k E^k \\ p_1^k \\ p_2^k \\ \ldots \\ p_{n_1}^k \\ \end{bmatrix}$

$\displaystyle \begin{bmatrix} 1 & 0 & 0 & \ldots & 0 & 0 \\ -\frac{\Delta t}{\Delta \nu} & d_{2,1}^k & 0 & \ldots & 0 & 0\\ 0 & -\frac{\Delta t}{\Delta \nu} & d_{2,2}^k & \ldots & 0 & 0 \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & 0 & \ldots & -\frac{\Delta t}{\Delta \nu} & d_{2,n_2}^k \\ \end{bmatrix} \begin{bmatrix} m_0^{k+1} \\ m_1^{k+1} \\ m_2^{k+1} \\ \ldots \\ m_{n_2}^{k+1} \end{bmatrix} =
\begin{bmatrix} g^k p_{n_1}^{k+1} \\ m_1^k \\ m_2^k \\ \ldots \\ m_{n_2}^k \\ \end{bmatrix}$

Finally we can write

$\displaystyle E^{k+1} = \frac{E^k + \Delta t f^k}{1 + a_E^k\Delta t}$

## A Haskell Implementation

> {-# OPTIONS_GHC -Wall #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE NoImplicitPrelude #-}
> {-# LANGUAGE FlexibleContexts #-}
> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE TypeOperators #-}

> module Tribbles where

> import qualified Prelude as P

> import Numeric.Units.Dimensional.Prelude hiding (Unit)
> import Numeric.Units.Dimensional

> import Numeric.LinearAlgebra

> import Numeric.Integration.TanhSinh

> import Control.Monad.Writer
> import Control.Monad.Loops

Substances like erythropoietin (EPO) are measured in International Units and these cannot be converted to Moles (see Jelkmann (2009) for much more detail) so we have to pretend it really is measured in Moles as there seems to be no easy way to define what the dimensional package calls a base dimension. A typical amount for a person is 15 milli-IU / milli-litre but can reach much higher levels after loss of blood.

> muPerMl :: (Fractional a, Num a) => Unit 'NonMetric DConcentration a
> muPerMl = (milli mole) / (milli litre)

> bigE'0 :: Concentration Double
> bigE'0 = 15.0 *~ muPerMl

Let’s set up our grid. We take these from Ackleh et al. (2006) but note that J. J. Thibodeaux and Schlittenhardt (2011) seem to have $T = 20$.

> deltaT, deltaMu, deltaNu :: Time Double

> deltaT = 0.05 *~ day
> deltaMu = 0.01 *~ day
> deltaNu = 0.05 *~ day

> bigT :: Time Double
> bigT = 100.0 *~ day

> muF, nuF :: Time Double
> muF = 5.9 *~ day
> nuF = 120.0 *~ day

> bigK :: Int
> bigK = floor (bigT / deltaT /~ one)

> n1 :: Int
> n1 = floor (muF / deltaMu /~ one)

> n2 :: Int
> n2 = floor (nuF / deltaNu /~ one)

> ts :: [Time Double]
> ts = take bigK $ 0.0 *~ day : (map (+ deltaT) ts)
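The definition of ts is corecursive: the list is defined in terms of itself, and laziness ensures only the first bigK elements are ever forced. A minimal sketch of the same idea with plain Doubles (timeGrid is a name invented here, not part of the post's code):

```haskell
-- Corecursive time grid, mirroring the definition of `ts` above but
-- without the dimensional types: 0, dt, 2*dt, ...
timeGrid :: Double -> Int -> [Double]
timeGrid dt k = take k ts
  where
    ts = 0 : map (+ dt) ts
```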

The birth rate for precursors

> betaThibodeaux :: Time Double ->
>                   Frequency Double
> betaThibodeaux mu
>   | mu < (0 *~ day) = error "betaThibodeaux: negative age"
>   | mu < (3 *~ day) = (2.773 *~ (one / day))
>   | otherwise       = (0.0 *~ (one / day))
> alphaThibodeaux :: Concentration Double ->
>                    Frequency Double
> alphaThibodeaux e = (0.5 *~ (muPerMl / day)) / ((1 *~ muPerMl) + e)
> sigmaThibodeaux :: Time Double ->
>                    Time Double ->
>                    Concentration Double ->
>                    Frequency Double
> sigmaThibodeaux mu _t e = gThibodeaux e * (betaThibodeaux mu - alphaThibodeaux e)

and an alternative birth rate

> betaAckleh :: Time Double -> Frequency Double
> betaAckleh mu
>   | mu < (0 *~ day) = error "betaAckleh: negative age"
>   | mu < (3 *~ day) = 2.773 *~ (one / day)
>   | otherwise       = 0.000 *~ (one / day)
> sigmaAckleh :: Time Double ->
>                Time Double ->
>                Concentration Double ->
>                Frequency Double
> sigmaAckleh mu _t e = betaAckleh mu * gAckleh e

J. J. Thibodeaux and Schlittenhardt (2011) give the maturation rate of precursors $g$ as

> gThibodeaux :: Concentration Double  -> Dimensionless Double
> gThibodeaux e = d / n
>   where
>     n = ((3.02 *~ one) * e + (0.31 *~ muPerMl))
>     d = (30.61 *~ muPerMl) + e

and Ackleh et al. (2006) give this as

> gAckleh :: Concentration Double -> Dimensionless Double
> gAckleh _e = 1.0 *~ one

As in J. J. Thibodeaux and Schlittenhardt (2011) we give quantities in terms of cells per kilogram of body weight. Note that these really are moles on this occasion.

> type CellDensity = Quantity (DAmountOfSubstance / DTime / DMass)

Let’s set the initial conditions.

> p'0 :: Time Double -> CellDensity Double
> p'0 mu' = (1e11 *~ one) * pAux mu'
>   where
>     pAux mu
>       | mu < (0 *~ day) = error "p'0: negative age"
>       | mu < (3 *~ day) = 8.55e-6 *~ (mole / day / kilo gram) *
>                           exp ((2.7519 *~ (one / day)) * mu)
>       | otherwise       = 8.55e-6 *~ (mole / day / kilo gram) *
>                           exp (8.319 *~ one - (0.0211 *~ (one / day)) * mu)
> m_0 :: Time Double -> CellDensity Double
> m_0 nu' = (1e11 *~ one) * mAux nu'
>   where
>     mAux nu
>       | nu < (0 *~ day) = error "m_0: age less than zero"
>       | otherwise       = 0.039827  *~ (mole / day / kilo gram) *
>                           exp (((-0.0083) *~ (one / day)) * nu)

And check that these give plausible results.

> m_0Untyped :: Double -> Double
> m_0Untyped nu = m_0 (nu *~ day) /~ (mole / day / kilo gram)
> p'0Untyped :: Double -> Double
> p'0Untyped mu = p'0 (mu *~ day) /~ (mole / day / kilo gram)
ghci> import Numeric.Integration.TanhSinh
ghci> result $ relative 1e-6 $ parTrap m_0Untyped 0.001 (nuF /~ day)
3.0260736659043414e11

ghci> result $ relative 1e-6 $ parTrap p'0Untyped 0.001 (muF /~ day)
1.0453999900927126e10
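As a cross-check on these quadrature results, here is a minimal composite trapezoid rule in plain Haskell. This is only a sketch of the idea: parTrap from Numeric.Integration.TanhSinh is adaptive, and trapezoid together with its sub-interval count n are inventions of this sketch.

```haskell
-- Composite trapezoid rule over [a, b] with n equal sub-intervals;
-- a sanity-check sketch, not a replacement for parTrap.
trapezoid :: (Double -> Double) -> Double -> Double -> Int -> Double
trapezoid f a b n = h * (0.5 * (f a + f b) + sum (map f interior))
  where
    h        = (b - a) / fromIntegral n
    interior = [a + h * fromIntegral i | i <- [1 .. n - 1]]
```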

We can now create the components for the first matrix equation.

> g'0 :: Dimensionless Double
> g'0 = gThibodeaux bigE'0
> d_1'0 :: Int -> Dimensionless Double
> d_1'0 i = (1 *~ one) + (g'0 * deltaT / deltaMu)
>           - deltaT * sigmaThibodeaux ((fromIntegral i *~ one) * deltaMu) undefined bigE'0
> lowers :: [Dimensionless Double]
> lowers = replicate n1 (negate $ g'0 * deltaT / deltaMu)

> diags :: [Dimensionless Double]
> diags = g'0 : map d_1'0 [1..n1]

> uppers :: [Dimensionless Double]
> uppers = replicate n1 (0.0 *~ one)

J. J. Thibodeaux and Schlittenhardt (2011) does not give a definition for $\phi$ so we use the equivalent $s_0$ from Ackleh et al. (2006) which references Banks et al. (2003): “$\times 10^{11}$ erythrocytes/kg body weight $\times$ mL plasma/mU Epo/day”

> s_0 :: Time Double ->
>        Quantity (DAmountOfSubstance / DTime / DMass / DConcentration) Double
> s_0 = const ((1e11 *~ one) * (4.45e-7 *~ (mole / day / kilo gram / muPerMl)))

> b'0 :: [CellDensity Double]
> b'0 = (s_0 (0.0 *~ day) * bigE'0) : (take n1 $ map p'0 (iterate (+ deltaMu) deltaMu))

With these components in place we can now solve the implicit scheme and get the age distribution of precursors after one time step.

> p'1 :: Matrix Double
> p'1 = triDiagSolve (fromList (map (/~ one) lowers))
>                    (fromList (map (/~ one) diags))
>                    (fromList (map (/~ one) uppers))
>                    (((n1 P.+ 1)><1) (map (/~ (mole / second / kilo gram)) b'0))
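For intuition, here is roughly what a tridiagonal solver such as triDiagSolve does under the hood: the Thomas algorithm, a forward elimination followed by back substitution. This is a plain-Double sketch (thomas is a name invented here, and hmatrix's actual implementation may differ); it takes the sub-, main- and super-diagonals and the right-hand side as lists.

```haskell
-- Thomas algorithm for a tridiagonal system: forward sweep eliminating
-- the sub-diagonal, then back substitution.  `lower` and `upper` have
-- one element fewer than `diag`.
thomas :: [Double] -> [Double] -> [Double] -> [Double] -> [Double]
thomas lower diag upper rhs = reverse xsRev
  where
    -- pad so every row has (sub, main, (super, rhs)) entries
    rows   = zip3 (0 : lower) diag (zip (upper ++ [0]) rhs)
    -- forward sweep, accumulating modified (c', d') pairs in reverse order
    revCds = foldl step [] rows
    step acc (a, b, (c, d)) =
      let (cPrev, dPrev) = case acc of { [] -> (0, 0); (p : _) -> p }
          m = b - a * cPrev
      in (c / m, (d - a * dPrev) / m) : acc
    -- back substitution: x_i = d'_i - c'_i * x_{i+1}
    xsRev = scanl (\xNext (c, d) -> d - c * xNext)
                  (snd (head revCds)) (tail revCds)
```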

In order to create the components for the second matrix equation, we need the death rates of mature erythrocytes

> gammaThibodeaux :: Time Double ->
>                    Time Double ->
>                    Quantity (DAmountOfSubstance / DMass) Double ->
>                    Frequency Double
> gammaThibodeaux _nu _t _bigM = 0.0083 *~ (one / day)

We note an alternative for the death rate

> gammaAckleh :: Time Double ->
>                Time Double ->
>                Quantity (DAmountOfSubstance / DMass) Double ->
>                Frequency Double
> gammaAckleh _nu _t bigM = (0.01 *~ (kilo gram / mole / day)) * bigM + 0.0001 *~ (one / day)

For the initial mature erythrocyte population we can either use the integral of the initial distribution

> bigM'0 :: Quantity (DAmountOfSubstance / DMass) Double
> bigM'0 = r *~ (mole / kilo gram)
>  where
>    r = result $ relative 1e-6 $ parTrap m_0Untyped 0.001 (nuF /~ day)
ghci> bigM'0
3.0260736659043414e11 kg^-1 mol

or we can use the sum of the values used in the finite difference approximation

> bigM'0' :: Quantity (DAmountOfSubstance / DMass) Double
> bigM'0' = (* deltaNu) $ sum $ map m_0 $ take n2 $ iterate (+ deltaNu) (0.0 *~ day)
ghci> bigM'0'
3.026741454719976e11 kg^-1 mol

Finally we can create the components

> d_2'0 :: Int -> Dimensionless Double
> d_2'0 i = (1 *~ one) + (g'0 * deltaT / deltaNu)
>           + deltaT * gammaThibodeaux ((fromIntegral i *~ one) * deltaNu) undefined bigM'0
> lowers2 :: [Dimensionless Double]
> lowers2 = replicate n2 (negate $ deltaT / deltaNu)

> diags2 :: [Dimensionless Double]
> diags2 = (1.0 *~ one) : map d_2'0 [1..n2]

> uppers2 :: [Dimensionless Double]
> uppers2 = replicate n2 (0.0 *~ one)

> b_2'0 :: [CellDensity Double]
> b_2'0 = (g'0 * ((p'1 `atIndex` (n1,0)) *~ (mole / second / kilo gram))) :
>         (take n2 $ map m_0 (iterate (+ deltaNu) deltaNu))

and then solve the implicit scheme to get the distribution of mature erythrocytes one time step ahead

> m'1 :: Matrix Double
> m'1 = triDiagSolve (fromList (map (/~ one) lowers2))
>                    (fromList (map (/~ one) diags2))
>                    (fromList (map (/~ one) uppers2))
>                    (((n2 P.+ 1)><1) (map (/~ (mole / second / kilo gram)) b_2'0))

We need to complete the homeostatic loop by implementing the feedback from the kidneys to the bone marrow. Ackleh and Thibodeaux (2013) and Ackleh et al. (2006) give $f$ as

> fAckleh :: Time Double ->
>            Quantity (DAmountOfSubstance / DMass) Double ->
>            Quantity (DConcentration / DTime) Double
> fAckleh _t bigM = a / ((1.0 *~ one) + k * (bigM' ** r))
>   where
>     a = 15600 *~ (muPerMl / day)
>     k = 0.0382 *~ one
>     r = 6.96 *~ one
>     bigM' = ((bigM /~ (mole / kilo gram)) *~ one) * (1e-11 *~ one)
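The feedback above is a Hill-type function: it equals $a$ when the erythrocyte mass is zero and decreases monotonically as the mass grows. Stripped of the dimensional types it is a one-liner (hillFeedback is a name invented here, and the $10^{-11}$ rescaling of bigM is assumed to have been applied already):

```haskell
-- Hill-type feedback a / (1 + k m^r): a large erythrocyte population m
-- suppresses erythropoietin production.
hillFeedback :: Double -> Double -> Double -> Double -> Double
hillFeedback a k r m = a / (1 + k * m ** r)
```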

The much older Belair, Mackey, and Mahaffy (1995) gives $f$ as

> fBelair :: Time Double ->
>            Quantity (DAmountOfSubstance / DMass) Double ->
>            Quantity (DConcentration / DTime) Double
> fBelair _t bigM = a / ((1.0 *~ one) + k * (bigM' ** r))
>   where
>     a = 6570 *~ (muPerMl / day)
>     k = 0.0382 *~ one
>     r = 6.96 *~ one
>     bigM' = ((bigM /~ (mole / kilo gram)) *~ one) * (1e-11 *~ one)

For the initial precursor population we can either use the integral of the initial distribution

> bigP'0 :: Quantity (DAmountOfSubstance / DMass) Double
> bigP'0 = r *~ (mole / kilo gram)
>  where
>    r = result $ relative 1e-6 $ parTrap p'0Untyped 0.001 (muF /~ day)
ghci> bigP'0
1.0453999900927126e10 kg^-1 mol

or we can use the sum of the values used in the finite difference approximation

> bigP'0' :: Quantity (DAmountOfSubstance / DMass) Double
> bigP'0' = (* deltaMu) $ sum $ map p'0 $ take n1 $ iterate (+ deltaMu) (0.0 *~ day)
ghci> bigP'0'
1.0438999930030743e10 kg^-1 mol

J. J. Thibodeaux and Schlittenhardt (2011) give the following for $a_E$

> a_E :: Quantity (DAmountOfSubstance / DMass) Double -> Frequency Double
> a_E bigP = ((n / d) /~ one) *~ (one / day)
>   where
>     n :: Dimensionless Double
>     n = bigP * (13.8 *~ (kilo gram / mole)) + 0.04 *~ one
>     d :: Dimensionless Double
>     d = (bigP /~ (mole / kilo gram)) *~ one + 0.08 *~ one

but from Ackleh et al. (2006)

The only biological basis for the latter is that the decay rate of erythropoietin should be an increasing function of the precursor population and this function remains in the range 0.50–6.65 $\mathrm{days}^{-1}$

and, given this is at variance with their given function, it may be safer to use their alternative of

> a_E' :: Quantity (DAmountOfSubstance / DMass) Double -> Frequency Double
> a_E' _bigP = 6.65 *~ (one / day)

We can now calculate the concentration of EPO one time step ahead.

> f'0 :: Quantity (DConcentration / DTime) Double
> f'0 = fAckleh undefined bigM'0
> bigE'1 :: Concentration Double
> bigE'1 = (bigE'0 + deltaT * f'0) / (1.0 *~ one + deltaT * a_E' bigP'0)
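The form of this update is just the backward (implicit) Euler step for $\mathrm{d}E/\mathrm{d}t = f - a_E E$: solving $(E_{n+1} - E_n)/\Delta t = f - a_E E_{n+1}$ for $E_{n+1}$ gives the expression above. A standalone sketch with plain `Double`s confirms the algebra; the numbers here are arbitrary, not taken from the model.

```haskell
-- Backward Euler for dE/dt = f - a * E, solved for the new value:
--   (eNew - eOld) / dt = f - a * eNew  =>  eNew = (eOld + dt * f) / (1 + dt * a)
implicitEuler :: Double -> Double -> Double -> Double -> Double
implicitEuler dt f a eOld = (eOld + dt * f) / (1 + dt * a)

main :: IO ()
main = do
  let dt = 0.1; f = 2.0; a = 0.5; e0 = 1.0
      e1 = implicitEuler dt f a e0
  -- the residual of the implicit relation should be (numerically) zero
  print (abs ((e1 - e0) / dt - (f - a * e1)) < 1e-12)  -- prints True
```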

Having done this for one time step starting at $t=0$, it’s easy to generalize this to an arbitrary time step.

> d_1 :: Dimensionless Double ->
>        Concentration Double ->
>        Int ->
>        Dimensionless Double
> d_1 g e i = (1 *~ one) + (g * deltaT / deltaMu)
>           - deltaT * sigmaThibodeaux ((fromIntegral i *~ one) * deltaMu) undefined e
> d_2 :: Quantity (DAmountOfSubstance / DMass) Double ->
>        Int ->
>        Dimensionless Double
> d_2 bigM i = (1 *~ one) + deltaT / deltaNu
>            + deltaT * gammaThibodeaux ((fromIntegral i *~ one) * deltaNu) undefined bigM
> oneStepM :: (Matrix Double, Matrix Double, Concentration Double, Time Double) ->
>             Writer [(Quantity (DAmountOfSubstance / DMass) Double,
>                      Quantity (DAmountOfSubstance / DMass) Double,
>                      Concentration Double)]
>                    (Matrix Double, Matrix Double, Concentration Double, Time Double)
> oneStepM (psPrev, msPrev, ePrev, tPrev) = do
>   let
>     g  = gThibodeaux ePrev
>     ls = replicate n1 (negate $ g * deltaT / deltaMu)
>     ds = g : map (d_1 g ePrev) [1..n1]
>     us = replicate n1 (0.0 *~ one)
>     b1'0 = (s_0 tPrev * ePrev) /~ (mole / second / kilo gram)
>     b1 = asColumn $ vjoin [scalar b1'0, subVector 1 n1 $ flatten psPrev]
>     psNew :: Matrix Double
>     psNew = triDiagSolve (fromList (map (/~ one) ls))
>                          (fromList (map (/~ one) ds))
>                          (fromList (map (/~ one) us))
>                          b1
>     ls2 = replicate n2 (negate $ deltaT / deltaNu)
>     bigM :: Quantity (DAmountOfSubstance / DMass) Double
>     bigM = (* deltaNu) $ (sumElements msPrev) *~ (mole / kilo gram / second)
>     ds2 = (1.0 *~ one) : map (d_2 bigM) [1..n2]
>     us2 = replicate n2 (0.0 *~ one)
>     b2'0 = (g * (psNew `atIndex` (n1, 0) *~ (mole / second / kilo gram))) /~
>            (mole / second / kilo gram)
>     b2 = asColumn $ vjoin [scalar b2'0, subVector 1 n2 $ flatten msPrev]
>     msNew :: Matrix Double
>     msNew = triDiagSolve (fromList (map (/~ one) ls2))
>                          (fromList (map (/~ one) ds2))
>                          (fromList (map (/~ one) us2))
>                          b2
>     bigP :: Quantity (DAmountOfSubstance / DMass) Double
>     bigP = (* deltaMu) $ sumElements psPrev *~ (mole / kilo gram / second)
>     f :: Quantity (DConcentration / DTime) Double
>     f = fAckleh undefined bigM
>     eNew :: Concentration Double
>     eNew = (ePrev + deltaT * f) / (1.0 *~ one + deltaT * a_E' bigP)
>     tNew = tPrev + deltaT
>   tell [(bigP, bigM, ePrev)]
>   return (psNew, msNew, eNew, tNew)

We can now run the model for 100 days.

> ys :: [(Quantity (DAmountOfSubstance / DMass) Double,
>         Quantity (DAmountOfSubstance / DMass) Double,
>         Concentration Double)]
> ys = take 2000 $
>      snd $
>      runWriter $
>      iterateM_ oneStepM ((((n1 P.+ 1)><1) (map (/~ (mole / second / kilo gram)) b'0)),
>                          (((n2 P.+ 1)><1) (map (/~ (mole / second / kilo gram)) b_2'0)),
>                          bigE'0,
>                          (0.0 *~ day))

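The stepping combinator `iterateM_` is not defined in this excerpt. One possible definition simply binds the step to itself forever, which works here because the lazy `Writer` monad lets us consume the log incrementally. The sketch below is self-contained, with a minimal lazy `Writer` standing in for `Control.Monad.Writer` and a purely illustrative `step` function.

```haskell
-- A minimal lazy Writer monad, just enough to define iterateM_ standalone.
newtype Writer w a = Writer { runWriter :: (a, w) }

instance Functor (Writer w) where
  fmap f (Writer (a, w)) = Writer (f a, w)

instance Monoid w => Applicative (Writer w) where
  pure a = Writer (a, mempty)
  Writer (f, w) <*> Writer (a, w') = Writer (f a, w <> w')

instance Monoid w => Monad (Writer w) where
  m >>= k = Writer $ let (a, w)  = runWriter m
                         (b, w') = runWriter (k a)
                     in (b, w <> w')

tell :: w -> Writer w ()
tell w = Writer ((), w)

-- Iterate a monadic step forever; laziness lets us take a finite prefix
-- of the accumulated log even though the iteration never terminates.
iterateM_ :: Monoid w => (a -> Writer w a) -> a -> Writer w a
iterateM_ f x = f x >>= iterateM_ f

-- A toy step: log the current state, then increment it.
step :: Int -> Writer [Int] Int
step n = tell [n] >> return (n + 1)

main :: IO ()
main = print (take 5 (snd (runWriter (iterateM_ step 0))))  -- prints [0,1,2,3,4]
```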
And now we can plot what happens for a period of 100 days.

# References

Ackleh, Azmy S., and Jeremy J. Thibodeaux. 2013. “A second-order finite difference approximation for a mathematical model of erythropoiesis.” Numerical Methods for Partial Differential Equations, no. November: n/a–n/a. doi:10.1002/num.21778.

Ackleh, Azmy S., Keng Deng, Kazufumi Ito, and Jeremy Thibodeaux. 2006. “A Structured Erythropoiesis Model with Nonlinear Cell Maturation Velocity and Hormone Decay Rate.” Mathematical Biosciences 204 (1): 21–48. doi:http://dx.doi.org/10.1016/j.mbs.2006.08.004.

Banks, H T, Cammey E Cole, Paul M Schlosser, and Hien T Tran. 2003. “Modeling and Optimal Regulation of Erythropoiesis Subject to Benzene Intoxication.” https://www.ncsu.edu/crsc/reports/ftp/pdf/crsc-tr03-49.pdf.

Belair, Jacques, Michael C. Mackey, and Joseph M. Mahaffy. 1995. “Age-Structured and Two-Delay Models for Erythropoiesis.” Mathematical Biosciences 128 (1): 317–46. doi:http://dx.doi.org/10.1016/0025-5564(94)00078-E.

Gurtin, Morton E, and Richard C MacCamy. 1974. “Non-Linear Age-Dependent Population Dynamics.” Archive for Rational Mechanics and Analysis 54 (3). Springer: 281–300.

Hall, Adam J, Michael J Chappell, John AD Aston, and Stephen A Ward. 2013. “Pharmacokinetic Modelling of the Anti-Malarial Drug Artesunate and Its Active Metabolite Dihydroartemisinin.” Computer Methods and Programs in Biomedicine 112 (1). Elsevier: 1–15.

Jelkmann, Wolfgang. 2009. “Efficacy of Recombinant Erythropoietins: Is There Unity of International Units?” Nephrology Dialysis Transplantation 24 (5): 1366. doi:10.1093/ndt/gfp058.

Sawyer, Stephen T, SB Krantz, and E Goldwasser. 1987. “Binding and Receptor-Mediated Endocytosis of Erythropoietin in Friend Virus-Infected Erythroid Cells.” Journal of Biological Chemistry 262 (12). ASBMB: 5554–62.

Thibodeaux, Jeremy J., and Timothy P. Schlittenhardt. 2011. “Optimal Treatment Strategies for Malaria Infection.” Bulletin of Mathematical Biology 73 (11): 2791–2808. doi:10.1007/s11538-011-9650-8.

Torbett, Bruce E., and Jeffrey S. Friedman. 2009. “Erythropoiesis: An Overview.” In Erythropoietins, Erythropoietic Factors, and Erythropoiesis: Molecular, Cellular, Preclinical, and Clinical Biology, edited by Steven G. Elliott, Mary Ann Foote, and Graham Molineux, 3–18. Basel: Birkhäuser Basel. doi:10.1007/978-3-7643-8698-6_1.

# Introduction

In the 1920s, Lotka (1909) and Volterra (1926) developed a model of a very simple predator-prey ecosystem.

$\displaystyle \begin{aligned} \frac{\mathrm{d}N_1}{\mathrm{d}t} & = & \rho_1 N_1 - c_1 N_1 N_2 \\ \frac{\mathrm{d}N_2}{\mathrm{d}t} & = & c_2 N_1 N_2 - \rho_2 N_2 \end{aligned}$

Although simple, it turns out that the populations of the Canadian lynx and the snowshoe hare are well represented by such a model. Furthermore, the Hudson's Bay Company kept records of how many pelts of each species were trapped for almost a century, giving a good proxy of the population of each species.

We can capture the fact that we do not have a complete model by describing our state of ignorance about the parameters. In order to keep this as simple as possible let us assume that log parameters undergo Brownian motion. That is, we know the parameters will jiggle around and the further into the future we look the less certain we are about what values they will have taken. By making the log parameters undergo Brownian motion, we can also capture our modelling assumption that birth, death and predation rates are always positive. A similar approach is taken in Dureau, Kalogeropoulos, and Baguelin (2013) where the (log) parameters of an epidemiological model are taken to be Ornstein-Uhlenbeck processes (which is biologically more plausible although adds to the complexity of the model, something we wish to avoid in an example such as this).

Andrieu, Doucet, and Holenstein (2010) propose a method to estimate the parameters of such models (Particle Marginal Metropolis Hastings aka PMMH) and the domain specific probabilistic language LibBi (Murray (n.d.)) can be used to apply this (and other inference methods).

For the sake of simplicity, in this blog post, we only model one parameter as being unknown and undergoing Brownian motion. A future blog post will consider more sophisticated scenarios.

# A Dynamical System Aside

The above dynamical system is structurally unstable (more on this in a future post), a possible indication that it should not be considered as a good model of predator–prey interaction. Let us modify this to include carrying capacities for the populations of both species.

$\displaystyle \begin{aligned} \frac{\mathrm{d}N_1}{\mathrm{d}t} & = & \rho_1 N_1 \bigg(1 - \frac{N_1}{K_1}\bigg) - c_1 N_1 N_2 \\ \frac{\mathrm{d}N_2}{\mathrm{d}t} & = & -\rho_2 N_2 \bigg(1 + \frac{N_2}{K_2}\bigg) + c_2 N_1 N_2 \end{aligned}$

# Data Generation with LibBi

Let’s generate some data using LibBi.

// Generate data assuming a fixed growth rate for hares rather than
// e.g. a growth rate that undergoes Brownian motion.

model PP {
  const h         = 0.1;    // time step
  const delta_abs = 1.0e-3; // absolute error tolerance
  const delta_rel = 1.0e-6; // relative error tolerance

  const a  = 5.0e-1 // Hare growth rate
  const k1 = 2.0e2  // Hare carrying capacity
  const b  = 2.0e-2 // Hare death rate per lynx
  const d  = 4.0e-1 // Lynx death rate
  const k2 = 2.0e1  // Lynx carrying capacity
  const c  = 4.0e-3 // Lynx birth rate per hare

  state P, Z       // Hares and lynxes
  state ln_alpha   // Hare growth rate - we express it in log form for
                   // consistency with the inference model
  obs P_obs        // Observations of hares

  sub initial {
    P ~ log_normal(log(100.0), 0.2)
    Z ~ log_normal(log(50.0), 0.1)
  }

  sub transition(delta = h) {
    ode(h = h, atoler = delta_abs, rtoler = delta_rel, alg = 'RK4(3)') {
      dP/dt =  a * P * (1 - P / k1) - b * P * Z
      dZ/dt = -d * Z * (1 + Z / k2) + c * P * Z
    }
  }

  sub observation {
    P_obs ~ log_normal(log(P), 0.1)
  }
}

We can look at phase space starting with different populations and see they all converge to the same fixed point.

Since at some point in the future, I plan to produce Haskell versions of the methods given in Andrieu, Doucet, and Holenstein (2010), let’s generate the data using Haskell.

> {-# OPTIONS_GHC -Wall                     #-}
> module LotkaVolterra (
>     solLv
>   , solPp
>   , h0
>   , l0
>   , baz
>   , logBM
>   , eulerEx
>   )where
> import Numeric.GSL.ODE
> import Numeric.LinearAlgebra
> import Data.Random.Source.PureMT
> import Data.Random hiding ( gamma )

Here’s the unstable model.

> lvOde :: Double ->
>          Double ->
>          Double ->
>          Double ->
>          Double ->
>          [Double] ->
>          [Double]
> lvOde rho1 c1 rho2 c2 _t [h, l] =
>   [
>     rho1 * h - c1 * h * l
>   , c2 * h * l - rho2 * l
>   ]
> lvOde _rho1 _c1 _rho2 _c2 _t vars =
>   error $ "lvOde called with: " ++ show (length vars) ++ " variables"
> rho1, c1, rho2, c2 :: Double
> rho1 = 0.5
> c1 = 0.02
> rho2 = 0.4
> c2 = 0.004
> deltaT :: Double
> deltaT = 0.1
> solLv :: Matrix Double
> solLv = odeSolve (lvOde rho1 c1 rho2 c2)
>                  [50.0, 50.0]
>                  (fromList [0.0, deltaT .. 50])

And here’s the stable model.

> ppOde :: Double ->
>          Double ->
>          Double ->
>          Double ->
>          Double ->
>          Double ->
>          Double ->
>          [Double] ->
>          [Double]
> ppOde a k1 b d k2 c _t [p, z] =
>   [
>     a * p * (1 - p / k1) - b * p * z
>   , -d * z * (1 + z / k2) + c * p * z
>   ]
> ppOde _a _k1 _b _d _k2 _c _t vars =
>   error $ "ppOde called with: " ++ show (length vars) ++ " variables"
> a, k1, b, d, k2, c :: Double
> a = 0.5
> k1 = 200.0
> b = 0.02
> d = 0.4
> k2 = 50.0
> c = 0.004
> solPp :: Double -> Double -> Matrix Double
> solPp x y = odeSolve (ppOde a k1 b d k2 c)
>                  [x, y]
>                  (fromList [0.0, deltaT .. 50])
> gamma, alpha, beta :: Double
> gamma = d / a
> alpha = a / (c * k1)
> beta  = d / (a * k2)
> fp :: (Double, Double)
> fp = ((gamma + beta) / (1 + alpha * beta), (1 - gamma * alpha) / (1 + alpha * beta))
> h0, l0 :: Double
> h0 = a * fst fp / c
> l0 = a * snd fp / b
> foo, bar :: Matrix R
> foo = matrix 2 [a / k1, b, c, negate d / k2]
> bar = matrix 1 [a, d]
> baz :: Maybe (Matrix R)
> baz = linearSolve foo bar

This gives a stable fixed point of

ghci> baz
Just (2><1)
[ 120.00000000000001
,               10.0 ]

Here’s an example of convergence to that fixed point in phase space.
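As a quick sanity check that $(120, 10)$ really is an equilibrium, we can evaluate the right-hand sides there. `dHare` and `dLynx` below are standalone re-statements of the vector field with the parameter values used above, not the library versions.

```haskell
-- Parameter values from the post.
a, k1, b, d, k2, c :: Double
a  = 0.5
k1 = 200.0
b  = 0.02
d  = 0.4
k2 = 50.0
c  = 0.004

-- Right-hand sides of the modified Lotka-Volterra system.
dHare, dLynx :: Double -> Double -> Double
dHare p z = a * p * (1 - p / k1) - b * p * z
dLynx p z = negate d * z * (1 + z / k2) + c * p * z

main :: IO ()
main = print (abs (dHare 120 10) < 1e-12 && abs (dLynx 120 10) < 1e-12)  -- prints True
```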

## The Stochastic Model

Let us now assume that the Hare growth parameter undergoes Brownian motion so that the further into the future we go, the less certain we are about it. In order to ensure that this parameter remains positive, let’s model the log of it to be Brownian motion.

$\displaystyle \begin{aligned} \frac{\mathrm{d}N_1}{\mathrm{d}t} & = & \rho_1 N_1 \bigg(1 - \frac{N_1}{K_1}\bigg) - c_1 N_1 N_2 \\ \frac{\mathrm{d}N_2}{\mathrm{d}t} & = & -\rho_2 N_2 \bigg(1 + \frac{N_2}{K_2}\bigg) + c_2 N_1 N_2 \\ \mathrm{d} \rho_1 & = & \rho_1 \sigma_{\rho_1} \mathrm{d}W_t \end{aligned}$

where the final equation is a stochastic differential equation with $W_t$ being a Wiener process.

By Itô we have

$\displaystyle \mathrm{d} (\log{\rho_1}) = - \frac{\sigma_{\rho_1}^2}{2} \mathrm{d} t + \sigma_{\rho_1} \mathrm{d}W_t$

We can use this to generate paths for $\rho_1$.

$\displaystyle \rho_1(t + \Delta t) = \rho_1(t)\exp{\bigg(- \frac{\sigma_{\rho_1}^2}{2} \Delta t + \sigma_{\rho_1} \sqrt{\Delta t} Z\bigg)}$

where $Z \sim {\mathcal{N}}(0,1)$.

> oneStepLogBM :: MonadRandom m => Double -> Double -> Double -> m Double
> oneStepLogBM deltaT sigma rhoPrev = do
>   x <- sample $ rvar StdNormal
>   return $ rhoPrev * exp (sigma * (sqrt deltaT) * x - 0.5 * sigma * sigma * deltaT)
> iterateM :: Monad m => (a -> m a) -> m a -> Int -> m [a]
> iterateM f mx n = sequence . take n . iterate (>>= f) $ mx
> logBMM :: MonadRandom m => Double -> Double -> Int -> Int -> m [Double]
> logBMM initRho sigma n m =
>   iterateM (oneStepLogBM (recip $ fromIntegral n) sigma) (return initRho) (n * m)
> logBM :: Double -> Double -> Int -> Int -> Int -> [Double]
> logBM initRho sigma n m seed =
>   evalState (logBMM initRho sigma n m) (pureMT $ fromIntegral seed)

We can see that the further we go into the future, the less certain we are about the value of the parameter.

Using this we can simulate the whole dynamical system, which is now a stochastic process.

> f1, f2 :: Double -> Double -> Double ->
>           Double -> Double ->
>           Double
> f1 a k1 b p z = a * p * (1 - p / k1) - b * p * z
> f2 d k2 c p z = -d * z * (1 + z / k2) + c * p * z
> oneStepEuler :: MonadRandom m =>
>                 Double ->
>                 Double ->
>                 Double -> Double ->
>                 Double -> Double -> Double ->
>                 (Double, Double, Double) ->
>                 m (Double, Double, Double)
> oneStepEuler deltaT sigma k1 b d k2 c (rho1Prev, pPrev, zPrev) = do
>   let pNew = pPrev + deltaT * f1 (exp rho1Prev) k1 b pPrev zPrev
>   let zNew = zPrev + deltaT * f2 d k2 c pPrev zPrev
>   rho1New <- oneStepLogBM deltaT sigma rho1Prev
>   return (rho1New, pNew, zNew)
> euler :: MonadRandom m =>
>          (Double, Double, Double) ->
>          Double ->
>          Double -> Double ->
>          Double -> Double -> Double ->
>          Int -> Int ->
>          m [(Double, Double, Double)]
> euler stateInit sigma k1 b d k2 c n m =
>   iterateM (oneStepEuler (recip $ fromIntegral n) sigma k1 b d k2 c)
>            (return stateInit)
>            (n * m)
> eulerEx :: (Double, Double, Double) ->
>            Double -> Int -> Int -> Int ->
>            [(Double, Double, Double)]
> eulerEx stateInit sigma n m seed =
>   evalState (euler stateInit sigma k1 b d k2 c n m) (pureMT $ fromIntegral seed)

We see that the populations become noisier the further into the future we go.

Notice that the second order effects of the system are now to some extent captured by the fact that the growth rate of hares can drift. In our simulation, this is demonstrated by our decreasing knowledge of the growth rate the further we look into the future.

# Inference

Now let us infer the growth rate using PMMH. Here’s the model expressed in LibBi.

// Infer growth rate for hares

model PP {
  const h         = 0.1;    // time step
  const delta_abs = 1.0e-3; // absolute error tolerance
  const delta_rel = 1.0e-6; // relative error tolerance

  const a  = 5.0e-1 // Hare growth rate - superfluous for inference
                    // but a reminder of what we should expect
  const k1 = 2.0e2  // Hare carrying capacity
  const b  = 2.0e-2 // Hare death rate per lynx
  const d  = 4.0e-1 // Lynx death rate
  const k2 = 2.0e1  // Lynx carrying capacity
  const c  = 4.0e-3 // Lynx birth rate per hare

  state P, Z       // Hares and lynxes
  state ln_alpha   // Hare growth rate - we express it in log form for
                   // consistency with the inference model
  obs P_obs        // Observations of hares

  param mu, sigma  // Mean and standard deviation of hare growth rate

  noise w          // Noise

  sub parameter {
    mu ~ uniform(0.0, 1.0)
    sigma ~ uniform(0.0, 0.5)
  }

  sub proposal_parameter {
    mu ~ truncated_gaussian(mu, 0.02, 0.0, 1.0);
    sigma ~ truncated_gaussian(sigma, 0.01, 0.0, 0.5);
  }

  sub initial {
    P ~ log_normal(log(100.0), 0.2)
    Z ~ log_normal(log(50.0), 0.1)
    ln_alpha ~ gaussian(log(mu), sigma)
  }

  sub transition(delta = h) {
    w ~ normal(0.0, sqrt(h));
    ode(h = h, atoler = delta_abs, rtoler = delta_rel, alg = 'RK4(3)') {
      dP/dt = exp(ln_alpha) * P * (1 - P / k1) - b * P * Z
      dZ/dt = -d * Z * (1 + Z / k2) + c * P * Z
      dln_alpha/dt = -sigma * sigma / 2 - sigma * w / h
    }
  }

  sub observation {
    P_obs ~ log_normal(log(P), 0.1)
  }
}

Let’s look at the posteriors of the hyper-parameters for the hare growth parameter.
The estimate for $\mu$ is pretty decent. For our generated data, $\sigma = 0$ and, given that our observations are quite noisy, maybe the estimate for this is not too bad either.

# Appendix: The R Driving Code

All code including the R below can be downloaded from github but make sure you use the straight-libbi branch and not master.

install.packages("devtools")
library(devtools)
install_github("sbfnk/RBi",ref="master")
install_github("sbfnk/RBi.helpers",ref="master")

rm(list = ls(all.names=TRUE))
unlink(".RData")

library('RBi')
try(detach(package:RBi, unload = TRUE), silent = TRUE)
library(RBi, quietly = TRUE)

library('RBi.helpers')

library('ggplot2', quietly = TRUE)
library('gridExtra', quietly = TRUE)

endTime <- 50

PP <- bi_model("PP.bi")

synthetic_dataset_PP <- bi_generate_dataset(endtime=endTime,
                                            model=PP,
                                            seed="42",
                                            verbose=TRUE,
                                            add_options = list(noutputs=500))

rdata_PP <- bi_read(synthetic_dataset_PP)

df <- data.frame(rdata_PP$P$nr,
                 rdata_PP$P$value,
                 rdata_PP$Z$value,
                 rdata_PP$P_obs$value)

ggplot(df, aes(rdata_PP$P$nr, y = Population, color = variable), size = 0.1) +
    geom_line(aes(y = rdata_PP$P$value, col = "Hare"), size = 0.1) +
    geom_line(aes(y = rdata_PP$Z$value, col = "Lynx"), size = 0.1) +
    geom_point(aes(y = rdata_PP$P_obs$value, col = "Observations"), size = 0.1) +
    theme(legend.position="none") +
    ggtitle("Example Data") +
    xlab("Days") +
    theme(axis.text=element_text(size=4),
          axis.title=element_text(size=6,face="bold")) +
    theme(plot.title = element_text(size=10))

ggsave(filename="diagrams/LVdata.png",width=4,height=3)

synthetic_dataset_PP1 <- bi_generate_dataset(endtime=endTime,
                                             model=PP,
                                             init = list(P = 100, Z=50),
                                             seed="42",
                                             verbose=TRUE,
                                             add_options = list(noutputs=500))

rdata_PP1 <- bi_read(synthetic_dataset_PP1)

synthetic_dataset_PP2 <- bi_generate_dataset(endtime=endTime,
                                             model=PP,
                                             init = list(P = 150, Z=25),
                                             seed="42",
                                             verbose=TRUE,
                                             add_options = list(noutputs=500))

rdata_PP2 <- bi_read(synthetic_dataset_PP2)

df1 <- data.frame(hare = rdata_PP$P$value,
                  lynx = rdata_PP$Z$value,
                  hare1 = rdata_PP1$P$value,
                  lynx1 = rdata_PP1$Z$value,
                  hare2 = rdata_PP2$P$value,
                  lynx2 = rdata_PP2$Z$value)

ggplot(df1) +
    geom_path(aes(x=df1$hare,  y=df1$lynx, col = "0"), size = 0.1) +
    geom_path(aes(x=df1$hare1, y=df1$lynx1, col = "1"), size = 0.1) +
    geom_path(aes(x=df1$hare2, y=df1$lynx2, col = "2"), size = 0.1) +
    theme(legend.position="none") +
    ggtitle("Phase Space") +
    xlab("Hare") +
    ylab("Lynx") +
    theme(axis.text=element_text(size=4),
          axis.title=element_text(size=6,face="bold")) +
    theme(plot.title = element_text(size=10))

ggsave(filename="diagrams/PPviaLibBi.png",width=4,height=3)

PPInfer <- bi_model("PPInfer.bi")

bi_object_PP <- libbi(client="sample", model=PPInfer, obs = synthetic_dataset_PP)

bi_object_PP$run(add_options = list(
"end-time" = endTime,
noutputs = endTime,
nsamples = 4000,
nparticles = 128,
seed=42,
## verbose = TRUE,
stdoutput_file_name = tempfile(pattern="pmmhoutput", fileext=".txt"))

bi_file_summary(bi_object_PP$result$output_file_name)

mu <- bi_read(bi_object_PP, "mu")$value
g1 <- qplot(x = mu[2001:4000], y = ..density.., geom = "histogram") + xlab(expression(mu))
sigma <- bi_read(bi_object_PP, "sigma")$value
g2 <- qplot(x = sigma[2001:4000], y = ..density.., geom = "histogram") + xlab(expression(sigma))
g3 <- grid.arrange(g1, g2)
ggsave(plot=g3,filename="diagrams/LvPosterior.png",width=4,height=3)

df2 <- data.frame(hareActs = rdata_PP$P$value,
hareObs  = rdata_PP$P_obs$value)

ggplot(df, aes(rdata_PP$P$nr, y = value, color = variable)) +
geom_line(aes(y = rdata_PP$P$value, col = "Phyto")) +
geom_line(aes(y = rdata_PP$Z$value, col = "Zoo")) +
geom_point(aes(y = rdata_PP$P_obs$value, col = "Phyto Obs"))

ln_alpha <- bi_read(bi_object_PP, "ln_alpha")$value

P <- matrix(bi_read(bi_object_PP, "P")$value,nrow=51,byrow=TRUE)

# Introduction

Suppose we have a square thin plate of metal and we hold each of its edges at a temperature which may vary along the edge but is fixed for all time. After some period depending on the conductivity of the metal, the temperature at every point on the plate will have stabilised. What is the temperature at any point?

We can calculate this by solving Laplace’s equation $\nabla^2 \phi = 0$ in 2 dimensions. Apart from the preceding motivation, a more compelling reason for doing so is that it is a moderately simple equation, in so far as partial differential equations are simple, that has been well studied for centuries.

In Haskell terms this gives us the opportunity to use the repa library and also hmatrix, which is based on LAPACK (as well as other libraries), albeit the latter only for illustrative purposes.

I had originally intended this blog to contain a comparison of repa’s performance against an equivalent C program, even though this has already been undertaken by the repa team in their various publications. And indeed it is still my intention to produce such a comparison. However, as I investigated further, it turned out a fair amount of comparison work has already been done by a team from Intel, which suggests there is currently a performance gap, but one which is not so large that it outweighs the other benefits of Haskell.

To be more specific, one way in which using repa stands out from the equivalent C implementation is that it gives a language in which we can specify the stencil being used to solve the equation. As an illustration we substitute the nine point method for the five point method merely by changing the stencil.

## A Motivating Example: The Steady State Heat Equation

Fourier’s law states that the rate of heat transfer or the flux $\boldsymbol{\sigma}$ is proportional to the negative temperature gradient, as heat flows from hot to cold, and further that it flows in the direction of greatest temperature change. We can write this as

$\displaystyle \boldsymbol{\sigma} = -k\nabla \phi$

where $\phi : \mathbb{R} \times \mathbb{R} \rightarrow \mathbb{R}$ is the temperature at any given point on the plate and $k$ is the conductivity of the metal.

Moreover, we know that for any region on the plate, the total amount of heat flowing in must be balanced by the amount of heat flowing out. We can write this as

$\displaystyle \nabla \cdot \boldsymbol{\sigma} = 0$

Substituting the first equation into the second we obtain Laplace’s equation

$\displaystyle \nabla^2 \phi = 0$

For example, suppose we hold the temperature of the edges of the plate as follows

$\displaystyle \begin{matrix} \phi(x, 0) = 1 & \phi(x, 1) = 2 & \phi(0, y) = 1 & \phi(1, y) = 2 \end{matrix}$

then after some time the temperature of the plate will be as shown in the heatmap below.

Notes:

1. Red is hot.

2. Blue is cold.

3. The heatmap is created by a finite difference method described below.

4. The $y$-axis points down (not up) i.e. $\phi(x,1)$ is at the bottom, reflecting the fact that we are using an array in the finite difference method and rows go down not up.

5. The corners are grey because in the five point finite difference method these play no part in determining temperatures in the interior of the plate.

# Colophon

Since the book I am writing contains C code (for performance comparisons), I need a way of being able to compile and run this code and include it “as is” in the book. Up until now, all my blog posts have contained Haskell and so I have been able to use BlogLiteratelyD which allows me to include really nice diagrams. But clearly this tool wasn’t really designed to handle other languages (although I am sure it could be made to do so).

Using pandoc’s scripting capability with the small script provided

import Text.Pandoc.JSON

doInclude :: Block -> IO Block
doInclude cb@(CodeBlock ("verbatim", classes, namevals) contents) =
  case lookup "include" namevals of
    Just f  -> return . (\x -> Para [Math DisplayMath x]) =<< readFile f
    Nothing -> return cb
doInclude cb@(CodeBlock (id, classes, namevals) contents) =
  case lookup "include" namevals of
    Just f  -> return . (CodeBlock (id, classes, namevals)) =<< readFile f
    Nothing -> return cb
doInclude x = return x

main :: IO ()
main = toJSONFilter doInclude

I can then include C code blocks like this

~~~~ {.c include="Chap1a.c"}
~~~~

And format the whole document like this

pandoc -s Chap1.lhs --filter=./Include -t markdown+lhs > Chap1Expanded.lhs
BlogLiteratelyD Chap1Expanded.lhs > Chap1.html

Sadly, the C doesn’t get syntax highlighting but this will do for now.

PS Sadly, WordPress doesn’t seem to be able to handle \color{red} and \color{blue} in LaTeX so there are some references to blue and red which do not render.

# Acknowledgements

A lot of the code for this post is taken from the repa package itself. Many thanks to the repa team for providing the package and the example code.

> {-# OPTIONS_GHC -Wall                      #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults    #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind   #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods  #-}
> {-# OPTIONS_GHC -fno-warn-orphans          #-}
> {-# LANGUAGE BangPatterns                  #-}
> {-# LANGUAGE QuasiQuotes                   #-}
> {-# LANGUAGE NoMonomorphismRestriction     #-}
> module Chap1 (
>     module Control.Applicative
>   , solveLaplaceStencil
>   , useBool
>   , boundValue
>   , bndFnEg1
>   , fivePoint
>   , ninePoint
>   , testStencil5
>   , testStencil9
>   , analyticValue
>   , slnHMat4
>   , slnHMat5
>   , testJacobi4
>   , testJacobi6
>   , bndFnEg3
>   , runSolver
>   , s5
>   , s9
>   ) where
>
> import Data.Array.Repa                   as R
> import Data.Array.Repa.Unsafe            as R
> import Data.Array.Repa.Stencil           as A
> import Data.Array.Repa.Stencil.Dim2      as A
> import Prelude                           as P
> import Data.Packed.Matrix
> import Numeric.LinearAlgebra.Algorithms
> import Chap1Aux
> import Control.Applicative

# Laplace’s Equation: The Five Point Formula

We show how to apply finite difference methods to Laplace’s equation:

$\displaystyle \nabla^2 u = 0$

where

$\displaystyle \nabla^2 = \frac{\partial^2}{\partial x^2} +\frac{\partial^2}{\partial y^2}$

For a sufficiently smooth function (see (Iserles 2009, chap. 8)) we have

$\displaystyle \begin{aligned} \frac{\partial^2 u}{\partial x^2}\mathop{\Bigg|_{x = x_0 + k\Delta x}}_{y = y_0 + l\Delta x} &= \frac{1}{(\Delta x)^2}\Delta_{0,x}^2 u_{k,l} + \mathcal{O}((\Delta x)^2) \\ \frac{\partial^2 u}{\partial y^2}\mathop{\Bigg|_{x = x_0 + k\Delta x}}_{y = y_0 + l\Delta x} &= \frac{1}{(\Delta x)^2}\Delta_{0,y}^2 u_{k,l} + \mathcal{O}((\Delta x)^2) \end{aligned}$

where the central difference operator $\Delta_0$ is defined as

$\displaystyle (\Delta_0 z)_k \triangleq z_{k + \frac{1}{2}} - z_{k - \frac{1}{2}}$

We are therefore led to consider the five point difference scheme.

$\displaystyle \frac{1}{(\Delta x)^2}(\Delta_{0,x}^2 + \Delta_{0,y}^2) u_{k,l} = 0$

We can re-write this explicitly as

$\displaystyle u_{k-1,l} + u_{k+1,l} + u_{k,l-1} + u_{k,l+1} - 4u_{k,l} = 0$

Specifically for the grid point (2,1) in a $4 \times 4$ grid we have

$\displaystyle {{u_{1,1}}} + {{u_{3,1}}} + {{u_{2,0}}} + {{u_{2,2}}} - 4{{u_{2,1}}} = 0$

where blue indicates that the point is an interior point and red indicates that the point is a boundary point. For Dirichlet boundary conditions (which is all we consider in this post), the values at the boundary points are known.

We can write the entire set of equations for this grid as

$\displaystyle \begin{bmatrix} -4.0 & 1.0 & 1.0 & 0.0 \\ 1.0 & -4.0 & 0.0 & 1.0 \\ 1.0 & 0.0 & -4.0 & 1.0 \\ 0.0 & 1.0 & 1.0 & -4.0 \end{bmatrix} \begin{bmatrix} {{u_{11}}} \\ {{u_{21}}} \\ {{u_{12}}} \\ {{u_{22}}} \end{bmatrix} = \begin{bmatrix} -{{u_{10}}} + -{{u_{01}}} \\ -{{u_{20}}} + -{{u_{31}}} \\ -{{u_{02}}} + -{{u_{13}}} \\ -{{u_{23}}} + -{{u_{32}}} \end{bmatrix}$
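As a quick check of the five point formula, note that it is exact for quadratic polynomials, so for a harmonic quadratic such as $u(x,y) = x^2 - y^2$ the discrete residual should vanish identically. Here is a standalone sketch with plain `Double`s (no repa); `u` and `residual` are illustrative names, not part of the code developed below.

```haskell
-- u(x, y) = x^2 - y^2 is harmonic, and the five point formula is exact
-- for quadratics, so the discrete residual should vanish at every
-- interior grid point.
u :: Double -> Double -> Double
u x y = x * x - y * y

-- u_{k-1,l} + u_{k+1,l} + u_{k,l-1} + u_{k,l+1} - 4 u_{k,l}
residual :: Double -> Double -> Double -> Double
residual h x y = u (x - h) y + u (x + h) y + u x (y - h) + u x (y + h) - 4 * u x y

main :: IO ()
main = print (all (\(x, y) -> abs (residual 0.25 x y) < 1e-12)
                  [ (x, y) | x <- [0.25, 0.5, 0.75], y <- [0.25, 0.5, 0.75] ])
-- prints True
```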

## A Very Simple Example

Let us take the boundary conditions to be

$\displaystyle \begin{matrix} u(x, 0) = 1 & u(x, 1) = 2 & u(0, y) = 1 & u(1, y) = 2 \end{matrix}$

With our $4 \times 4$ grid we can solve this exactly using the hmatrix package which has a binding to LAPACK.

First we create a $4 \times 4$ matrix in hmatrix form

> simpleEgN :: Int
> simpleEgN = 4 - 1
>
> matHMat4 :: IO (Matrix Double)
> matHMat4 = do
>   matRepa <- computeP $mkJacobiMat simpleEgN :: IO (Array U DIM2 Double) > return$ (simpleEgN - 1) >< (simpleEgN - 1) $toList matRepa ghci> matHMat4 (2><2) [ -4.0, 1.0 , 1.0, 0.0 ] Next we create the column vector as presribed by the boundary conditions > bndFnEg1 :: Int -> Int -> (Int, Int) -> Double > bndFnEg1 _ m (0, j) | j > 0 && j < m = 1.0 > bndFnEg1 n m (i, j) | i == n && j > 0 && j < m = 2.0 > bndFnEg1 n _ (i, 0) | i > 0 && i < n = 1.0 > bndFnEg1 n m (i, j) | j == m && i > 0 && i < n = 2.0 > bndFnEg1 _ _ _ = 0.0 > bnd1 :: Int -> [(Int, Int)] -> Double > bnd1 n = negate . > sum . > P.map (bndFnEg1 n n) > bndHMat4 :: Matrix Double > bndHMat4 = ((simpleEgN - 1) * (simpleEgN - 1)) >< 1$
>            mkJacobiBnd fromIntegral bnd1 3
ghci>  bndHMat4
(4><1)
[ -2.0
, -3.0
, -3.0
, -4.0 ]
> slnHMat4 :: IO (Matrix Double)
> slnHMat4 = matHMat4 >>= return . flip linearSolve bndHMat4
ghci> slnHMat4
(4><1)
[               1.25
,                1.5
, 1.4999999999999998
, 1.7499999999999998 ]

# The Jacobi Method

Inverting a matrix is expensive so instead we use the (possibly most) classical of all iterative methods, Jacobi iteration. Given $A\boldsymbol{x} = \boldsymbol{b}$ and an estimated solution $\boldsymbol{x}_i^{[k]}$, we can generate an improved estimate $\boldsymbol{x}_i^{[k+1]}$. See (Iserles 2009, chap. 12) for the details on convergence and convergence rates.

$\displaystyle \boldsymbol{x}_i^{[k+1]} = \frac{1}{A_{i,i}}\Bigg[\boldsymbol{b}_i - \sum_{j \neq i} A_{i,j}\boldsymbol{x}_j^{[k]}\Bigg]$

The simple example above does not really give a clear picture of what happens in general during the update of the estimate, so here is a larger example.

Sadly, WordPress does not seem to be able to render $16 \times 16$ matrices written in LaTeX, so you will have to look at the output from hmatrix in the larger example below. You can see that this matrix is sparse and has a very clear pattern.

Expanding the matrix equation for a point not in the boundary we get

$\displaystyle x_{i,j}^{[k+1]} = \frac{1}{4}(x^{[k]}_{i-1,j} + x^{[k]}_{i,j-1} + x^{[k]}_{i+1,j} + x^{[k]}_{i,j+1})$

Clearly the values of the points on the boundary are fixed and must remain at those values at every iteration.
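As a quick arithmetic check of this update formula, here is a standalone sketch (plain lists, not repa) confirming that the exact solution computed above (interior values 1.25, 1.5, 1.5, 1.75 with the given boundary) is a fixed point: each interior value equals the average of its four neighbours.

```haskell
-- Check that the converged solution of the simple example is a fixed
-- point of the Jacobi update: each interior value is the average of
-- its four neighbours (grid values as computed above).
grid :: [[Double]]
grid = [ [0.0, 1.0,  1.0,  0.0]
       , [1.0, 1.25, 1.5,  2.0]
       , [1.0, 1.5,  1.75, 2.0]
       , [0.0, 2.0,  2.0,  0.0] ]

avgOfNeighbours :: Int -> Int -> Double
avgOfNeighbours i j =
  ( grid !! (i - 1) !! j + grid !! (i + 1) !! j
  + grid !! i !! (j - 1) + grid !! i !! (j + 1) ) / 4

main :: IO ()
main = print [ avgOfNeighbours i j == grid !! i !! j
             | i <- [1, 2], j <- [1, 2] ]  -- prints [True,True,True,True]
```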

Here is the method using repa. To produce an improved estimate, we define a function relaxLaplace which takes a repa matrix representing our current estimate $\boldsymbol{x}_i^{[k]}$ and returns the one step update $\boldsymbol{x}_i^{[k+1]}$, also as a repa matrix.

We pass in a boundary condition mask which specifies which points are boundary points: a point is a boundary point if the mask value is 0.0 and an interior point if it is 1.0.

> boundMask :: Monad m => Int -> Int -> m (Array U DIM2 Double)
> boundMask gridSizeX gridSizeY = computeP $
>   fromFunction (Z :. gridSizeX + 1 :. gridSizeY + 1) f
>   where
>     f (Z :. _ix :.  iy) | iy == 0         = 0
>     f (Z :. _ix :.  iy) | iy == gridSizeY = 0
>     f (Z :.  ix :. _iy) | ix == 0         = 0
>     f (Z :.  ix :. _iy) | ix == gridSizeX = 0
>     f _                                   = 1

Better would be to use at least a Bool, as the example below shows, but we wish to modify the code from the repa git repo as little as possible.

> useBool :: IO (Array U DIM1 Double)
> useBool = computeP $
>           R.map (fromIntegral . fromEnum) $
>           fromFunction (Z :. (3 :: Int)) (const True)

ghci> useBool
AUnboxed (Z :. 3) (fromList [1.0,1.0,1.0])

We further pass in the boundary conditions. We construct these by using a function which takes the grid size in the $x$ direction, the grid size in the $y$ direction and a given pair of co-ordinates in the grid and returns a value at this position.

> boundValue :: Monad m =>
>               Int ->
>               Int ->
>               (Int -> Int -> (Int, Int) -> Double) ->
>               m (Array U DIM2 Double)
> boundValue gridSizeX gridSizeY bndFn =
>   computeP $
>   fromFunction (Z :. gridSizeX + 1 :. gridSizeY + 1) g
>   where
>     g (Z :. ix :. iy) = bndFn gridSizeX gridSizeY (ix, iy)

Note that we only update an element in the repa matrix representation of the vector if it is not on the boundary.

> relaxLaplace
>      :: Monad m
>      => Array U DIM2 Double
>      -> Array U DIM2 Double
>      -> Array U DIM2 Double
>      -> m (Array U DIM2 Double)
>
> relaxLaplace arrBoundMask arrBoundValue arr
>   = computeP
>     $ R.zipWith (+) arrBoundValue
>     $ R.zipWith (*) arrBoundMask
>     $ unsafeTraverse arr id elemFn
>   where
>     _ :. height :. width
>        = extent arr
>
>     elemFn !get !d@(sh :. i :. j)
>      = if isBorder i j
>           then  get d
>           else (get (sh :. (i-1) :. j)
>              +  get (sh :. i :. (j-1))
>              +  get (sh :. (i+1) :. j)
>              +  get (sh :. i :. (j+1))) / 4
>
>     isBorder !i !j
>      =  (i == 0) || (i >= width  - 1)
>      || (j == 0) || (j >= height - 1)

We can use this to iterate as many times as we like.

> solveLaplace
>   :: Monad m
>   => Int
>   -> Array U DIM2 Double
>   -> Array U DIM2 Double
>   -> Array U DIM2 Double
>   -> m (Array U DIM2 Double)
>
> solveLaplace steps arrBoundMask arrBoundValue arrInit
>  = go steps arrInit
>  where
>    go !i !arr
>     | i == 0
>     = return arr
>     | otherwise
>     = do arr' <- relaxLaplace arrBoundMask arrBoundValue arr
>          go (i - 1) arr'

For our small example, we set the initial array to $0$ at every point. Note that the function which updates the grid, relaxLaplace, will immediately over-write the points on the boundary with the values given by the boundary condition.

> mkInitArrM :: Monad m => Int -> m (Array U DIM2 Double)
> mkInitArrM n = computeP $ fromFunction (Z :. (n + 1) :. (n + 1)) (const 0.0)

We can now test the Jacobi method

> testJacobi4 :: Int -> IO (Array U DIM2 Double)
> testJacobi4 nIter = do
>   mask    <- boundMask simpleEgN simpleEgN
>   val     <- boundValue simpleEgN simpleEgN bndFnEg1
>   initArr <- mkInitArrM simpleEgN
>   solveLaplace nIter mask val initArr

After 55 iterations, we obtain convergence up to the limit of accuracy of double precision floating point numbers. Note this only provides a solution of the matrix equation which is an approximation to Laplace’s equation. To obtain a more accurate result for the latter we need to use a smaller grid size.

ghci> testJacobi4 55 >>= return . pPrint
[0.0, 1.0, 1.0, 0.0]
[1.0, 1.25, 1.5, 2.0]
[1.0, 1.5, 1.75, 2.0]
[0.0, 2.0, 2.0, 0.0]

## A Larger Example

Armed with Jacobi, let us now solve a large example.

> largerEgN, largerEgN2 :: Int
> largerEgN = 6 - 1
> largerEgN2 = (largerEgN - 1) * (largerEgN - 1)

First let us use hmatrix.

> matHMat5 :: IO (Matrix Double)
> matHMat5 = do
>   matRepa <- computeP $ mkJacobiMat largerEgN :: IO (Array U DIM2 Double)
>   return $ largerEgN2 >< largerEgN2 $ toList matRepa

ghci> matHMat5
(16><16)
[ -4.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
, 1.0, -4.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
, 0.0, 1.0, -4.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
, 0.0, 0.0, 1.0, -4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
, 1.0, 0.0, 0.0, 0.0, -4.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
, 0.0, 1.0, 0.0, 0.0, 1.0, -4.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, -4.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0
, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, -4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0
, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, -4.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0
, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, -4.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0
, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, -4.0, 1.0, 0.0, 0.0, 1.0, 0.0
, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, -4.0, 0.0, 0.0, 0.0, 1.0
, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, -4.0, 1.0, 0.0, 0.0
, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, -4.0, 1.0, 0.0
, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, -4.0, 1.0
, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, -4.0 ]

> bndHMat5 :: Matrix Double
> bndHMat5 = largerEgN2 >< 1 $ mkJacobiBnd fromIntegral bnd1 5
ghci>  bndHMat5
(16><1)
[ -2.0
, -1.0
, -1.0
, -3.0
, -1.0
,  0.0
,  0.0
, -2.0
, -1.0
,  0.0
,  0.0
, -2.0
, -3.0
, -2.0
, -2.0
, -4.0 ]
> slnHMat5 :: IO (Matrix Double)
> slnHMat5 = matHMat5 >>= return . flip linearSolve bndHMat5
ghci> slnHMat5
(16><1)
[ 1.0909090909090908
, 1.1818181818181817
, 1.2954545454545454
,                1.5
, 1.1818181818181817
, 1.3409090909090906
, 1.4999999999999996
, 1.7045454545454544
, 1.2954545454545459
,                1.5
, 1.6590909090909092
,  1.818181818181818
, 1.5000000000000004
, 1.7045454545454548
, 1.8181818181818186
, 1.9090909090909092 ]

And for comparison, let us use the Jacobi method.

> testJacobi6 :: Int -> IO (Array U DIM2 Double)
> testJacobi6 nIter = do
>   mask    <- boundMask largerEgN largerEgN
>   val     <- boundValue largerEgN largerEgN bndFnEg1
>   initArr <- mkInitArrM largerEgN
>   solveLaplace nIter mask val initArr

ghci> testJacobi6 178 >>= return . pPrint
[0.0, 1.0, 1.0, 1.0, 1.0, 0.0]
[1.0, 1.0909090909090908, 1.1818181818181817, 1.2954545454545454, 1.5, 2.0]
[1.0, 1.1818181818181817, 1.3409090909090908, 1.5, 1.7045454545454546, 2.0]
[1.0, 1.2954545454545454, 1.5, 1.6590909090909092, 1.8181818181818183, 2.0]
[1.0, 1.5, 1.7045454545454546, 1.8181818181818181, 1.9090909090909092, 2.0]
[0.0, 2.0, 2.0, 2.0, 2.0, 0.0]

Note that with a larger grid we need more iterations (178) before the Jacobi method converges.

# Stencils

Since we are functional programmers, our natural inclination is to see if we can find an abstraction for (at least some) numerical methods. We notice that we are updating each grid element (except the boundary elements) by taking the North, East, South and West surrounding squares and calculating a linear combination of these.

Repa provides this abstraction and we can describe the update calculation as a stencil. (Lippmeier and Keller 2011) gives full details of stencils in repa.
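Before using repa's quasiquoted stencils, it may help to see what a stencil application does in the small. The following is a hypothetical, list-based sketch (applyStencil3x3 is our own illustration, not repa's API). Applied with the five point stencil to the initial grid of the simple example, the interior results are exactly the negated entries of the column vector bndHMat4 above.

```haskell
-- A hypothetical, list-based sketch of applying a 3x3 stencil to the
-- interior points of a grid; repa's mapStencil2 does this in parallel
-- and with configurable boundary handling.
applyStencil3x3 :: [[Double]] -> [[Double]] -> [[Double]]
applyStencil3x3 st grid =
  [ [ sum [ st !! di !! dj * grid !! (i + di - 1) !! (j + dj - 1)
          | di <- [0 .. 2], dj <- [0 .. 2] ]
    | j <- [1 .. cols - 2] ]
  | i <- [1 .. rows - 2] ]
  where
    rows = length grid
    cols = length (head grid)

fivePt :: [[Double]]
fivePt = [ [0, 1, 0]
         , [1, 0, 1]
         , [0, 1, 0] ]

main :: IO ()
main = print (applyStencil3x3 fivePt initGrid)  -- [[2.0,3.0],[3.0,4.0]]
  where
    -- The initial grid of the simple example: zero in the interior,
    -- boundary values 1 and 2 as given by bndFnEg1.
    initGrid = [ [0, 1, 1, 0]
               , [1, 0, 0, 2]
               , [1, 0, 0, 2]
               , [0, 2, 2, 0] ]
```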

> fivePoint :: Stencil DIM2 Double
> fivePoint = [stencil2|  0 1 0
>                         1 0 1
>                         0 1 0 |]

Using stencils allows us to modify our numerical method with a very simple change. For example, suppose we wish to use the nine point method (which is $\mathcal{O}((\Delta x)^4)$!); then we need only write down the stencil for it, which additionally takes a linear combination of the North West, North East, South East and South West points.

> ninePoint :: Stencil DIM2 Double
> ninePoint = [stencil2| 1 4 1
>                        4 0 4
>                        1 4 1 |]
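Written out in the same form as the five point update above (and anticipating the normalising factor of 20 used below), the nine point update is

$\displaystyle x_{i,j}^{[k+1]} = \frac{1}{20}\left[4\left(x^{[k]}_{i-1,j} + x^{[k]}_{i,j-1} + x^{[k]}_{i+1,j} + x^{[k]}_{i,j+1}\right) + x^{[k]}_{i-1,j-1} + x^{[k]}_{i-1,j+1} + x^{[k]}_{i+1,j-1} + x^{[k]}_{i+1,j+1}\right]$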

We modify our solver above to take a stencil and also an Int which is used to normalise the factors in the stencil. For example, in the five point method this is 4.

> solveLaplaceStencil :: Monad m
>                        => Int
>                        -> Stencil DIM2 Double
>                        -> Int
>                        -> Array U DIM2 Double
>                        -> Array U DIM2 Double
>                        -> Array U DIM2 Double
>                        -> m (Array U DIM2 Double)
> solveLaplaceStencil !steps !st !nF !arrBoundMask !arrBoundValue !arrInit
>  = go steps arrInit
>  where
>    go 0 !arr = return arr
>    go n !arr
>      = do arr' <- relaxLaplace arr
>           go (n - 1) arr'
>
>    relaxLaplace arr
>      = computeP
>      $ R.szipWith (+) arrBoundValue
>      $ R.szipWith (*) arrBoundMask
>      $ R.smap (/ (fromIntegral nF))
>      $ mapStencil2 (BoundConst 0) st arr

We can then test both methods.

> testStencil5 :: Int -> Int -> IO (Array U DIM2 Double)
> testStencil5 gridSize nIter = do
>   mask    <- boundMask gridSize gridSize
>   val     <- boundValue gridSize gridSize bndFnEg1
>   initArr <- mkInitArrM gridSize
>   solveLaplaceStencil nIter fivePoint 4 mask val initArr
ghci> testStencil5 5 178 >>= return . pPrint
[0.0, 1.0, 1.0, 1.0, 1.0, 0.0]
[1.0, 1.0909090909090908, 1.1818181818181817, 1.2954545454545454, 1.5, 2.0]
[1.0, 1.1818181818181817, 1.3409090909090908, 1.5, 1.7045454545454546, 2.0]
[1.0, 1.2954545454545454, 1.5, 1.6590909090909092, 1.8181818181818183, 2.0]
[1.0, 1.5, 1.7045454545454546, 1.8181818181818181, 1.9090909090909092, 2.0]
[0.0, 2.0, 2.0, 2.0, 2.0, 0.0]
> testStencil9 :: Int -> Int -> IO (Array U DIM2 Double)
> testStencil9 gridSize nIter = do
>   mask    <- boundMask gridSize gridSize
>   val     <- boundValue gridSize gridSize bndFnEg1
>   initArr <- mkInitArrM gridSize
>   solveLaplaceStencil nIter ninePoint 20 mask val initArr
ghci> testStencil9 5 178 >>= return . pPrint
[0.0, 1.0, 1.0, 1.0, 1.0, 0.0]
[1.0, 1.0222650172207302, 1.1436086139049304, 1.2495750646811328, 1.4069077172153264, 2.0]
[1.0, 1.1436086139049304, 1.2964314331751594, 1.4554776038855908, 1.6710941204241017, 2.0]
[1.0, 1.2495750646811328, 1.455477603885591, 1.614523774596022, 1.777060571200304, 2.0]
[1.0, 1.4069077172153264, 1.671094120424102, 1.777060571200304, 1.7915504172099226, 2.0]
[0.0, 2.0, 2.0, 2.0, 2.0, 0.0]

We note that the methods give different answers. Before explaining this, let us examine one more example where the exact solution is known.

We take the example from (Iserles 2009, chap. 8) where the boundary conditions are:

$\displaystyle \begin{aligned} u(x, 0) &= 0 \\ u(x, 1) &= \frac{1}{(1 + x)^2 + 1} \\ u(0, y) &= \frac{y}{1 + y^2} \\ u(1, y) &= \frac{y}{4 + y^2} \end{aligned}$

This has the exact solution

$\displaystyle u(x, y) = \frac{y}{(1 + x)^2 + y^2}$

And we can calculate the values of this function on a grid.

> analyticValue :: Monad m => Int -> m (Array U DIM2 Double)
> analyticValue gridSize = computeP \$ fromFunction (Z :. gridSize + 1 :. gridSize + 1) f
>   where
>     f (Z :. ix :. iy) = y / ((1 + x)^2 + y^2)
>       where
>         y = fromIntegral iy / fromIntegral gridSize
>         x = fromIntegral ix / fromIntegral gridSize
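As a sanity check on this formula (a standalone sketch, independent of the repa code), we can verify numerically that $u$ is indeed harmonic: a central-difference approximation to $\nabla^2 u$ at an interior point should be close to zero, with the residual shrinking like the truncation and rounding errors.

```haskell
-- Verify numerically that u(x, y) = y / ((1 + x)^2 + y^2) satisfies
-- Laplace's equation, using a five point central-difference
-- approximation to the Laplacian with step h.
uExact :: Double -> Double -> Double
uExact x y = y / ((1 + x)^2 + y^2)

discreteLaplacian :: Double -> Double -> Double -> Double
discreteLaplacian h x y =
  (  uExact (x + h) y + uExact (x - h) y
   + uExact x (y + h) + uExact x (y - h)
   - 4 * uExact x y ) / h^2

main :: IO ()
main = print (discreteLaplacian 1e-4 0.3 0.7)  -- close to zero
```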

Let us also solve it using the Jacobi method with a five point stencil and a nine point stencil. Here is the encoding of the boundary values.

> bndFnEg3 :: Int -> Int -> (Int, Int) -> Double
> bndFnEg3 _ m (0, j) |           j >= 0 && j <  m = y / (1 + y^2)
>   where y = (fromIntegral j) / (fromIntegral m)
> bndFnEg3 n m (i, j) | i == n && j >  0 && j <= m = y / (4 + y^2)
>   where y = fromIntegral j / fromIntegral m
> bndFnEg3 n _ (i, 0) |           i >  0 && i <= n = 0.0
> bndFnEg3 n m (i, j) | j == m && i >= 0 && i <  n = 1 / ((1 + x)^2 + 1)
>   where x = fromIntegral i / fromIntegral n
> bndFnEg3 _ _ _                                   = 0.0

We create a function to run a solver.

> runSolver :: Monad m =>
>   Int ->
>   Int ->
>   (Int -> Int -> (Int, Int) -> Double) ->
>   (Int ->
>    Array U DIM2 Double ->
>    Array U DIM2 Double ->
>    Array U DIM2 Double ->
>    m (Array U DIM2 Double)) ->
>   m (Array U DIM2 Double)
> runSolver nGrid nIter boundaryFn solver = do
>   mask    <- boundMask nGrid nGrid
>   val     <- boundValue nGrid nGrid boundaryFn
>   initArr <- mkInitArrM nGrid
>   solver nIter mask val initArr

And put the five point and nine point solvers in the appropriate form.

> s5, s9 :: Monad m =>
>           Int ->
>           Array U DIM2 Double ->
>           Array U DIM2 Double ->
>           Array U DIM2 Double ->
>           m (Array U DIM2 Double)
> s5 n = solveLaplaceStencil n fivePoint 4
> s9 n = solveLaplaceStencil n ninePoint 20

And now we can see that the errors between the analytic solution and the five point method with a grid size of 8 are $\mathcal{O}(10^{-4})$.

ghci> liftA2 (-^) (analyticValue 7) (runSolver 7 200 bndFnEg3 s5) >>= return . pPrint
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[0.0, -3.659746856576884e-4, -5.792613003869074e-4, -5.919333582729558e-4, -4.617020226472812e-4, -2.7983716661839075e-4, -1.1394184484148084e-4, 0.0]
[0.0, -4.0566163490589335e-4, -6.681826442424543e-4, -7.270498771604073e-4, -6.163531890425178e-4, -4.157604876017795e-4, -1.9717865146007263e-4, 0.0]
[0.0, -3.4678314565880775e-4, -5.873627029994999e-4, -6.676042377350699e-4, -5.987527967581119e-4, -4.318102416048242e-4, -2.2116263241278578e-4, 0.0]
[0.0, -2.635436147627873e-4, -4.55055831294085e-4, -5.329636937312088e-4, -4.965786933938399e-4, -3.7401874422060555e-4, -2.0043638973538114e-4, 0.0]
[0.0, -1.7773949138776696e-4, -3.1086347862371855e-4, -3.714478154303591e-4, -3.5502855035249303e-4, -2.7528200465845587e-4, -1.5207424182367424e-4, 0.0]
[0.0, -9.188482657347674e-5, -1.6196970595228066e-4, -1.9595925291693295e-4, -1.903987061394885e-4, -1.5064155667735002e-4, -8.533752030373543e-5, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

But using the nine point method significantly improves this.

ghci> liftA2 (-^) (analyticValue 7) (runSolver 7 200 bndFnEg3 s9) >>= return . pPrint
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[0.0, -2.7700522166329566e-7, -2.536751151638317e-7, -5.5431452705700934e-8, 7.393573120406671e-8, 8.403487600228132e-8, 4.188249685954659e-8, 0.0]
[0.0, -2.0141002235463112e-7, -2.214645128950643e-7, -9.753369634157849e-8, 2.1887763435035623e-8, 6.305346988977334e-8, 4.3482495659663556e-8, 0.0]
[0.0, -1.207601019737048e-7, -1.502713803391842e-7, -9.16850228516175e-8, -1.4654435886995998e-8, 2.732932558036083e-8, 2.6830928867571657e-8, 0.0]
[0.0, -6.883445567013036e-8, -9.337114890983766e-8, -6.911451747027009e-8, -2.6104150896433254e-8, 4.667329939200826e-9, 1.1717137371469732e-8, 0.0]
[0.0, -3.737430460254432e-8, -5.374955715231611e-8, -4.483740087546373e-8, -2.299792309368165e-8, -4.122571728437663e-9, 3.330287268177301e-9, 0.0]
[0.0, -1.6802381437586167e-8, -2.5009212159532446e-8, -2.229028683853329e-8, -1.3101905282919546e-8, -4.1197137368165215e-9, 3.909041701444238e-10, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Bibliography

Iserles, A. 2009. A First Course in the Numerical Analysis of Differential Equations. Cambridge University Press. http://books.google.co.uk/books?id=M0tkw4oUucoC.

Lippmeier, Ben, and Gabriele Keller. 2011. “Efficient Parallel Stencil Convolution in Haskell.” In Proceedings of the 4th ACM Symposium on Haskell, 59–70. Haskell ’11. New York, NY, USA: ACM. doi:10.1145/2034675.2034684. http://doi.acm.org/10.1145/2034675.2034684.