Cabal Hacking at OdHac

May 5, 2013

This is a very informal blog of the Cabal hacking team at OdHac: Sasha Manzyuk, Bram Schuur and me (Dominic Steinitz). I really enjoyed pair programming with both of them and I certainly wouldn’t have tracked down the bug I investigated without their help.

Also a big thank you to Roman Cheplyaka for organising a great event.

Cabal Bug Fixing I

Ticket 823

The problem is indeed in the function Distribution.Simple.Utils.wrapText, which is used to wrap messages at 79 characters. No attempt is made to detect whether a space at which a line is wrapped is part of a file name. One can reproduce the problem more easily as follows:

$ cabal install http://hackage.haskell.org/packages/archive/groups/0.2.0.1/groups-0.2.0.1.tar.gz --prefix "/home/manzyuk/foo/bar/baz quux"
Downloading

http://hackage.haskell.org/packages/archive/groups/0.2.0.1/groups-0.2.0.1.tar.gz

Resolving dependencies...
In order, the following will be installed:
groups-0.2.0.1 (reinstall)
Warning: Note that reinstalls are always dangerous. Continuing anyway...
Configuring groups-0.2.0.1...
Building groups-0.2.0.1...
Preprocessing library groups-0.2.0.1...
[1 of 1] Compiling Data.Group ( src/Data/Group.hs, dist/build/Data/Group.o )
In-place registering groups-0.2.0.1...
Installing library in /home/manzyuk/foo/bar/baz
quux/lib/groups-0.2.0.1/ghc-7.4.1
Registering groups-0.2.0.1...
Installed groups-0.2.0.1

Note that the line starting with “Installing library in” is wrapped at the space that is part of a directory name.

I suppose, a proper way to fix it would be to use an algebraic data type rather than String to represent messages, something like Text.PrettyPrint.Doc, so that file names can be treated as unbreakable entities that shouldn’t be wrapped. This, however, seems to require quite some re-engineering of error handling in Cabal. Should I go ahead and do it?

2013-05-04 Sat: Mikhail Glushenkov commented: “I think it’s a minor issue and your efforts are better spent elsewhere.”

Ticket 760

Lines beginning with ‘–’ are comments. To enable library profiling one has to uncomment the line

-- library-profiling: False

and change the value of the option to True:

library-profiling: True

This is somewhat confusing: an option can be in three different states: True, False, and commented out.

Ticket 774

We can augment the tokenizer so that lines beginning with ‘>’ are left intact:

tokeniseLine :: (LineNo, Indent, HasTabs, String) -> [Token]
tokeniseLine (n0, i, t, l) = case split n0 l of
                            (Span _ l':ss) -> Line n0 i t l' :ss
                            cs              -> cs
  where split _ "" = []
        split n s@('>' : _) = [Line n i t s]
        split n s  = case span (\c -> c /='}' && c /= '{') s of
          ("", '{' : s') ->             OpenBracket  n : split n s'
          (w , '{' : s') -> mkspan n w (OpenBracket  n : split n s')
          ("", '}' : s') ->             CloseBracket n : split n s'
          (w , '}' : s') -> mkspan n w (CloseBracket n : split n s')
          (w ,        _) -> mkspan n w []

        mkspan n s ss | null s'   =             ss
                      | otherwise = Span n s' : ss
          where s' = trimTrailing (trimLeading s)

Is it a good / right solution? Am I overlooking anything?

2013-05-04 Sat: Roman has pointed out that Duncan Coutts announced he was rewriting the parser of .cabal files, so it probably doesn’t makes sense to fix the old parser if it’s going to be replaced by the new one.

Ticket 757

Can’t reproduce with Firefox 20.0 (Linux/x86$_{\mathrm{64}}$).

Ticket 735

I tried to reproduce this issue and I can’t. I tried to build NaturalSort-0.2.1 with GHC 7.4.1 and Cabal 1.16.03, and I would expect the build to fail due to this line in NaturalSort.cabal: Cabal-Version: >= 1.6 && < 1.9, but it does build, and the log file does contain some log info. No summary file has been produced though.

I also tried to build cal3d, which on my machine fails due to a missing C library, but again the log file contains the error message. The summary file is generated when I try to install the library from Hackage but is not generated when I manually download the source and try to build the library locally, which confirms the behavior observed in Issue #1189.

Ticket 676

The culprit of the problem was the function simplifyCondTree, which traversed CondTree in post-order, putting global options after the options defined in the branches of conditionals. I’ve changed that to pre-order traversal.

Sent a pull request. Also, added a test demonstrating what’s changed.

Ticket 885

Duplicate of Issue 774. 2013-05-04 Sat: Closed by Johan Tibell.

Ticket 883

I don’t think this is a cabal bug.

I tried to reproduce this behaviour: I downloaded monad-par-0.3.4.1 and mangled the file Control/Monad/Par/Scheds/Direct.hs so that its top looked as follows:

{-# LANGUAGE RankNTypes, NamedFieldPuns, BangPatterns,
             ExistentialQuantification, CPP, ScopedTypeVariables,
             TypeSynonymInstances, MultiParamTypeClasses,
             GeneralizedNewtypeDeriving, PackageImports,
             ParallelListComp
             #-} # OPTIONS_GHC
{- -Wall -fno-warn-name-shadowing -fno-warn-unused-do-bind #-}

Running GHC directly on this file gives the following error message:

Control/Monad/Par/Scheds/Direct.hs:6:18: parse error on input `#'

Building through cabal build produces the following error message:

Control/Monad/Par/Scheds/Direct.hs:1:1:
    File name does not match module name:
    Saw: `Main'
    Expected: `Control.Monad.Par.Scheds.Direct'

However, running GHC on Control/Monad/Par.hs produces an identical error message.

I’m guessing that something like this is happening:

  • when GHC is run directly on Control/Monad/Par/Scheds/Direct.hs, it tries to parse a module declaration, encounters # OPTIONS_GHC, decides that the file doesn’t have a module header (in particular, defaults the module name to Main), tries to parse # OPTIONS_GHC and fails;
  • when GHC is run on Control/Monad/Par.hs, it expects to find a module named Control.Monad.Par.Scheds.Direct in the file Control/Monad/Par/Scheds/Direct.hs, but when it tries to parse it, it encounters # OPTIONS_GHC, and decides that the file doesn’t have a module header, defaults the module name to Main, and fails with a different error.

This theory is supported by the following experiment: replace the pragmas and # OPTIONS_GHC above with some code, e.g., foo = 3. Then running GHC on Control/Monad/Par/Scheds/Direct.hs produces

Control/Monad/Par/Scheds/Direct.hs:12:1:
    parse error on input `module'

and running GHC on Control/Monad/Par.hs produces

Control/Monad/Par/Scheds/Direct.hs:1:1:
    File name does not match module name:
    Saw: `Main'
    Expected: `Control.Monad.Par.Scheds.Direct'

The summary of the above is that this is not a cabal bug (and probably not a bug at all), and this ticket should be closed.

2013-05-04 Sat: Closed by Johan Tibell.

Ticket 417

Is this still an issue? I see a comment in Cabal/Distribution/PackageDescription/Parse.hs:

-- "build-depends" is a local field now.  To be backwards
-- compatible, we still allow it as a global field in old-style
-- package description files and translate it to a local field by
-- adding it to every non-empty section

which makes me think that what Duncan is suggesting:

or it should automatically propagate such global options into each buildable component.

has already been implemented.

2013-05-05 Sun: It seems I was wrong, investigating.

Cabal Bug Fixing II

  • Work on adding a –lowest-dependencies flag for cabal which builds a package against its lowest dependencies. This is useful for package maintainers to see whether the lower bounds of a package are still valid. An initial version is issued as a pull-request but the actual workings are still being tweaked.
  • Small bugfix making the –package-db flag accept relative paths

Cabal Bug Fixing III

I took a look at Ticket 1297.

My first step was to grab the latest sources and run the tests.

bash-3.2$ cabal test
Running 2 test suites...
Test suite package-tests: RUNNING...
Cabal test suite - testing cabal version 1.17.0

BuildTestSuiteDetailedV09:
  : [Failed]
build failed!
expected: False
 but got: True

BuildDeps/InternalLibrary2:
  : [Failed]
executable should have linked with the internal library
expected: "myLibFunc internal"
 but got: ""
BuildDeps/InternalLibrary3:
  : [Failed]
executable should have linked with the internal library
expected: "myLibFunc internal"
 but got: ""
BuildDeps/InternalLibrary4:
  : [Failed]
executable should have linked with the installed library
expected: "myLibFunc installed"
 but got: ""
PackageTests/CMain:
  : [OK]

         Test Cases   Total
 Passed  20           20
 Failed  4            4
 Total   24           24
Test suite package-tests: FAIL
Test suite logged to: dist/test/Cabal-1.17.0-package-tests.log
Test suite unit-tests: RUNNING...
Test suite unit-tests: PASS
Test suite logged to: dist/test/Cabal-1.17.0-unit-tests.log
1 of 2 test suites (1 of 2 test cases) passed.

BuildTestSuiteDetailedV09 is a test which tests that Cabal testing is working correctly. It takes a small Cabal project with a test-suite stanza builds it and runs the tests in the test suite.

This test passes on linux so it looked like the problem was in the way cabal was running the tests. This turned out to be a red herring. I pulled the test out of the Cabal test suite and ran it on its own (obvious in hindsight). So the problem was in Cabal not in the way Cabal was being tested.

bash-3.2$ ../cabal/cabal-install/dist/build/cabal/cabal configure --enable-tests --package-db=../cabal/Cabal/dist/package.conf.inplace
Warning: The package list for 'hackage.haskell.org' is 38 days old.
Run 'cabal update' to get the latest list of available packages.
Warning: my.cabal: A package using section syntax must specify at least
'cabal-version: >= 1.2'.
Resolving dependencies...
Warning: my.cabal: A package using section syntax must specify at least
'cabal-version: >= 1.2'.
Configuring BuildTestSuiteDetailedV09-0.1...
bash-3.2$ ../cabal/cabal-install/dist/build/cabal/cabal build
Warning: my.cabal: A package using section syntax must specify at least
'cabal-version: >= 1.2'.
Building BuildTestSuiteDetailedV09-0.1...
Preprocessing library BuildTestSuiteDetailedV09-0.1...
[1 of 1] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering BuildTestSuiteDetailedV09-0.1...
Preprocessing test suite 'dummy' for BuildTestSuiteDetailedV09-0.1...
[1 of 1] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
ar: dist/build/Dummy.o: No such file or directory

But why was Cabal complaining about a missing Dummy.o when it had clearly just created it?

Sasha suggested using strace (http://en.wikipedia.org/wiki/Strace) but this didn’t exist on my Mac. But I did have dtrace (http://hints.macworld.com/article.php?story=20071031121823710 and http://chihungchan.blogspot.com/2009/05/which-process-deleted-my-file.html).

Dominics-MacBook-Pro:Dummy dom$ dtrace -qn 'syscall::unlink:entry {printf("PID=%d, CMD=%s, FILE=%s\n", pid, curpsinfo->pr_psargs, copyinstr(arg0));}'
dtrace: failed to initialize dtrace: DTrace requires additional privileges
Dominics-MacBook-Pro:Dummy dom$ sudo dtrace -qn 'syscall::unlink:entry {printf("PID=%d, CMD=%s, FILE=%s\n", pid, curpsinfo->pr_psargs, copyinstr(arg0));}'
Password:
PID=16137, CMD=Emacs, FILE=/Users/dom/Dropbox/Private/.#Everything.org
PID=16137, CMD=Emacs, FILE=/Users/dom/Dropbox/Private/#Everything.org#
PID=82566, CMD=ar, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T//ar.ugfvvt
PID=82567, CMD=libtool, FILE=dist/build/libHSBuildTestSuiteDetailedV09-0.1.a
PID=82557, CMD=cabal, FILE=dist/build/libHSBuildTestSuiteDetailedV09-0.1.a
PID=82557, CMD=cabal, FILE=dist/build/libHSBuildTestSuiteDetailedV09-0.1_p.a
PID=82557, CMD=cabal, FILE=dist/build/libHSBuildTestSuiteDetailedV09-0.1-ghc7.6.2.dylib
PID=82557, CMD=cabal, FILE=dist/build/HSBuildTestSuiteDetailedV09-0.1.o
PID=82560, CMD=ghc, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T/ghc82560_0/ghc82560_0.c
PID=82560, CMD=ghc, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T/ghc82560_0/ghc82560_0.s
PID=82573, CMD=i686-apple-darwi, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T//ccnsroxB.s
PID=82562, CMD=i686-apple-darwi, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T//ccjXqsIc.s
PID=82571, CMD=ghc, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T/ghc82571_0/ghc82571_0.c
PID=82571, CMD=ghc, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T/ghc82571_0/ghc82571_0.s
PID=82576, CMD=as, FILE=dist/build/Dummy.o
PID=82577, CMD=ar, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T//ar.mlwmuT
PID=82557, CMD=cabal, FILE=dist/build/libdummy.a
PID=82557, CMD=cabal, FILE=dist/build/libdummy_p.a
PID=82557, CMD=cabal, FILE=dist/build/libdummy-ghc7.6.2.dylib
PID=82557, CMD=cabal, FILE=dist/build/dummy.o

I noticed I had a Dummy.o and a dummy.o. Also the name of the test-suite was dummy and the name of the module in the the Cabal package was Dummy. I tried changing the name of the test-suite stanza to Dummy. No luck but when I tried changing the name of the test-suite stanza to something completely different everything worked.

bash-3.2$ ../cabal/cabal-install/dist/build/cabal/cabal build
Warning: my.cabal: A package using section syntax must specify at least
'cabal-version: >= 1.2'.
Building BuildTestSuiteDetailedV09-0.1...
Preprocessing library BuildTestSuiteDetailedV09-0.1...
[1 of 1] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering BuildTestSuiteDetailedV09-0.1...
Preprocessing test suite 'Urk' for BuildTestSuiteDetailedV09-0.1...
[1 of 1] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering Urk-0.1...
[1 of 1] Compiling Main             ( dist/build/UrkStub/UrkStub-tmp/UrkStub.hs, dist/build/UrkStub/UrkStub-tmp/Main.o )
Linking dist/build/UrkStub/UrkStub ...

Case Insensitive but Case Preserving

It turns out that most Macs are configured like this. You can check as shown below. The absence of Case-sensitive in the Name field means the file system is case insensitive.

bash-3.2$ diskutil info disk0s2
   Device Identifier:        disk0s2
   Device Node:              /dev/disk0s2
   Part of Whole:            disk0
   Device / Media Name:      Customer

   Volume Name:              Macintosh HD
   Escaped with Unicode:     Macintosh%FF%FE%20%00HD

   Mounted:                  Yes
   Mount Point:              /
   Escaped with Unicode:     /

   File System Personality:  Journaled HFS+
   Type (Bundle):            hfs
   Name (User Visible):      Mac OS Extended (Journaled)
   Journal:                  Journal size 40960 KB at offset 0x1238b000
   Owners:                   Enabled

   Partition Type:           Apple_HFS
   OS Can Be Installed:      Yes
   Media Type:               Generic
   Protocol:                 SATA
   SMART Status:             Verified
   Volume UUID:              86F67826-3F78-387D-B335-C695CADCFD86

   Total Size:               499.4 GB (499418034176 Bytes) (exactly 975425848 512-Byte-Blocks)
   Volume Free Space:        427.8 GB (427780079616 Bytes) (exactly 835507968 512-Byte-Blocks)
   Device Block Size:        512 Bytes

   Read-Only Media:          No
   Read-Only Volume:         No
   Ejectable:                No

   Whole:                    No
   Internal:                 Yes
   Solid State:              Yes

Coda

Changing dummy to Dummy tickles the bug on linux so now we have an easily reproducible bug although it is not clear how to fix this: Ticket 1311.

Logistic Regression and Automated Differentiation

April 30, 2013

Introduction

Having shown how to use automated differentiation to estimate parameters in the case of linear regression let us now turn our attention to the problem of classification. For example, we might have some data about people’s social networking such as volume of twitter interactions and number of twitter followers together with a label which represents a human judgement about which one of the two individuals is more influential. We would like to predict, for a pair of individuals, the human judgement on who is more influential.

Logistic Regression

We define the probability of getting a particular value of the binary label:

\displaystyle   \begin{aligned}  {\mathbb P}(y = 1 \mid \boldsymbol{x}; \boldsymbol{\theta}) &= h_{\boldsymbol{\theta}}(\boldsymbol{x}) \\  {\mathbb P}(y = 0 \mid \boldsymbol{x}; \boldsymbol{\theta}) &= 1 - h_{\boldsymbol{\theta}}(\boldsymbol{x})  \end{aligned}

where \boldsymbol{x^{(i)}} and \boldsymbol{\theta} are column vectors of size m

\displaystyle   h_{\boldsymbol{\theta}}(\boldsymbol{x}) = g(\boldsymbol{\theta}^T\boldsymbol{x})

and g is a function such as the logistic function g(x) = 1 / (1 + e^{-x}) or \tanh.

We can re-write this as:

\displaystyle   p(y \mid \boldsymbol{x} ; \boldsymbol{\theta}) = (h_{\boldsymbol{\theta}}(\boldsymbol{x}))^y(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}))^{1 - y}

We wish to find the value of \boldsymbol{\theta} that gives the maximum probability to the observations. We do this by maximising the likelihood. Assuming we have n observations the likelihood is:

\displaystyle   \begin{aligned}  \mathcal{L}(\boldsymbol{\theta}) &= \prod_{i=1}^n p(y^{(i)} \mid {\boldsymbol{x}}^{(i)} ; \boldsymbol{\theta}) \\            &= \prod_{i=1}^n (h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}))^{y^{(i)}} (1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}))^{1 - y^{(i)}}  \end{aligned}

It is standard practice to maximise the log likelihood which will give the same maximum as log is monotonic.

\displaystyle   \begin{aligned}  \lambda(\boldsymbol{\theta}) &= \log \mathcal{L}(\boldsymbol{\theta}) \\            &= \sum_{i=1}^n {y^{(i)}}\log h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}) + (1 - y^{(i)})\log (1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}))  \end{aligned}

In order to maximize the cost function, we again use the method of steepest ascent: if \boldsymbol{\theta}^i is a guess for the parameters of the model then we can improve the guess by stepping a small distance in the direction of greatest change.

\displaystyle   \boldsymbol{\theta}^{i+1} = \boldsymbol{\theta}^{i} - \gamma \nabla\mathcal{J}(\boldsymbol{\theta})

\gamma is some constant known in machine learning as the learning rate. It must be chosen to be large enough to ensure convergence within a reasonable number of steps but not so large that the algorithm fails to converge.

When the number of observations is high then the cost of evaluating the cost function can be high; as a cheaper alternative we can use stochastic gradient descent. Instead of taking the gradient with respect to all the observations, we take the gradient with respect to each observation in our data set. Of course if our data set is small we may have to use the data set several times to achieve convergence.

When the observations / training data are linearly separable then the magnitude of the parameters can grow without bound as the (parametized) logistic function then tends to the Heaviside / step function. Moreover, it is obvious that there can be more than one separaing hyperplane in this circumstance. To circumvent these infelicities, we instead maximize a penalized log likelihood function:

\displaystyle   \sum_{i=1}^n {y^{(i)}}\log h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}) + (1 - y^{(i)})\log (1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)})) - \frac{\delta}{2}\|\boldsymbol{\theta}\|^2

See Bishop and Mitchell for further details.

Implementation

Some pragmas to warn us about potentially dangerous situations.

> {-# OPTIONS_GHC -Wall                    #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults  #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind #-}
> 
> {-# LANGUAGE TupleSections #-}
> 
> module Logistic ( betas
>                 , main
>                 , a
>                 , b
>                 , nSamples
>                 ) where

Modules from the automatic differentiation library.

> import Numeric.AD
> import Numeric.AD.Types
> import qualified Data.Vector as V
> import Control.Monad
> import Control.Monad.State
> import Data.List
> import Text.Printf

Some modules from a random number generator library as we will want to generate some test data.

> import System.Random
> import Data.Random ()
> import Data.Random.Distribution.Beta
> import Data.RVar

Our model: the probability that y has the label 1 given the observations \boldsymbol{x}.

> logit :: Floating a =>
>          a -> a
> logit x = 1 / (1 + exp (negate x))

For each observation, the log likelihood:

> logLikelihood :: Floating a => V.Vector a -> a -> V.Vector a -> a
> logLikelihood theta y x = y * log (logit z) +
>                           (1 - y) * log (1 - logit z)
>   where
>     z = V.sum $ V.zipWith (*) theta x
> totalLogLikelihood :: Floating a =>
>                       V.Vector a ->
>                       V.Vector a ->
>                       V.Vector (V.Vector a) ->
>                       a
> totalLogLikelihood theta y x = (a - delta * b) / l
>   where
>     l = fromIntegral $ V.length y
>     a = V.sum $ V.zipWith (logLikelihood theta) y x
>     b = (/2) $ V.sum $ V.map (^2) theta

As before we can implement steepest descent using the grad function.

> delTotalLogLikelihood :: Floating a =>
>                 V.Vector a ->
>                 V.Vector (V.Vector a) ->
>                 V.Vector a ->
>                 V.Vector a
> delTotalLogLikelihood y x = grad f
>   where
>     f theta = totalLogLikelihood theta
>                                  (V.map auto y)
>                                  (V.map (V.map auto) x)
> 
> stepOnce :: Double ->
>             V.Vector Double ->
>             V.Vector (V.Vector Double) ->
>             V.Vector Double ->
>             V.Vector Double
> stepOnce gamma y x theta =
>   V.zipWith (+) theta (V.map (* gamma) $ del theta)
>   where
>     del = delTotalLogLikelihood y x

Or even easier just use the library function gradientAscent!

> estimates :: (Floating a, Ord a) =>
>              V.Vector a ->
>              V.Vector (V.Vector a) ->
>              V.Vector a ->
>              [V.Vector a]
> estimates y x = gradientAscent $
>                 \theta -> totalLogLikelihood theta
>                                              (V.map auto y)
>                                              (V.map (V.map auto) x)

Let’s try it out. First we need to generate some data. Rather arbitrarily let us create some populations from the beta distribution.

> betas :: Int -> Double -> Double -> [Double]
> betas n a b =
>   fst $ runState (replicateM n (sampleRVar (beta a b))) (mkStdGen seed)
>     where
>       seed = 0

We can plot the populations we wish to distinguish by sampling.

> a, b :: Double
> a          = 15
> b          = 6
> nSamples :: Int
> nSamples   = 100000
> 
> sample0, sample1 :: [Double]
> sample0 = betas nSamples a b
> sample1 = betas nSamples b a

Note that in this case we could come up with a classification rule by inspecting the histograms. Furthermore, the populations overlap which means we will inevitably mis-classify some observations.

> mixSamples :: [Double] -> [Double] -> [(Double, Double)]
> mixSamples xs ys = unfoldr g ((map (0,) xs), (map (1,) ys))
>   where
>     g ([], [])         = Nothing
>     g ([],  _)         = Nothing
>     g ( _, [])         = Nothing
>     g ((x:xs), (y:ys)) = Just $ (x, (y:ys, xs))
> createSample :: V.Vector (Double, Double)
> createSample = V.fromList $ take 100 $ mixSamples sample1 sample0

We create a model with one independent variables and thus two parameters.

> actualTheta :: V.Vector Double
> actualTheta = V.fromList [0.0, 1.0]

We initialise our algorithm with arbitrary values.

> initTheta :: V.Vector Double
> initTheta = V.replicate (V.length actualTheta) 0.1

Set the learning rate, the strength of the penalty term and the number of iterations.

> gamma :: Double
> gamma = 0.1
> 
> delta :: Floating a => a
> delta = 1.0
> 
> nIters :: Int
> nIters = 4000

Now we can run our example. For the constant parameter of our model (aka in machine learning as the bias) we ensure that the correspoding “independent variable” is always set to 1.0.

> vals :: V.Vector (Double, V.Vector Double)
> vals = V.map (\(y, x) -> (y, V.fromList [1.0, x])) $ createSample
> main :: IO ()
> main = do
>   let u = V.map fst vals
>       v = V.map snd vals
>       hs = iterate (stepOnce gamma u v) initTheta
>       xs = V.map snd vals
>       theta =  head $ drop nIters hs
>       theta' = head $ drop 100 $ estimates u v initTheta
>   printf "Hand crafted descent: theta_0 = %5.3f, theta_1 = %5.3f\n"
>          (theta V.! 0) (theta V.! 1)
>   printf "Library descent:      theta_0 = %5.3f, theta_1 = %5.3f\n"
>          (theta' V.! 0) (theta' V.! 1)
>   let predProbs  = V.map (\x -> logit $ V.sum $ V.zipWith (*) theta x) xs
>       mismatches = V.filter (> 0.5) $
>                    V.map abs $
>                    V.zipWith (-) actuals preds
>         where
>           actuals = V.map fst vals
>           preds   = V.map (\x -> fromIntegral $ fromEnum (x > 0.5))
>                           predProbs
>   let lActuals, lMisMatches :: Double
>       lActuals    = fromIntegral $ V.length vals
>       lMisMatches = fromIntegral $ V.length mismatches
>   printf "%5.2f%% correct\n" $
>          100.0 *  (lActuals - lMisMatches) / lActuals

And we get quite reasonable estimates:

ghci> main
  Hand crafted descent: theta_0 = -2.071, theta_1 = 4.318
  Library descent:      theta_0 = -2.070, theta_1 = 4.319
  97.00% correct

Regression and Automated Differentiation

April 26, 2013

Introduction

Automated differentiation was developed in the 1960’s but even now does not seem to be that widely used. Even experienced and knowledgeable practitioners often assume it is either a finite difference method or symbolic computation when it is neither.

This article gives a very simple application of it in a machine learning / statistics context.

Multivariate Linear Regression

We model a dependent variable linearly dependent on some set of independent variables in a noisy environment.

\displaystyle   y^{(i)} = \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)} + \epsilon^{(i)}

where

  • i runs from 1 to n, the number of observations;

  • \epsilon^{(i)} are i.i.d. normal with mean 0 and the same variance \sigma^2: \epsilon^{(i)} \sim \mathcal{N} (0,\sigma^2);

  • For each i, \boldsymbol{x^{(i)}} is a column vector of size m and

  • \boldsymbol{\theta} is a column vector also of size m.

In other words:

\displaystyle   p(y^{(i)} \mid \boldsymbol{x}^{(i)}; \boldsymbol{\theta}) =  \frac{1}{\sqrt{2\pi}\sigma}\exp\big(\frac{-(y^{(i)} - \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)})^2}{2\sigma^2}\big)

We can therefore write the likelihood function given all the observations as:

\displaystyle   \mathcal{L}(\boldsymbol{\theta}; X, \boldsymbol{y}) =  \prod_{i = 1}^n \frac{1}{\sqrt{2\pi}\sigma}\exp\big(\frac{-(y^{(i)} - \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)})^2}{2\sigma^2}\big)

In order to find the best fitting parameters \boldsymbol{\theta} we therefore need to maximize this function with respect to \boldsymbol{\theta}. The standard approach is to maximize the log likelihood which, since log is monotonic, will give the same result.

\displaystyle   \begin{aligned}  \lambda(\boldsymbol{\theta}) &= \log \mathcal{L}(\boldsymbol{\theta}) \\                               &= \sum_{i=1}^n \log \frac{1}{\sqrt{2\pi}\sigma}\exp\big(\frac{-(y^{(i)} - \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)})^2}{2\sigma^2}\big) \\                               &= n\log \frac{1}{\sqrt{2\pi}\sigma} - \frac{1}{2\sigma^2}\sum_{i=1}^n (y^{(i)} - \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)})^2  \end{aligned}

Hence maximizing the likelihood is the same as minimizing the (biased) estimate of the variance:

\displaystyle   \frac{1}{n}\sum_{i=1}^n (y^{(i)} - \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)})^2

We can define a cost function:

\displaystyle   \mathcal{J}(\boldsymbol{\theta}) = \frac{1}{2n}\sum_{i=1}^n (y^{(i)} - \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)})^2

Clearly minimizing this will give the same result. The constant 1/2 is to make the manipulation of the derivative easier. In our case, this is irrelevant as we are not going to derive the derivative explicitly but use automated differentiation.

In order to mininize the cost function, we use the method of steepest ascent (or in this case descent): if \boldsymbol{\theta}^i is a guess for the parameters of the model then we can improve the guess by stepping a small distance in the direction of greatest change.

\displaystyle   \boldsymbol{\theta}^{i+1} = \boldsymbol{\theta}^{i} - \gamma \nabla\mathcal{J}(\boldsymbol{\theta})

\gamma is some constant known in machine learning as the learning rate. It must be chosen to be large enough to ensure convergence within a reasonable number of steps but not so large that the algorithm fails to converge.

When the number of observations is high then the cost of evaluating the cost function can be high; as a cheaper alternative we can use stochastic gradient descent. Instead of taking the gradient with respect to all the observations, we take the gradient with respect to each observation in our data set. Of course if our data set is small we may have to use the data set several times to achieve convergence.

Implementation

Some pragmas to warn us about potentially dangerous situations.

> {-# OPTIONS_GHC -Wall                    #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults  #-}
> module Linear where

Modules from the automatic differentiation library.

> import Numeric.AD
> import Numeric.AD.Types
> import qualified Data.Vector as V

Some modules from a random number generator library as we will want to generate some test data.

> import Data.Random ()
> import Data.Random.Distribution.Normal
> import Data.Random.Distribution.Uniform
> import Data.RVar

Our model: the predicted value of y is \hat{y} given the observations \boldsymbol{x}.

> yhat :: Floating a =>
>         V.Vector a ->
>         V.Vector a -> a
> yhat x theta = V.sum $ V.zipWith (*) theta x

For each observation, the “cost” of the difference between the actual value of y and its predicted value.

> cost :: Floating a =>
>         V.Vector a ->
>         a ->
>         V.Vector a
>         -> a
> cost theta y x = 0.5 * (y - yhat x theta)^2

To find its gradient we merely apply the operator grad.

> delCost :: Floating a =>
>            a ->
>            V.Vector a ->
>            V.Vector a ->
>            V.Vector a
> delCost y x = grad $ \theta -> cost theta (auto y) (V.map auto x)

We can use the single observation cost function to define the total cost function.

> totalCost :: Floating a =>
>              V.Vector a ->
>              V.Vector a ->
>              V.Vector (V.Vector a)
>              -> a
> totalCost theta y x = (/l) $ V.sum $ V.zipWith (cost theta) y x
>   where
>     l = fromIntegral $ V.length y

Again taking the derivative is straightforward.

> delTotalCost :: Floating a =>
>                 V.Vector a ->
>                 V.Vector (V.Vector a) ->
>                 V.Vector a ->
>                 V.Vector a
> delTotalCost y x = grad f
>   where
>     f theta = totalCost theta (V.map auto y) (V.map (V.map auto) x)

Now we can implement steepest descent.

> stepOnce :: Double ->
>             V.Vector Double ->
>             V.Vector (V.Vector Double) ->
>             V.Vector Double ->
>             V.Vector Double
> stepOnce gamma y x theta =
>   V.zipWith (-) theta (V.map (* gamma) $ del theta)
>   where
>     del = delTotalCost y x
> stepOnceStoch :: Double ->
>                  Double ->
>                  V.Vector Double ->
>                  V.Vector Double ->
>                  V.Vector Double
> stepOnceStoch gamma y x theta =
>   V.zipWith (-) theta (V.map (* gamma) $ del theta)
>   where
>     del = delCost y x

Let’s try it out. First we need to generate some data.

> createSample :: Double -> V.Vector Double -> IO (Double, V.Vector Double)
> createSample sigma2 theta = do
>   let l = V.length theta
>   x <- V.sequence $ V.replicate (l - 1) $ sampleRVar stdUniform
>   let mu = (theta V.! 0) + yhat x (V.drop 1 theta)
>   y <- sampleRVar $ normal mu sigma2
>   return (y, x)

We create a model with two independent variables and thus three parameters.

> actualTheta :: V.Vector Double
> actualTheta = V.fromList [0.0, 0.6, 0.7]

We initialise our algorithm with arbitrary values.

> initTheta :: V.Vector Double
> initTheta = V.replicate 3 0.1

We give our model an arbitrary variance.

> sigma2 :: Double
> sigma2 = 0.01

And set the learning rate and the number of iterations.

> nSamples, nIters:: Int
> nSamples = 100
> nIters = 2000
> gamma :: Double
> gamma = 0.1

Now we can run our example. For the constant parameter of our model (aka in machine learning as the bias) we ensure that the correspoding “independent variable” is always set to 1.0.

> main :: IO ()
> main = do
>   vals <- V.sequence $
>           V.replicate nSamples $
>           createSample sigma2 actualTheta
>   let y = V.map fst vals
>       x = V.map snd vals
>       x' =  V.map (V.cons 1.0) x
>       hs = iterate (stepOnce gamma y x') initTheta
>       update theta = V.foldl f theta $ V.zip y x'
>         where
>           f theta (y, x) = stepOnceStoch gamma y x theta
>   putStrLn $ show $ take 1 $ drop nIters hs
>   let f = foldr (.) id $ replicate nSamples update
>   putStrLn $ show $ f initTheta

And we get quite reasonable estimates for the parameter.

ghci> main
  [fromList [-8.34526742975572e-4,0.6024033722648041,0.69483650585735]]
  fromList [7.095387239884274e-5,0.6017197904731632,0.694335002002961]

Comonads, Life and Klein Bottles

February 23, 2013

It’s part of Haskell folklore that the archetypal example for comonads is Conway’s game of life. Here’s an implementation using arrays.

> {-# OPTIONS_GHC -Wall                    #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults  #-}

> import Diagrams.Prelude
> import Diagrams.Backend.Cairo.CmdLine
> import Data.Array
> import Data.List

The usual comonad class:

> class Comonad c where
>   coreturn :: c a -> a
>   (=>>) :: c a -> (c a -> b) -> c b

This will become our two dimensional grid when we concretize it.

> data PointedArray i a = PointedArray i (Array i a)
>   deriving Show

As usual we make this into a functor by using the functor instance of the underlying array.

> instance Ix i => Functor (PointedArray i) where
>   fmap f (PointedArray i a) = PointedArray i (fmap f a)

An array with a distinguished element is a comonad in which the cobind updates each element of the array simultaneously.

> instance Ix i => Comonad (PointedArray i) where
>   coreturn (PointedArray i a) = a!i
>   (PointedArray i a) =>> f =
>     PointedArray i (listArray (bounds a)
>                    (map (f . flip PointedArray a) (range $ bounds a)))

Let’s have a small grid size to demonstrate the so called glider.

> mMax, nMax :: Int
> mMax = 5
> nMax = 5

A cell is either dead or alive.

> data Liveness =
>   Dead | Alive
>   deriving (Eq, Show)

Let’s be explicit about our neighbours.

> data Neighbours a = Neighbours { north     :: a
>                                , northEast :: a
>                                , east      :: a
>                                , southEast :: a
>                                , south     :: a
>                                , southWest :: a
>                                , west      :: a
>                                , northWest :: a
>                                }
>                   deriving Show
> 
> toList :: Neighbours a -> [a]
> toList (Neighbours x1 x2 x3 x4 x5 x6 x7 x8) = x1:x2:x3:x4:x5:x6:x7:x8:[]
> 
> type NumNeighbours a = Int -> Int -> PointedArray (Int, Int) a
>                        -> Neighbours a
> 
> numNeighbours :: NumNeighbours Liveness
>                  -> PointedArray (Int, Int) Liveness
>                  -> Int
> numNeighbours ns p = length $
>                      filter (== Alive) $
>                      toList $
>                      ns mMax nMax p

Now we can implement the rules.

> f :: NumNeighbours Liveness ->
>      PointedArray (Int, Int) Liveness ->
>      Liveness
> f ns p@(PointedArray (i, j) x)
>   |  x!(i, j) == Alive && (numNeighbours ns p < 2)
>   = Dead
> f ns p@(PointedArray (i, j) x)
>   |  x!(i, j) == Alive && (numNeighbours ns p `elem` [2, 3])
>   = Alive
> f ns p@(PointedArray (i, j) x)
>   |  x!(i, j) == Alive && (numNeighbours ns p > 3)
>   = Dead
> f ns p@(PointedArray (i, j) x)
>   |  x!(i, j) == Dead && (numNeighbours ns p == 3)
>   = Alive
> f _  (PointedArray (i, j) x)
>   = x!(i, j)

Let’s create a glider which will move around our manifold.

> glider :: PointedArray (Int, Int) Liveness
> glider = PointedArray (0, 0) xs
>   where
>     ys = listArray ((0, 0), (mMax - 1, nMax - 1)) $ repeat Dead
>     xs = ys // [ ((2, 4), Alive)
>                , ((3, 3), Alive)
>                , ((1, 2), Alive)
>                , ((2, 2), Alive)
>                , ((3, 2), Alive)
>                ]

We can’t have an infinite grid with an array but we can make our game of life take place on a torus rather than the plane. This way we don’t have problems with boundary conditions.

> neighbours :: Int -> Int -> PointedArray (Int, Int) a -> Neighbours a
> neighbours mMax nMax (PointedArray (i, j) x) =
>   Neighbours
>     {
>       north     = x!(i,                  (j + 1) `mod` nMax)
>     , northEast = x!((i + 1) `mod` mMax, (j + 1) `mod` nMax)
>     , east      = x!((i + 1) `mod` mMax, j)
>     , southEast = x!((i + 1) `mod` mMax, (j - 1) `mod` nMax)
>     , south     = x!(i,                  (j - 1) `mod` nMax)
>     , southWest = x!((i - 1) `mod` mMax, (j - 1) `mod` nMax)
>     , west      = x!((i - 1) `mod` mMax, j)
>     , northWest = x!((i - 1) `mod` mMax, (j + 1) `mod` nMax)
>     }

We can see that the glider reappears at the same place at iterations 21 and 41.

55ce7cfc4851d7d9d3f990a3a006c292

We don’t have to use a torus. For example, we can use a Klein bottle which is non-orientable surface.

> neighboursKlein :: Int -> Int -> PointedArray (Int, Int) a
>                    -> Neighbours a
> neighboursKlein mMax nMax (PointedArray (i, j) x) =
>   Neighbours
>     {
>       north     = north' i j
>     , northEast = northEast' i j
>     , east      = x!((i + 1) `mod` mMax, j)
>     , southEast = southEast' i j
>     , south     = south' i j
>     , southWest = southWest' i j
>     , west      = x!((i - 1) `mod` mMax, j)
>     , northWest = northWest' i j
>     }
>   where
>     north'     i j
>       | j < nMax - 1 = x!(i,                                j + 1)
>       | otherwise    = x!(mMax - 1 - i,                         0)
>     northEast' i j
>       | j < nMax - 1 = x!((i + 1) `mod` mMax,               j + 1)
>       | otherwise    = x!(mMax - 1 - (i + 1) `mod` mMax,        0)
>     southEast' i j
>       | j > 0        = x!((i + 1) `mod` mMax,               j - 1)
>       | otherwise    = x!(mMax - 1 - (i + 1) `mod` mMax, nMax - 1)
>     south'     i j
>       | j > 0        = x!(i,                                j - 1)
>       | otherwise    = x!(mMax - 1 - i,                  nMax - 1)
>     southWest' i j
>       | j > 0        = x!((i - 1) `mod` mMax,               j - 1)
>       | otherwise    = x!(mMax - 1 - (i - 1) `mod` mMax, nMax - 1)
>     northWest' i j
>       | j < nMax - 1 = x!((i - 1) `mod` mMax,               j + 1)
>       | otherwise    = x!(mMax - 1 - (i - 1) `mod` mMax,        0)

We can see that the glider reappears in the same place at iteration 21 with its left and right sides swapped and that it reappears in the same place at iteration 41 with its original handedness.

46bd694699c10c72ec955a745137e898

If you wish to run the code yourself, you will need to do something like this:

> testGrids :: [PointedArray (Int, Int) Liveness]
> testGrids = take 10 $ iterate (=>> (f neighbours)) glider

It is somewhat difficult to determine what is going on from the raw data structures themselves; it is easier to see what is happening using some form of diagram. If you wish to do this, the full code and text are available here.

Parallelising Path Dependent Options in Haskell

February 10, 2013

Path Dependency

Now we are able to price options using parallelism, let us consider a more exotic financial option.

Let us suppose that we wish to price an Asian call option. The payoff at time T is

\displaystyle  {\Bigg(\frac{\Sigma_{i=1}^n x_{t_i}}{n} -k\Bigg)}^+

Thus the payoff depends not just on the value of the underlying x at time T but also on the path taken. We do not need to know the entire path, just the average at discrete points in time. Thus the payoff is a function of time, the value of the underlying and the average of the underlying sampled at discrete points in time z(x,a,t).

We can rewrite this equation as a recursive relation:

\displaystyle  a_n = a_{n-1} + \frac{x_n -a_{n-1}}{n}

If t_n is the n-th sampling time then for small \epsilon let:

\begin{aligned}t^+ &= t_n + \epsilon \\t^- &= t_n - \epsilon\end{aligned}

Now we can write the equation for the value of the average just before and after the sampling time as:

\displaystyle  a(x, t^+) = a(x, t^-) + \frac{x - a(x, t^-)}{n}

Since there is no arbitrage, the value of the option cannot change as it crosses the sampling date:

\displaystyle  z(x, a^+, t^+) = z(x, a^-, t^-)

Substituting in:

\displaystyle  z(x, a + (x - a)/n, t^+) = z(x, a, t^-)

But we have a problem, we do not know the value of a. Away from the sampling times there is no dependence on a as its value cannot change. We already know how to step back in time in this case: we use the pricer we have already developed.

So let us assume a value for a. Then we can diffuse backwards from the final payoff but when we reach a sampling date, we reset the values on the grid using the interfacing formula above.

In summary, we assume a set of values for a and then diffuse backwards for each pricer with that a; at each sampling time, we reset the values of the pricers and continue until we reach the last sampling time. At this point, x = a so we diffuse one pricer back to the start time which gives us the price of the option for any given x on our grid.

Our usual imports and options.

> {-# LANGUAGE FlexibleContexts          #-}
> {-# LANGUAGE TypeOperators             #-}
> {-# LANGUAGE NoMonomorphismRestriction #-}
> 
> {-# OPTIONS_GHC -Wall                    #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults  #-}
> 
> import Data.Array.Repa as Repa hiding ((++), map)
> import Control.Monad
> import AsianDiagram
> import Diagrams.Prelude((===))
> import Diagrams.Backend.Cairo.CmdLine
> import Text.Printf
> import Data.List

First some constants for the payoff and the pricer. We make the space grid deliberately coarse so we can draw diagrams showing the grid before and after interfacing.

> r, sigma, k, t, xMax, aMax, deltaX, deltaT, deltaA:: Double
> m, n, p :: Int
> r = 0.05
> sigma = 0.2
> k = 50.0
> t = 3.0
> m = 10
> p = 10
> xMax = 150
> deltaX = xMax / (fromIntegral m)
> aMax = 150
> deltaA = aMax / (fromIntegral p)
> n = 100
> deltaT = t / (fromIntegral n)

We take the times at which do our Asian observations and calculate the number of steps between each observation including the initial and terminal times.

> asianTimes :: [Int]
> asianTimes = map (\x -> floor $ x*(fromIntegral n)/t) [1.5,2.0,2.5]
> 
> numSteps :: [Int]
> numSteps = snd $ mapAccumL (\s x -> (x, x - s)) 0 times
>              where times = asianTimes ++ [n]

As before we can define a single pricer that updates the grid over one time step and at multiple points in space using the Explicit Euler method. We parameterize the upper and lower boundaries z(0,t) and \lim_{x \to \infty}z(x,t) as we will want to use our pricer not just for a vanilla call option.

> type BoundaryCondition = Array D DIM1 Double -> Double
> 
> singleUpdater :: BoundaryCondition ->
>                   BoundaryCondition ->
>                   Array D DIM1 Double ->
>                   Array D DIM1 Double
> singleUpdater lb ub arr = traverse arr id f
>   where
>     Z :. m = extent arr
>     f _get (Z :. ix) | ix == 0   = lb arr
>     f _get (Z :. ix) | ix == m-1 = ub arr
>     f  get (Z :. ix)             = a * get (Z :. ix-1) +
>                                    b * get (Z :. ix) +
>                                    c * get (Z :. ix+1)
>       where
>         a = deltaT * (sigma^2 * x^2 - r * x) / 2
>         b = 1 - deltaT * (r  + sigma^2 * x^2)
>         c = deltaT * (sigma^2 * x^2 + r * x) / 2
>         x = fromIntegral ix

Again we can extend this to update many pricers on one time step and multiple points in space.

> multiUpdater :: Source r Double =>
>                 BoundaryCondition ->
>                 BoundaryCondition ->
>                 Array r DIM2 Double ->
>                 Array D DIM2 Double
> multiUpdater lb ub arr = fromFunction (extent arr) f
>      where
>        f :: DIM2 -> Double
>        f (Z :. ix :. jx) = (singleUpdater lb ub x)!(Z :. ix)
>          where
>            x :: Array D DIM1 Double
>            x = slice arr (Any :. jx)

We define the boundary condition at the maturity date of our Asian option. Notice for each pricer the boundary condition is constant, this being determined by the value of the Asian payoff.

> priceAtTAsian :: Array U DIM2 Double
> priceAtTAsian = fromListUnboxed (Z :. m+1 :. p+1)
>                 [ max 0 (deltaA * (fromIntegral l) - k)
>                 | _j <- [0..m],
>                    l <- [0..p]
>                 ]

With this we can step backwards in time for any number of timesteps.

> stepMulti :: Monad m =>
>              Int ->
>              BoundaryCondition ->
>              BoundaryCondition ->
>              Array U DIM2 Double ->
>              m (Array U DIM2 Double)
> stepMulti n lb ub = updaterM
>   where
>     updaterM :: Monad m => Array U DIM2 Double ->
>                            m (Array U DIM2 Double)
>     updaterM = foldr (>=>) return updaters
>       where
>         updaters = replicate n (computeP . multiUpdater lb ub)

Boundary Conditions

But now we are stuck. What are the boundary conditions for each of the pricers after we have done our interfacing? We follow many practioners and assume that the underlying is linear in this area. Thus in the Black-Scholes equation

\displaystyle  \frac{\partial z}{\partial t} + \frac{1}{2}\sigma^2 x^2 \frac{\partial^2 z}{\partial x^2} + r x\frac{\partial z}{\partial x} - rz = 0

we set the second derivative to 0. Thus we can use a forward method at the boundary to solve the equation below.

\displaystyle  \frac{\partial z}{\partial t} + r x\frac{\partial z}{\partial x} - rz = 0

When x=0, we have a simpler situation.

\displaystyle  \frac{\partial z(0,t)}{\partial t} - rz = 0

The corresponding difference equation for the lower boundary is:

\displaystyle  \frac{z_0^{n+1} - z_0^n}{\Delta t} -rz_0^n = 0

Which is easily represented in Haskell (rembering we are stepping backwards in time).

> lBoundaryUpdater :: BoundaryCondition
> lBoundaryUpdater arr = x - deltaT * r * x
>   where
>     x = arr!(Z :. (0 :: Int))

The corresponding difference equation for the upper boundary is:

\displaystyle  \frac{z^{n+1}_m - z^n_m}{\Delta t} +  r(m\Delta x)\frac{z^n_m - z^n_{m-1}}{\Delta x} -  rz^n_m = 0

Re-arranging

\displaystyle  \frac{z^{n+1}_m - z^n_m}{\Delta t} +  r (m-1) z^n_m -  r m z^n_{m-1} = 0

We can write this in Haskell as follows (again remembering we are stepping backwards in time).

> uBoundaryUpdater :: BoundaryCondition
> uBoundaryUpdater arr = x - deltaT * r * (a - b)
>   where
>     Z :. m = extent arr
>     x = arr!(Z :. m - 1)
>     y = arr!(Z :. m - 2)
>     a = x * fromIntegral (m - 1)
>     b = y * fromIntegral m

Interfacing

We interface by linear interpolation if value we require lies between 2 points on our grid otherwise we use linear extrapolation.

> interface :: Int -> Array U DIM2 Double ->
>              Array D DIM2 Double
> interface n grid = traverse grid id (\_ sh -> f sh)
>   where
>     (Z :. _iMax :. jMax) = extent grid
>     f (Z :. i :. j) = inter
>       where
>         x       = deltaX * (fromIntegral i)
>         aPlus   = deltaA * (fromIntegral j)
>         aMinus  = aPlus + (x - aPlus) / (fromIntegral n)
>         jLower  = if k > jMax - 1 then jMax - 1 else k
>                     where k = floor $ aMinus / deltaA
>         jUpper  = if (jLower == jMax - 1)
>                     then jMax - 1
>                     else jLower + 1
>         aLower  = deltaA * (fromIntegral jLower)
>         prptn   = (aMinus - aLower) / deltaA
>         vLower  = grid!(Z :. i :. jLower)
>         vUpper  = grid!(Z :. i :. jUpper)
>         inter   = vLower + prptn * (vUpper - vLower)

Example

Now we are in a position to give an example of pricing our option with a few helper functions

> showD :: Double -> String
> showD = printf "%.2f"
> 
> showArrD1 :: Array U DIM1 Double -> String
> showArrD1 = intercalate ", " . map showD . toList
> 
> showSlices :: String -> Array U DIM2 Double -> IO ()
> showSlices message prices = do
>   putStrLn ('\n' : message)
> 
>   let (Z :. _i :. j) = extent prices
>       slicesD = map (\m -> slice prices (Any :. m)) [0..j-1]
> 
>   slices <- mapM computeP slicesD
>   mapM_ (putStrLn . showArrD1) slices
> 
> diagonal :: Source a Double =>
>             Array a DIM2 Double ->
>             Array D DIM2 Double
> diagonal arr = traverse arr g f
>   where
>     f :: (DIM2 -> Double) -> DIM2 -> Double
>     f get (Z :. ix :. _) = get (Z :. ix :. ix)
> 
>     g :: DIM2 -> DIM2
>     g (Z :. ix :. iy) = Z :. (min ix iy) :. (1 :: Int)
> 
> main :: IO ()
> main = do putStrLn "\nAsianing times"
>           putStrLn $ show asianTimes
> 
>           let lb = lBoundaryUpdater
>               ub = uBoundaryUpdater
> 
>           showSlices "Initial pricers" priceAtTAsian

Now instead of stepping all the way back to the initial time of the option, we only step back to just before the last observation.

>           grid3b <- stepMulti (numSteps!!3) lb ub priceAtTAsian
>           showSlices ("Just before 3") grid3b

The diagram below shows our grid just before we make the last observation. The x-axis is the value of the underlying and the y-axis is the value of the Asian value. Both have been normalised to 1.0. That is the value 1.0 represents 150.0 (our maximum value on the grid for both the underlying and the Asian / auxiliary variable).

>           grid3a <- computeP $ interface 3 grid3b
>           showSlices ("Just after 3") grid3a

The diagram below shows our grid after before we make the last observation i.e., after interfacing. Again, the x-axis is the value of the underlying and the y-axis is the value of the Asian value. Both have been normalised to 1.0. That is the value 1.0 represents 150.0 (our maximum value on the grid for both the underlying and the Asian / auxiliary variable).

And then step backwards with the new final boundary condition.

>           grid2b <- stepMulti (numSteps!!2) lb ub grid3a
>           showSlices ("Just before 2") grid2b
> 
>           grid2a <- computeP $ interface 2 grid2b
>           showSlices ("Just after 2") grid2a

Again we step backwards in time with a new final boundary condition.

>           grid1b <- stepMulti (numSteps!!1) lb ub grid2a
>           showSlices ("Just before 1") grid1b

Finally we step all the way back to the time at which we wish to know the price. Here the interface condition is just x=a so we can take the diagonal of our grid and diffuse backwards using a single pricer.

>           grid1a <- computeP $ diagonal grid1b
>           showSlices ("Just after 1") grid1a
>           grid1b <- stepMulti (numSteps!!0) lb ub grid1a
>           showSlices "Final pricer" grid1b

Option Pricing Using Haskell Parallel Arrays

January 5, 2013

Can we get better performance for pricing financial options in Haskell?

First let us translate our pricer using the Explicit Euler Method to use repa.

{-# LANGUAGE FlexibleContexts, TypeOperators #-}

{-# OPTIONS_GHC -Wall -fno-warn-name-shadowing -fno-warn-type-defaults #-}

import Data.Array.Repa as Repa
import Data.Array.Repa.Eval
import Control.Monad

r, sigma, k, t, xMax, deltaX, deltaT :: Double
m, n, p :: Int
r = 0.05
sigma = 0.2
k = 50.0
t = 3.0
m = 80
p = 2
xMax = 150
deltaX = xMax / (fromIntegral m)
n = 800
deltaT = t / (fromIntegral n)

data PointedArrayU a = PointedArrayU Int (Array U DIM1 a)
  deriving Show

singleUpdaterP :: PointedArrayU Double -> Double
singleUpdaterP (PointedArrayU j _x) | j == 0 = 0.0
singleUpdaterP (PointedArrayU j _x) | j == m = xMax - k
singleUpdaterP (PointedArrayU j  x)          = a * x! (Z :. j-1) +
                                               b * x! (Z :. j) +
                                               c * x! (Z :. j+1)
  where
    a = deltaT * (sigma^2 * (fromIntegral j)^2 - r * (fromIntegral j)) / 2
    b = 1 - deltaT * (r  + sigma^2 * (fromIntegral j)^2)
    c = deltaT * (sigma^2 * (fromIntegral j)^2 + r * (fromIntegral j)) / 2

priceAtTP :: PointedArrayU Double
priceAtTP = PointedArrayU 0 (fromListUnboxed (Z :. m+1) 
                            [ max 0 (deltaX * (fromIntegral j) - k) | j <- [0..m] ])

coBindU :: (Source U a, Source U b, Target U b, Monad m) =>
           PointedArrayU a -> (PointedArrayU a -> b) -> m (PointedArrayU  b)
coBindU (PointedArrayU i a) f = computeP newArr >>= return . PointedArrayU i
  where
      newArr = traverse a id g
        where
          g _get (Z :. j) = f $ PointedArrayU j a

testN :: Int -> IO (PointedArrayU Double)
testN n =  h priceAtTP
           where
           h = foldr (>=>) return
               (take n $ Prelude.zipWith flip (repeat coBindU) (repeat singleUpdaterP))

So far so good but this has not bought us very much over using Data.Array or Data.Vector. Morevoer we have not been able to use the costate comonad because of the restrictions on types imposed by repa.

Let us re-write the above without even attempting to mimic the cobind operator. Thus we no longer need our pointed array.

singleUpdater :: Array D DIM1 Double -> Array D DIM1 Double
singleUpdater a = traverse a id f
  where
    Z :. m = extent a
    f _get (Z :. ix) | ix == 0   = 0.0
    f _get (Z :. ix) | ix == m-1 = xMax - k
    f  get (Z :. ix)             = a * get (Z :. ix-1) +
                                   b * get (Z :. ix) +
                                   c * get (Z :. ix+1)
      where
        a = deltaT * (sigma^2 * (fromIntegral ix)^2 - r * (fromIntegral ix)) / 2
        b = 1 - deltaT * (r  + sigma^2 * (fromIntegral ix)^2)
        c = deltaT * (sigma^2 * (fromIntegral ix)^2 + r * (fromIntegral ix)) / 2

priceAtT :: Array U DIM1 Double
priceAtT = fromListUnboxed (Z :. m+1) [max 0 (deltaX * (fromIntegral j) - k) | j <- [0..m]]

Now suppose we want to price a spot ladder then we would like to apply the same update function to slightly modified payoffs (the terminal boundary condition). First let’s utilise our single update function to work over two dimensional grid.

multiUpdater :: Source r Double => Array r DIM2 Double -> Array D DIM2 Double
multiUpdater a = fromFunction (extent a) f
     where
       f :: DIM2 -> Double
       f (Z :. ix :. jx) = (singleUpdater x)!(Z :. ix)
         where
           x :: Array D DIM1 Double
           x = slice a (Any :. jx)

We define our spot ladder on a call to be many calls each with a strike differing by 10 basis points (a basis point is one hundredth of one percent).

priceAtTMulti :: Array U DIM2 Double
priceAtTMulti = fromListUnboxed (Z :. m+1 :. p+1)
                [ max 0 (deltaX * (fromIntegral j) - (k + (fromIntegral l)/1000.0))
                | j <- [0..m]
                , l <- [0..p]
                ]

Now we can test our spot ladder pricer. Note that we force the evaluation of our two dimensional array at each time step. If we do not do that then we end up with a leak as presumably repa tries to fuse many updates. Moreover fusing updates across time steps does not really buy us anything.

testMulti :: IO (Array U DIM2 Double)
testMulti = updaterM priceAtTMulti
  where
    updaterM :: Monad m => Array U DIM2 Double -> m (Array U DIM2 Double)
    updaterM = foldr (>=>) return (replicate n (computeP . multiUpdater))

pickAtStrike :: Monad m => Int -> Array U DIM2 Double -> m (Array U DIM1 Double)
pickAtStrike n t = computeP $ slice t (Any :. n :. All)

main :: IO ()
main = do t <- testMulti
          vStrikes <- pickAtStrike 27 t
          putStrLn $ show vStrikes

I am planning to write a blog post on performance but for now you can get some idea of how well this performs by reading this.

Plotting a Distribution

November 4, 2012

One way of visualizing a distribution is to take lots of samples for it and plot the resulting histogram.

To do this we use the diagrams package.

{-# LANGUAGE TupleSections #-}

import Diagrams.Prelude

import Diagrams.Backend.Cairo.CmdLine
import Data.Colour (withOpacity)

To generate the actual samples (in this case from the beta distribution) we use the random-fu and rvar packages.

import Data.Random ()
import Data.Random.Distribution.Beta
import Data.RVar

We import some other standard modules and define some constants.

import System.Random

import Data.List
import qualified Data.IntMap as IntMap

import Control.Monad.State

import Text.Printf

tickSize   = 0.1
nCells     = 100
a          = 15
b          = 6
nSamples   = 100000
cellColour = blue `withOpacity` 0.5

First we define a background on which we are going to plot our histogram and also a function to draw the x-axis.

background = rect 1.1 1.1 # translate (r2 (0.5, 0.5))

ticks xs = (mconcat $ map tick xs)  <> line
  where
    maxX   = maximum xs
    line   = fromOffsets [r2 (maxX, 0)]
    tSize  = maxX / 100
    tick x = endpt # translate tickShift
      where
        tickShift = r2 (x, 0)
        endpt     = topLeftText (printf "%.2f" x) # fontSize (tSize * 2) <>
                    circle tSize # fc red  # lw 0

We sample from the beta distribution with parameters a and b and put these into a histogram.

betas :: Int -> Double -> Double -> [Double]
betas n a b =
  fst $ runState (replicateM n (sampleRVar (beta a b))) (mkStdGen seed)
    where
      seed = 0

histogram :: Int -> [Double] -> IntMap.IntMap Int
histogram nCells xs =
  foldl' g emptyHistogram xs
    where
      g h x          = IntMap.insertWith (+) (makeCell nCells x) 1 h
      emptyHistogram = IntMap.fromList $ zip [0 .. nCells - 1] (repeat 0)
      makeCell m     = floor . (* (fromIntegral m))

Finally we can render the histogram by drawing blue rectangles with a white border.

hist xs = scaleX sX . scaleY sY . position $ hist' where
    ysmax = fromInteger . ceiling $ maximum xs
    ysmin = fromInteger . floor $ minimum xs
    xsmax = fromIntegral $ length xs
    xsmin = 0.0
    sX = 1 / (xsmax - xsmin)
    sY = 1 / (ysmax - ysmin)
    hist' = zip (map p2 $ map (,0) $
            map fromInteger [0..]) (map (cell 1) xs)
    cell w h = alignB $ rect w h
                      # fcA cellColour
                      # lc white
                      # lw 0.001

And now we can draw our histogram.

test tickSize nCells a b nSamples =
  ticks [0.0, tickSize..1.0] <>
  hist xs <>
  background
    where
      xs = IntMap.elems $
           IntMap.map fromIntegral $
           histogram nCells $
           betas nSamples a b
  
main :: IO ()
main = defaultMain $ test tickSize nCells a b nSamples

The Metropolis Algorithm

August 15, 2012

A Simple Example

In Section 7.1 of Doing Bayesian Data Analysis, John Kruschke gives a simple example of the Metropolis algorithm, in which we generate samples from a distribution without knowing the distribution itself. Of course the example is contrived as we really do know the distribtion. In the particular example, {\cal P}(X = i) = i/k for i = 1…n where n is some fixed natural number and k is a normalising constant.

Here’s the algorithm in Haskell. We use the random-fu package and the rvar package for random variables and the random package to supply random numbers.

{-# LANGUAGE ScopedTypeVariables, NoMonomorphismRestriction #-}

import Data.Random
import Data.RVar
import System.Random
import Control.Monad.State
import Data.List

We pick an explicit seed and set n = 7 (in Kruschke’s example this is the number of islands that a politician visits).

seed :: Int
seed = 2
numIslands :: Int
numIslands = 7

And we pick an arbitrary number of jumps to try in the Metropolis algorithm.

n = 11112

We generate proposed jumps which are either one step up or one step down.

proposedJumps :: Int -> [Int]
proposedJumps seed =
  map f $ fst $
  runState (replicateM n $ sampleRVar $ uniform False True) (mkStdGen seed)
    where f False = negate 1
          f True  = 1

And we generate samples from the uniform distribution on [0, 1] which will allow us to determine whether to accept or reject a proposed jump.

acceptOrRejects :: Int -> [Double]
acceptOrRejects seed =
  fst $ runState (replicateM n $ sampleRVar $ uniform 0 1) (mkStdGen seed)

We pretend we only know a measure of how often we pick a given number but not the normalising constant to make this measure a probability measure.

p n | n >= 1 && n <= numIslands = n
    | otherwise                 = 0

We define a function which defines one step of the Metropolis algorithm.

f currentPosition (proposedJump, acceptOrReject) =
  if acceptOrReject < probAccept
    then
      currentPosition + proposedJump
    else
      currentPosition
  where
    probAccept = min 1 (pNew / pCur)
    pNew = fromIntegral $ currentPosition + proposedJump
    pCur = fromIntegral currentPosition

Let’s try this out with a somewhat arbitrary burn in period starting at position 3.

runMC seed = map (/ total) numVisits
  where
    total     = sum numVisits
    numVisits = map (\j -> fromIntegral $ length $ filter (==j) $ xs)
                    [1 .. numIslands]
    xs        = drop (n `div` 10) $
                scanl f 3 (zip (proposedJumps seed) (acceptOrRejects seed))

We can then compute the root mean square error for this particular sample size.

actual = map (/s) xs
           where
             xs = map fromIntegral [1 .. numIslands]
             s  = sum xs

rmsError n = sqrt $ (/(fromIntegral numIslands)) $
             sum $ map (^2) $ zipWith (-) actual (runMC n)

A Bernoulli Example

We can also code the Metropolis algorithm for the example in which samples are drawn from a Bernoulli distribution. First we can define the likelihood for drawing drawing a specific sequence of 0’s and 1’s from a binomial distribution with parrameter θ. We also define the example sample of data given in Doing Bayesian Data Analysis.


likelihood :: Int -> Int -> Double -> Double
likelihood z n theta = theta^z * (1 - theta)^(n - z)

myData = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

We define a function which defines one step of the Metropolis algorithm.

oneStep currPosition (propJump, acceptOrReject) =
  if acceptOrReject < probAccept
    then
      currPosition + propJump
    else
      currPosition
  where
    probAccept = min 1 (p (currPosition + propJump) / p currPosition)
    p x | x < 0 = 0
        | x > 1 = 0
        | otherwise = pAux myData x
    pAux :: [Double] -> Double -> Double
    pAux xs position = likelihood z n position
      where
        n = length xs
        z = length $ filter (== 1) xs

Finally we need some proposed jumps; following the example we generate these from a normal distribution {\cal N}(0, 0.1).

normals :: [Double]
normals =  fst $ runState (replicateM n (sampleRVar (normal 0 0.1)))
                          (mkStdGen seed)

And now we can run the Metropolis algorithm and, for example, find the mean of the posterior.

accepteds = drop (n `div` 10) $
            scanl oneStep 0.5 (zip normals (acceptOrRejects seed))

mean = total / count
         where
           (total, count) = foldl' f (0.0, 0) accepteds
           f (s, c) x = (s+x, c+1)

Functional R

We can now re-write the example in R in a more functional way.

## An arbitrary number of jumps to try in Metropolis
## algorithm.
trajLength = 11112

## The data represented as a vector.
myData = c( 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0 )

## An arbitrary seed presumably used in generating random
## values from the various distributions.
set.seed(47405)

## We need some proposed jumps; following the example we
## generate these from a normal distribution.
normals = rep (0, trajLength)
for (j in 1:trajLength) {
normals[j] = rnorm( 1, mean = 0, sd = 0.1 )
}

## We generate proposed jumps which are either one step up
## or one step down.
proposedJumps = rep (0, trajLength)
for (j in 1:trajLength) {
proposedJumps[j] =
if ( runif( 1 ) 1 | theta < 0 ] = 0
return ( x )
}

prior = function( position, xs ) {
n = length( xs )
z = sum ( xs == 1)
return ( likelihood ( z, n, position ) )
}

## We define a function which defines one step of the
## Metropolis algorithm.
oneStep = function ( currPosition, propJumpAorR ) {
proposedJump = propJumpAorR [1]
acceptOrReject = propJumpAorR [2]
probAccept = min( 1,
prior( currPosition + proposedJump , myData )
/ prior( currPosition , myData ) )
if ( acceptOrReject < probAccept ) {
trajectory = currPosition + proposedJump
} else {
trajectory = currPosition
}
return ( trajectory )
}

## Pair together the proposed jumps and the probability
## whether a given proposal will be accepted or rejected.
nsAorRs <- list ()
for (i in 1:trajLength)
nsAorRs[[i]] <- c(normals[i], acceptOrRejects[i])

## Fold (well really scanl) over the pairs returning all the
## intermediate results.
trajectory = Reduce( function(a,b) oneStep (a, b),
nsAorRs, accumulate=T,init= 0.5 )

## Drop the values for the burn in period.
burnIn = ceiling( .1 * trajLength )
acceptedTraj = trajectory[burnIn:trajLength]

## Finally return the mean of the posterior.
result = mean ( acceptedTraj )

A Triple Birthday Paradox

May 7, 2012

A few weeks ago, two of my colleagues and I had a birthday in the same week. What are the odds of that? There’s about 25 people in the department in which I work.

We consider every outcome where pairs of people share birthdays but no 3 or more people share birthdays. Then we can use the birthday paradox trick and subtract the proportion of these from 1.

To be absolutely precise we ought to consider outcomes where no one shares a birthday but this is sufficiently small in comparison in the case we are interested in that we can ignore it.

We begin with some imports and a dumb function to calculate multinomial coefficients.


import qualified Data.Map as Map
import System.Random
import Prelude hiding (replicate, zipWith, length)

binomial n 0 = 1
binomial 0 k = 0
binomial n k = n * binomial (n - 1) (k - 1) `div` k

multinomial xs =
  (factorial (sum xs)) `div` (product [ factorial x | x<-xs ])
    where
      factorial n = product [1..n]

Let’s take a smaller example. Suppose we have 9 people in a department and we have divided the year up into 11 periods.

We want the cases where there are exactly two birthdays in a period. For example, 2 people might have a birthday in period 1, 7 people might have birthdays in periods 2–8 and no-one has a birthday in periods 9–11. This would give us \binom{11}{1}\binom{10}{7} ways of choosing people. We also need to consider that the 2 people sharing the birthday might share them in period 2, 3, …, 11. There are \binom{9}{2,1,1,1,1,1,1,1,0} ways of this happening. Letting x stand for the total number of ways that 2 people might share exactly 2 birthdays in a given period, we have:


\begin{aligned} x& = \binom{11}{1}\binom{10}{7}\binom{9}{2,1,1,1,1,1,1,1,0} \\ & + \binom{11}{2}\binom{9}{5} \binom{9}{2,2,1,1,1,1,1,0,0} \\ & + \binom{11}{3}\binom{8}{3} \binom{9}{2,2,2,1,1,1,0,0,0} \\ & + \binom{11}{4}\binom{7}{1} \binom{9}{2,2,2,2,1,0,0,0,0}\end{aligned}

In Haskell, we can write this as:

nPeople :: Integer
nPeople = 9
nPeriods :: Integer
nPeriods = 11

x = binomial 11 1 * binomial 10 7 * multinomial [2,1,1,1,1,1,1,1,0] +
    binomial 11 2 * binomial 9  5 * multinomial [2,2,1,1,1,1,1,0,0] +
    binomial 11 3 * binomial 8  3 * multinomial [2,2,2,1,1,1,0,0,0] +
    binomial 11 4 * binomial 7  1 * multinomial [2,2,2,2,1,0,0,0,0]

And therefore we expect this proportion of exactly 2 shared birthdays

y = (fromInteger x) / (fromInteger nPeriods^nPeople)

Generalizing, we first need to be able to generate the desired multinomial coefficient.

coeff x' y' = take y (ns 2) ++ take (x - 2*y) (ns 1) ++ take y (ns 0)
  where x    = fromIntegral x'
        y    = fromIntegral y'
        ns n = n:(ns n)

For example:

*Main> coeff 11 1
[2,1,1,1,1,1,1,1,1,1,0]

Now we can calculate the number of ways of getting exactly 2 birthdays in any single period

terms x y = sum $ map (oneTerm x y) [1..(x `div` 2)]
  where
    oneTerm x y n = binomial y n *
                    binomial (y - n) (x - (2 * n)) *
                    multinomial (coeff x n)

And finally we can calculate the probability of getting exactly 2 birthdays in any single period.

atMostTwoShared x y = fromInteger (terms x y) / fromInteger (y^x)

Because it’s easy to double count or forget a particulare case, it’s best to check via Monte Carlo:

nSims = 10000

monteCarlo =
  runs >>= return . Map.map (/(fromInteger nSims)) . counts where
    counts rs      = foldr (\b -> Map.insertWith (+) b 1) Map.empty rs
    runs           = sequence $
                     take (fromIntegral nSims) $
                     repeat (bDays >>= return . maxOnOneDay)
    bDays          = sequence $
                     take (fromIntegral nPeople) $
                     repeat $ pickBirthday
    maxOnOneDay bs = maximum $
                     Map.elems $
                     foldr (\b -> Map.insertWith (+) b 1) Map.empty bs
    pickBirthday   = getStdRandom (randomR (1,nPeriods))

main = do
  ps <- monteCarlo
  putStrLn $ show ps
  putStrLn $ show $ atMostTwoShared nPeople nPeriods

Surprisingly the answer for 3 people sharing a birthday in a period where there are 9 people and 11 periods is 43%.

The answer we are after is for 25 people and 52 periods (weeks in the year) and this is 50%.

The Implicit Euler Method

April 22, 2012

We can solve our differential equation using the Implicit Euler method which is unconditionally stable. We can also take this opportunity to use the Vector Package rather than Arrays as it has a richer set of combinators and to tidy up the code to make the payoff explicit (thanks to suggestions by Ben Moseley).

First let’s get our imports arranged so the code isn’t too cluttered with qualified names.

{-# LANGUAGE RecordWildCards #-}
import Prelude hiding
  ( length
  , scanl
  , scanr
  , zip
  , tail
  , init
  , last
  )
import qualified Data.Vector as V
import Data.Vector
  ( (!)
  , generate
  , length
  , Vector
  , prescanl
  , scanl
  , scanr
  , zip
  , tail
  , init
  , last
  , unfoldr
  )

Define a type synomym for the payoff function and put all the parameters for the trade in one place.

type Payoff = Double -> Double

data Config = Config {
  r      :: Double,
  sigma  :: Double,
  t      :: Double,
  m      :: Int, -- Space steps
  n      :: Int, -- Time steps
  xMax   :: Double,
  deltaX :: Double,
  deltaT :: Double
  }

defaultConf :: Config
defaultConf = Config {
  r = 0.05,
  sigma = 0.20,
  t = 3.0,
  m = 80,
  n = 800,
  xMax = 150,
  deltaX = xMax defaultConf / fromIntegral (m defaultConf),
  deltaT = t defaultConf / fromIntegral (n defaultConf)
  }

The usual(!) Comonad class:

class Comonad c where
  coreturn :: c a -> a
  (=>>) :: c a -> (c a -> b) -> c b

Instead of using an Array to represent a "time slice" we use a Vector. Notice we can use the Vector library functions to implement the cobind.

data PointedArray a = PointedArray Int (Vector a)
  deriving Show

instance Functor PointedArray where
  fmap f (PointedArray i x) = PointedArray i (fmap f x)

instance Comonad PointedArray where
  coreturn (PointedArray i x) = x!i
  (PointedArray i x) =>> f =
    PointedArray i (V.map (f . flip PointedArray x) (generate (length x) id))

We start with our partial differential equation:


\frac{\partial V}{\partial t} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S\frac{\partial V}{\partial S} - rV = 0

Again we can approximate this by a difference equation but in this case we approximate the time derivative by \frac{z^n_i - z^{n-1}_i}{\Delta t}.


\begin{aligned} \frac{z^{n-1}_i - z^n_i}{\Delta t} + \frac{1}{2} \sigma^2 (i\Delta x)^2\frac{z^n_{i+1} - 2z^n_i + z^n_{i-1}}{\Delta x^2} + r(i\Delta x)\frac{z^n_{i+1} - z^n_{i-1}}{2\Delta x} - rz^n_i = 0\end{aligned}

Re-arranging (and not forgetting we are going backwards in time so that n is earlier than n − 1), we obtain an implicit equation for zn in terms zn − 1:


\begin{aligned} z^{n-1}_i = A_i z^n_{i-1} + B_i z^n_i + C_i z^n_{i+1}\end{aligned}

where


\begin{aligned} A_i &= \frac{\Delta t}{2}(ri - \sigma^2 i^2) \\ B_i &= 1 + \Delta t(r + \sigma^2 i^2) \\ C_i &= -\frac{\Delta t}{2}(ri + \sigma^2 i^2)\end{aligned}

Thus we have a matrix equation which we need to invert. Fortunately the matrix is tridiagonal so we can use the Thomas Algorithm to invert it.

Let’s represent our tridiagonal matrix as a Vector of triples

data TriDiMatrix a = TriDiMatrix (Vector (a, a, a))
  deriving Show

Here’s our tridiagonal matrix. Note that we have encoded two of the boundary conditions in this (the case for S(t, 0) and for S(t, x) as x → ∞).

triDi =
  TriDiMatrix $ unfoldr h 0
    where
      h :: Int -> Maybe ((Double, Double, Double), Int)
      h k | k == m' + 1 = Nothing
      h k | k == m'     = Just (( 0.0,    1.0,   0.0), k + 1)
      h k | k == 0      = Just (( 0.0,    1.0,   0.0), k + 1)
      h k               = Just ((afn k, bfn k, cfn k), k + 1)

      afn i =          deltaT' * (r' * (fromIntegral i) - sigma'^2 * (fromIntegral i)^2) / 2
      bfn i = 1      + deltaT' * (r' + sigma'^2 * (fromIntegral i)^2)
      cfn i = negate $ deltaT' * (r' * (fromIntegral i) + sigma'^2 * (fromIntegral i)^2) / 2

      m'      = m      defaultConf
      deltaT' = deltaT defaultConf
      r'      = r      defaultConf
      sigma'  = sigma  defaultConf

Next we implement the Thomas Algorithm. solveTriDi takes a (tridiagonal) matrix and a vector and returns the application of the inverse of the matrix to the vector.

solveTriDi :: TriDiMatrix Double -> Vector Double -> Vector Double
solveTriDi (TriDiMatrix m) d = z
  where

    -- Forward sweep
    c' = prescanl f (let (a1, b1, c1) = m!0 in c1 / b1) (tail m)
           where f ci' (ai, bi, ci) = ci / (bi - ci'*ai)

    d' = scanl g d1' $ zip (tail m) (zip c' (tail d))
         where g di' ((ai, bi, ci), (ci', di)) = (di - di'*ai) / (bi - ci'*ai)
               d1         = d!0
               (_, b1, _) = m!0
               d1'        = d1 / b1

    -- Backward substitution
    z = scanr h xn $ zip c' (init d')
          where
            h (ci', di') xi1 = di' - ci'*xi1
            xn = last d'

Now set up the boundary condition for our equation which is now a function of the payoff.

pricePArrAtT :: Config -> Payoff -> Int -> PointedArray Double
pricePArrAtT cfg@(Config {..}) fn i =
  PointedArray i (V.fromList [ fn (deltaX * (fromIntegral j)) | j <- [0..m] ])

Finally we can price our call option without having to worry about the stability of the solution.

callPrices :: Int -> [Double]
callPrices i = map coreturn $ iterate (=>> oneStep) $ pricePArrAtT defaultConf fn i
  where
    oneStep (PointedArray j x) = (solveTriDi triDi x)!j
    fn x                       = max 0 (x - k) -- A call
    k                          = 50.0          -- The strike

callPrice i = (callPrices i)!!(n defaultConf - 1)

Follow

Get every new post delivered to your Inbox.

Join 46 other followers