## Cabal Hacking at OdHac

May 5, 2013

This is a very informal blog of the Cabal hacking team at OdHac: Sasha Manzyuk, Bram Schuur and me (Dominic Steinitz). I really enjoyed pair programming with both of them and I certainly wouldn’t have tracked down the bug I investigated without their help.

Also a big thank you to Roman Cheplyaka for organising a great event.

## Cabal Bug Fixing I

### Ticket 823

The problem is indeed in the function Distribution.Simple.Utils.wrapText, which is used to wrap messages at 79 characters. No attempt is made to detect whether a space at which a line is wrapped is part of a file name. One can reproduce the problem more easily as follows:
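To see the failure mode, here is a minimal sketch of a greedy word-wrapper in the same spirit (hypothetical code, not Cabal's actual implementation): it refills words onto lines up to a width limit, so a space inside a path is treated like any other break point.

```haskell
-- A sketch of a greedy line wrapper (not Cabal's actual
-- Distribution.Simple.Utils.wrapText): words are refilled onto lines
-- of at most `width` characters, so the space in "baz quux" is a
-- perfectly good place to break, even though it is inside a path.
wrapText :: Int -> String -> String
wrapText width = unlines . map unwords . foldr fill [] . words
  where
    fill w []     = [[w]]
    fill w (l:ls)
      | length (unwords (w:l)) <= width = (w:l) : ls
      | otherwise                       = [w] : l : ls

main :: IO ()
main = putStr $ wrapText 20 "Installing library in /home/foo/baz quux/lib"
-- prints:
-- Installing library
-- in /home/foo/baz
-- quux/lib
```

Run against a message containing a path with a space, the path comes back split across two lines, which is exactly the symptom reproduced below.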

$ cabal install http://hackage.haskell.org/packages/archive/groups/0.2.0.1/groups-0.2.0.1.tar.gz --prefix "/home/manzyuk/foo/bar/baz quux"
Downloading http://hackage.haskell.org/packages/archive/groups/0.2.0.1/groups-0.2.0.1.tar.gz
Resolving dependencies...
In order, the following will be installed:
groups-0.2.0.1 (reinstall)
Warning: Note that reinstalls are always dangerous. Continuing anyway...
Configuring groups-0.2.0.1...
Building groups-0.2.0.1...
Preprocessing library groups-0.2.0.1...
[1 of 1] Compiling Data.Group       ( src/Data/Group.hs, dist/build/Data/Group.o )
In-place registering groups-0.2.0.1...
Installing library in /home/manzyuk/foo/bar/baz
quux/lib/groups-0.2.0.1/ghc-7.4.1
Registering groups-0.2.0.1...
Installed groups-0.2.0.1

Note that the line starting with “Installing library in” is wrapped at the space that is part of a directory name. I suppose a proper way to fix it would be to use an algebraic data type rather than String to represent messages, something like Text.PrettyPrint.Doc, so that file names could be treated as unbreakable entities that shouldn’t be wrapped. This, however, seems to require quite some re-engineering of error handling in Cabal. Should I go ahead and do it?

2013-05-04 Sat: Mikhail Glushenkov commented: “I think it’s a minor issue and your efforts are better spent elsewhere.”

### Ticket 760

Lines beginning with ‘--’ are comments. To enable library profiling one has to uncomment the line

    -- library-profiling: False

and change the value of the option to True:

    library-profiling: True

This is somewhat confusing: an option can be in three different states: True, False, and commented out.
### Ticket 774

We can augment the tokenizer so that lines beginning with ‘>’ are left intact:

    tokeniseLine :: (LineNo, Indent, HasTabs, String) -> [Token]
    tokeniseLine (n0, i, t, l) = case split n0 l of
                                   (Span _ l':ss) -> Line n0 i t l' : ss
                                   cs             -> cs
      where split _ "" = []
            split n s@('>' : _) = [Line n i t s]
            split n s = case span (\c -> c /= '}' && c /= '{') s of
              ("", '{' : s') ->             OpenBracket  n : split n s'
              (w , '{' : s') -> mkspan n w (OpenBracket  n : split n s')
              ("", '}' : s') ->             CloseBracket n : split n s'
              (w , '}' : s') -> mkspan n w (CloseBracket n : split n s')
              (w ,       _ ) -> mkspan n w []

            mkspan n s ss | null s'   = ss
                          | otherwise = Span n s' : ss
              where s' = trimTrailing (trimLeading s)

Is it a good / right solution? Am I overlooking anything?

2013-05-04 Sat: Roman has pointed out that Duncan Coutts announced he was rewriting the parser of .cabal files, so it probably doesn’t make sense to fix the old parser if it’s going to be replaced by the new one.

### Ticket 757

Can’t reproduce with Firefox 20.0 (Linux/x86_64).

### Ticket 735

I tried to reproduce this issue and I can’t. I tried to build NaturalSort-0.2.1 with GHC 7.4.1 and Cabal 1.16.03. I would expect the build to fail due to this line in NaturalSort.cabal:

    Cabal-Version: >= 1.6 && < 1.9

but it does build, and the log file does contain some log info. No summary file has been produced, though. I also tried to build cal3d, which on my machine fails due to a missing C library, but again the log file contains the error message.

The summary file is generated when I try to install the library from Hackage but is not generated when I manually download the source and try to build the library locally, which confirms the behaviour observed in Issue #1189.

### Ticket 676

The culprit was the function simplifyCondTree, which traversed the CondTree in post-order, putting global options after the options defined in the branches of conditionals. I’ve changed that to a pre-order traversal.
Sent a pull request. Also added a test demonstrating what’s changed.

### Ticket 885

Duplicate of Issue 774.

2013-05-04 Sat: Closed by Johan Tibell.

### Ticket 883

I don’t think this is a cabal bug. I tried to reproduce this behaviour: I downloaded monad-par-0.3.4.1 and mangled the file Control/Monad/Par/Scheds/Direct.hs so that its top looked as follows:

    {-# LANGUAGE RankNTypes, NamedFieldPuns, BangPatterns,
        ExistentialQuantification, CPP, ScopedTypeVariables,
        TypeSynonymInstances, MultiParamTypeClasses,
        GeneralizedNewtypeDeriving, PackageImports,
        ParallelListComp
     #-}
    # OPTIONS_GHC {- -Wall -fno-warn-name-shadowing -fno-warn-unused-do-bind #-}

Running GHC directly on this file gives the following error message:

    Control/Monad/Par/Scheds/Direct.hs:6:18: parse error on input `#'

Building through cabal build produces the following error message:

    Control/Monad/Par/Scheds/Direct.hs:1:1:
        File name does not match module name:
        Saw: `Main'
        Expected: `Control.Monad.Par.Scheds.Direct'

However, running GHC on Control/Monad/Par.hs produces an identical error message. I’m guessing that something like this is happening:

- when GHC is run directly on Control/Monad/Par/Scheds/Direct.hs, it tries to parse a module declaration, encounters # OPTIONS_GHC, decides that the file doesn’t have a module header (in particular, defaults the module name to Main), tries to parse # OPTIONS_GHC and fails;
- when GHC is run on Control/Monad/Par.hs, it expects to find a module named Control.Monad.Par.Scheds.Direct in the file Control/Monad/Par/Scheds/Direct.hs, but when it tries to parse that file it encounters # OPTIONS_GHC, decides that the file doesn’t have a module header, defaults the module name to Main, and fails with a different error.

This theory is supported by the following experiment: replace the pragmas and # OPTIONS_GHC above with some code, e.g., foo = 3.
Then running GHC on Control/Monad/Par/Scheds/Direct.hs produces

    Control/Monad/Par/Scheds/Direct.hs:12:1: parse error on input `module'

and running GHC on Control/Monad/Par.hs produces

    Control/Monad/Par/Scheds/Direct.hs:1:1:
        File name does not match module name:
        Saw: `Main'
        Expected: `Control.Monad.Par.Scheds.Direct'

The summary of the above is that this is not a cabal bug (and probably not a bug at all), and this ticket should be closed.

2013-05-04 Sat: Closed by Johan Tibell.

### Ticket 417

Is this still an issue? I see a comment in Cabal/Distribution/PackageDescription/Parse.hs:

    -- "build-depends" is a local field now. To be backwards
    -- compatible, we still allow it as a global field in old-style
    -- package description files and translate it to a local field by
    -- adding it to every non-empty section

which makes me think that what Duncan is suggesting:

> or it should automatically propagate such global options into each buildable component.

has already been implemented.

2013-05-05 Sun: It seems I was wrong; investigating.

## Cabal Bug Fixing II

- Work on adding a --lowest-dependencies flag for cabal which builds a package against its lowest dependencies. This is useful for package maintainers to see whether the lower bounds of a package are still valid. An initial version has been issued as a pull request but the actual workings are still being tweaked.
- A small bugfix making the --package-db flag accept relative paths.

## Cabal Bug Fixing III

I took a look at Ticket 1297. My first step was to grab the latest sources and run the tests.

bash-3.2$ cabal test
Running 2 test suites...
Test suite package-tests: RUNNING...
Cabal test suite - testing cabal version 1.17.0

BuildTestSuiteDetailedV09:
: [Failed]
build failed!
expected: False
but got: True

BuildDeps/InternalLibrary2:
: [Failed]
executable should have linked with the internal library
expected: "myLibFunc internal"
but got: ""
BuildDeps/InternalLibrary3:
: [Failed]
executable should have linked with the internal library
expected: "myLibFunc internal"
but got: ""
BuildDeps/InternalLibrary4:
: [Failed]
executable should have linked with the installed library
expected: "myLibFunc installed"
but got: ""
PackageTests/CMain:
: [OK]

Test Cases   Total
Passed  20           20
Failed  4            4
Total   24           24
Test suite package-tests: FAIL
Test suite logged to: dist/test/Cabal-1.17.0-package-tests.log
Test suite unit-tests: RUNNING...
Test suite unit-tests: PASS
Test suite logged to: dist/test/Cabal-1.17.0-unit-tests.log
1 of 2 test suites (1 of 2 test cases) passed.

BuildTestSuiteDetailedV09 is a test which checks that Cabal’s own test running works correctly. It takes a small Cabal project with a test-suite stanza, builds it, and runs the tests in the test suite.

This test passes on Linux, so it looked as though the problem was in the way cabal was running the tests. This turned out to be a red herring: I pulled the test out of the Cabal test suite and ran it on its own (obvious in hindsight), and it failed just the same. So the problem was in Cabal, not in the way Cabal was being tested.

bash-3.2$ ../cabal/cabal-install/dist/build/cabal/cabal configure --enable-tests --package-db=../cabal/Cabal/dist/package.conf.inplace
Warning: The package list for 'hackage.haskell.org' is 38 days old.
Run 'cabal update' to get the latest list of available packages.
Warning: my.cabal: A package using section syntax must specify at least
'cabal-version: >= 1.2'.
Resolving dependencies...
Warning: my.cabal: A package using section syntax must specify at least
'cabal-version: >= 1.2'.
Configuring BuildTestSuiteDetailedV09-0.1...
bash-3.2$ ../cabal/cabal-install/dist/build/cabal/cabal build
Warning: my.cabal: A package using section syntax must specify at least
'cabal-version: >= 1.2'.
Building BuildTestSuiteDetailedV09-0.1...
Preprocessing library BuildTestSuiteDetailedV09-0.1...
[1 of 1] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering BuildTestSuiteDetailedV09-0.1...
Preprocessing test suite 'dummy' for BuildTestSuiteDetailedV09-0.1...
[1 of 1] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
ar: dist/build/Dummy.o: No such file or directory

But why was Cabal complaining about a missing Dummy.o when it had clearly just created it?

Sasha suggested using strace (http://en.wikipedia.org/wiki/Strace), but it doesn’t exist on my Mac. I did, however, have dtrace (http://hints.macworld.com/article.php?story=20071031121823710 and http://chihungchan.blogspot.com/2009/05/which-process-deleted-my-file.html).

Dominics-MacBook-Pro:Dummy dom$ dtrace -qn 'syscall::unlink:entry {printf("PID=%d, CMD=%s, FILE=%s\n", pid, curpsinfo->pr_psargs, copyinstr(arg0));}'
dtrace: failed to initialize dtrace: DTrace requires additional privileges
Dominics-MacBook-Pro:Dummy dom$ sudo dtrace -qn 'syscall::unlink:entry {printf("PID=%d, CMD=%s, FILE=%s\n", pid, curpsinfo->pr_psargs, copyinstr(arg0));}'
PID=16137, CMD=Emacs, FILE=/Users/dom/Dropbox/Private/.#Everything.org
PID=16137, CMD=Emacs, FILE=/Users/dom/Dropbox/Private/#Everything.org#
PID=82566, CMD=ar, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T//ar.ugfvvt
PID=82567, CMD=libtool, FILE=dist/build/libHSBuildTestSuiteDetailedV09-0.1.a
PID=82557, CMD=cabal, FILE=dist/build/libHSBuildTestSuiteDetailedV09-0.1.a
PID=82557, CMD=cabal, FILE=dist/build/libHSBuildTestSuiteDetailedV09-0.1_p.a
PID=82557, CMD=cabal, FILE=dist/build/libHSBuildTestSuiteDetailedV09-0.1-ghc7.6.2.dylib
PID=82557, CMD=cabal, FILE=dist/build/HSBuildTestSuiteDetailedV09-0.1.o
PID=82560, CMD=ghc, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T/ghc82560_0/ghc82560_0.c
PID=82560, CMD=ghc, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T/ghc82560_0/ghc82560_0.s
PID=82573, CMD=i686-apple-darwi, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T//ccnsroxB.s
PID=82562, CMD=i686-apple-darwi, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T//ccjXqsIc.s
PID=82571, CMD=ghc, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T/ghc82571_0/ghc82571_0.c
PID=82571, CMD=ghc, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T/ghc82571_0/ghc82571_0.s
PID=82576, CMD=as, FILE=dist/build/Dummy.o
PID=82577, CMD=ar, FILE=/var/folders/3p/m593dprn5snbjz6sc45c77f80000gn/T//ar.mlwmuT
PID=82557, CMD=cabal, FILE=dist/build/libdummy.a
PID=82557, CMD=cabal, FILE=dist/build/libdummy_p.a
PID=82557, CMD=cabal, FILE=dist/build/libdummy-ghc7.6.2.dylib
PID=82557, CMD=cabal, FILE=dist/build/dummy.o

I noticed I had a Dummy.o and a dummy.o. Also, the name of the test-suite was dummy and the name of the module in the Cabal package was Dummy. I tried changing the name of the test-suite stanza to Dummy. No luck, but when I changed the name of the test-suite stanza to something completely different, everything worked.

bash-3.2$ ../cabal/cabal-install/dist/build/cabal/cabal build
Warning: my.cabal: A package using section syntax must specify at least
'cabal-version: >= 1.2'.
Building BuildTestSuiteDetailedV09-0.1...
Preprocessing library BuildTestSuiteDetailedV09-0.1...
[1 of 1] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering BuildTestSuiteDetailedV09-0.1...
Preprocessing test suite 'Urk' for BuildTestSuiteDetailedV09-0.1...
[1 of 1] Compiling Dummy            ( Dummy.hs, dist/build/Dummy.o )
In-place registering Urk-0.1...
[1 of 1] Compiling Main             ( dist/build/UrkStub/UrkStub-tmp/UrkStub.hs, dist/build/UrkStub/UrkStub-tmp/Main.o )
Linking dist/build/UrkStub/UrkStub ...

### Case Insensitive but Case Preserving

It turns out that most Macs are configured like this. You can check as shown below. The absence of Case-sensitive in the Name field means the file system is case insensitive.

bash-3.2$ diskutil info disk0s2
Device Identifier:        disk0s2
Device Node:              /dev/disk0s2
Part of Whole:            disk0
Device / Media Name:      Customer

Volume Name:              Macintosh HD
Escaped with Unicode:     Macintosh%FF%FE%20%00HD

Mounted:                  Yes
Mount Point:              /
Escaped with Unicode:     /

File System Personality:  Journaled HFS+
Type (Bundle):            hfs
Name (User Visible):      Mac OS Extended (Journaled)
Journal:                  Journal size 40960 KB at offset 0x1238b000
Owners:                   Enabled

Partition Type:           Apple_HFS
OS Can Be Installed:      Yes
Media Type:               Generic
Protocol:                 SATA
SMART Status:             Verified

Total Size:               499.4 GB (499418034176 Bytes) (exactly 975425848 512-Byte-Blocks)
Volume Free Space:        427.8 GB (427780079616 Bytes) (exactly 835507968 512-Byte-Blocks)
Device Block Size:        512 Bytes

Ejectable:                No

Whole:                    No
Internal:                 Yes
Solid State:              Yes
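
The consequence can be demonstrated directly with a few lines of Haskell (a hypothetical probe, nothing here is Cabal code; the file names are made up):

```haskell
import System.Directory (doesFileExist, removeFile)

-- On a case-insensitive but case-preserving file system, writing
-- "Dummy.o" also makes "dummy.o" appear to exist, which is exactly
-- the collision between the Dummy module's object file and the dummy
-- test-suite's object file visible in the dtrace output above.
main :: IO ()
main = do
  writeFile "Dummy.o" "pretend object file"
  clash <- doesFileExist "dummy.o"
  putStrLn $ if clash
             then "case insensitive: dummy.o and Dummy.o collide"
             else "case sensitive: no collision"
  removeFile "Dummy.o"
```

On a Journaled HFS+ volume like the one above this reports the collision; on a typical Linux file system it does not.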

### Coda

Changing dummy to Dummy tickles the bug on Linux too, so now we have an easily reproducible bug, although it is not clear how to fix it: Ticket 1311.

April 30, 2013

## Introduction

Having shown how to use automated differentiation to estimate parameters in the case of linear regression, let us now turn our attention to the problem of classification. For example, we might have some data about people’s social networking activity, such as volume of Twitter interactions and number of Twitter followers, together with a label which represents a human judgement about which of two individuals is more influential. We would like to predict, for a pair of individuals, the human judgement on who is more influential.

## Logistic Regression

We define the probability of getting a particular value of the binary label:

\displaystyle \begin{aligned} {\mathbb P}(y = 1 \mid \boldsymbol{x}; \boldsymbol{\theta}) &= h_{\boldsymbol{\theta}}(\boldsymbol{x}) \\ {\mathbb P}(y = 0 \mid \boldsymbol{x}; \boldsymbol{\theta}) &= 1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}) \end{aligned}

where $\boldsymbol{x}$ and $\boldsymbol{\theta}$ are column vectors of size $m$,

$\displaystyle h_{\boldsymbol{\theta}}(\boldsymbol{x}) = g(\boldsymbol{\theta}^T\boldsymbol{x})$

and $g$ is a function such as the logistic function $g(x) = 1 / (1 + e^{-x})$ or $\tanh$.
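
As an aside, the two candidate link functions are essentially the same up to rescaling: $1/(1 + e^{-x}) = (1 + \tanh(x/2))/2$. A quick numerical check (a standalone sketch, not part of the model code below):

```haskell
-- A standalone numerical check (not part of the model code) that the
-- logistic function is just a rescaled, shifted tanh:
--   1 / (1 + exp (-x)) == (1 + tanh (x / 2)) / 2
logistic :: Double -> Double
logistic x = 1 / (1 + exp (negate x))

main :: IO ()
main = print $ maximum
  [ abs (logistic x - (1 + tanh (x / 2)) / 2) | x <- [-10, -9.9 .. 10] ]
```

The maximum discrepancy over the sampled range is at the level of floating-point rounding, so the choice between the two is a matter of convention.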

We can re-write this as:

$\displaystyle p(y \mid \boldsymbol{x} ; \boldsymbol{\theta}) = (h_{\boldsymbol{\theta}}(\boldsymbol{x}))^y(1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}))^{1 - y}$

We wish to find the value of $\boldsymbol{\theta}$ that gives the maximum probability to the observations. We do this by maximising the likelihood. Assuming we have $n$ observations the likelihood is:

\displaystyle \begin{aligned} \mathcal{L}(\boldsymbol{\theta}) &= \prod_{i=1}^n p(y^{(i)} \mid {\boldsymbol{x}}^{(i)} ; \boldsymbol{\theta}) \\ &= \prod_{i=1}^n (h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}))^{y^{(i)}} (1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}))^{1 - y^{(i)}} \end{aligned}

It is standard practice to maximise the log likelihood, which, since log is monotonic, has the same maximum.

\displaystyle \begin{aligned} \lambda(\boldsymbol{\theta}) &= \log \mathcal{L}(\boldsymbol{\theta}) \\ &= \sum_{i=1}^n {y^{(i)}}\log h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}) + (1 - y^{(i)})\log (1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)})) \end{aligned}

In order to maximise the log likelihood, we again use the method of steepest ascent: if $\boldsymbol{\theta}^i$ is a guess for the parameters of the model then we can improve the guess by stepping a small distance in the direction of greatest change.

$\displaystyle \boldsymbol{\theta}^{i+1} = \boldsymbol{\theta}^{i} + \gamma \nabla\lambda(\boldsymbol{\theta}^{i})$

$\gamma$ is some constant known in machine learning as the learning rate. It must be chosen to be large enough to ensure convergence within a reasonable number of steps but not so large that the algorithm fails to converge.

When the number of observations is large, evaluating the gradient of the full log likelihood at every step can be expensive; as a cheaper alternative we can use stochastic gradient ascent: instead of taking the gradient over all the observations, we take the gradient with respect to each observation in turn. Of course, if our data set is small we may have to run through it several times to achieve convergence.
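
A single stochastic update can be sketched self-containedly (hypothetical helper names, not the code developed below, which uses full-batch ascent) using the hand-derived per-observation gradient $(y - h_{\boldsymbol{\theta}}(\boldsymbol{x}))\,\boldsymbol{x}$:

```haskell
-- A self-contained sketch of stochastic gradient ascent for logistic
-- regression (hypothetical helper names). The per-observation gradient
-- of y log h + (1 - y) log (1 - h) with respect to theta is
-- (y - h_theta(x)) * x.
logistic :: Double -> Double
logistic z = 1 / (1 + exp (negate z))

-- one ascent step driven by a single observation (y, x)
stochStep :: Double -> (Double, [Double]) -> [Double] -> [Double]
stochStep gamma (y, x) theta =
  zipWith (+) theta (map (* (gamma * err)) x)
  where
    err = y - logistic (sum (zipWith (*) theta x))

-- one pass over the data set, updating with each observation in turn
onePass :: Double -> [(Double, [Double])] -> [Double] -> [Double]
onePass gamma obs theta0 = foldl (flip (stochStep gamma)) theta0 obs

main :: IO ()
main = print $ stochStep 0.1 (1, [1, 1]) [0, 0]
-- prints [5.0e-2,5.0e-2]
```

Each step costs one gradient evaluation per observation rather than one per sweep of the whole data set.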

When the observations / training data are linearly separable, the magnitude of the parameters can grow without bound, as the (parametrised) logistic function then tends to the Heaviside / step function. Moreover, there can then be more than one separating hyperplane. To circumvent these infelicities, we instead maximise a penalised log likelihood function:

$\displaystyle \sum_{i=1}^n {y^{(i)}}\log h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}) + (1 - y^{(i)})\log (1 - h_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)})) - \frac{\delta}{2}\|\boldsymbol{\theta}\|^2$

See Bishop and Mitchell for further details.

## Implementation

Some pragmas to warn us about potentially dangerous situations.

> {-# OPTIONS_GHC -Wall                    #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults  #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind #-}
>
> {-# LANGUAGE TupleSections #-}
>
> module Logistic ( betas
>                 , main
>                 , a
>                 , b
>                 , nSamples
>                 ) where


Modules from the automatic differentiation library.

> import Numeric.AD

> import qualified Data.Vector as V
> import Data.List
> import Text.Printf


Some modules from a random number generator library as we will want to generate some test data.

> import System.Random
> import Data.Random ()
> import Data.Random.Distribution.Beta
> import Data.RVar
> import Control.Monad.State


Our model: the probability that $y$ has the label 1 given the observations $\boldsymbol{x}$.

> logit :: Floating a =>
>          a -> a
> logit x = 1 / (1 + exp (negate x))


For each observation, the log likelihood:

> logLikelihood :: Floating a => V.Vector a -> a -> V.Vector a -> a
> logLikelihood theta y x = y * log (logit z) +
>                           (1 - y) * log (1 - logit z)
>   where
>     z = V.sum $ V.zipWith (*) theta x


> totalLogLikelihood :: Floating a =>
>                       V.Vector a ->
>                       V.Vector a ->
>                       V.Vector (V.Vector a) ->
>                       a
> totalLogLikelihood theta y x = (a - delta * b) / l
>   where
>     l = fromIntegral $ V.length y
>     a = V.sum $ V.zipWith (logLikelihood theta) y x
>     b = (/2) $ V.sum $ V.map (^2) theta


As before we can implement steepest ascent using the grad function.

> delTotalLogLikelihood :: Floating a =>
>                          V.Vector a ->
>                          V.Vector (V.Vector a) ->
>                          V.Vector a ->
>                          V.Vector a
> delTotalLogLikelihood y x = grad f
>   where
>     f theta = totalLogLikelihood theta
>                                  (V.map auto y)
>                                  (V.map (V.map auto) x)
>
> stepOnce :: Double ->
>             V.Vector Double ->
>             V.Vector (V.Vector Double) ->
>             V.Vector Double ->
>             V.Vector Double
> stepOnce gamma y x theta =
>   V.zipWith (+) theta (V.map (* gamma) $ del theta)
>   where
>     del = delTotalLogLikelihood y x


Or, even easier, just use the library function gradientAscent!

> estimates :: (Floating a, Ord a) =>
>              V.Vector a ->
>              V.Vector (V.Vector a) ->
>              V.Vector a ->
>              [V.Vector a]
> estimates y x = gradientAscent $
>   \theta -> totalLogLikelihood theta
>                                (V.map auto y)
>                                (V.map (V.map auto) x)


Let’s try it out. First we need to generate some data. Rather arbitrarily let us create some populations from the beta distribution.

> betas :: Int -> Double -> Double -> [Double]
> betas n a b =
>   fst $ runState (replicateM n (sampleRVar (beta a b))) (mkStdGen seed)
>     where
>       seed = 0


We can plot the populations we wish to distinguish by sampling.

> a, b :: Double
> a          = 15
> b          = 6
> nSamples :: Int
> nSamples   = 100000
>
> sample0, sample1 :: [Double]
> sample0 = betas nSamples a b
> sample1 = betas nSamples b a


Note that in this case we could come up with a classification rule simply by inspecting histograms of the two samples. Note also that the populations overlap, which means we will inevitably mis-classify some observations.

> mixSamples :: [Double] -> [Double] -> [(Double, Double)]
> mixSamples xs ys = unfoldr g ((map (0,) xs), (map (1,) ys))
>   where
>     g ([], [])         = Nothing
>     g ([],  _)         = Nothing
>     g ( _, [])         = Nothing
>     g ((x:xs), (y:ys)) = Just $ (x, (y:ys, xs))


> createSample :: V.Vector (Double, Double)
> createSample = V.fromList $ take 100 $ mixSamples sample1 sample0


We create a model with one independent variable and thus two parameters.

> actualTheta :: V.Vector Double
> actualTheta = V.fromList [0.0, 1.0]


We initialise our algorithm with arbitrary values.

> initTheta :: V.Vector Double
> initTheta = V.replicate (V.length actualTheta) 0.1


Set the learning rate, the strength of the penalty term and the number of iterations.

> gamma :: Double
> gamma = 0.1
>
> delta :: Floating a => a
> delta = 1.0
>
> nIters :: Int
> nIters = 4000


Now we can run our example. For the constant parameter of our model (known in machine learning as the bias) we ensure that the corresponding “independent variable” is always set to $1.0$.

> vals :: V.Vector (Double, V.Vector Double)
> vals = V.map (\(y, x) -> (y, V.fromList [1.0, x])) $ createSample

> main :: IO ()
> main = do
>   let u = V.map fst vals
>       v = V.map snd vals
>       hs = iterate (stepOnce gamma u v) initTheta
>       xs = V.map snd vals
>       theta  = head $ drop nIters hs
>       theta' = head $ drop 100 $ estimates u v initTheta
>   printf "Hand crafted descent: theta_0 = %5.3f, theta_1 = %5.3f\n"
>          (theta V.! 0) (theta V.! 1)
>   printf "Library descent: theta_0 = %5.3f, theta_1 = %5.3f\n"
>          (theta' V.! 0) (theta' V.! 1)
>   let predProbs  = V.map (\x -> logit $ V.sum $ V.zipWith (*) theta x) xs
>       mismatches = V.filter (> 0.5) $
>                    V.map abs $
>                    V.zipWith (-) actuals preds
>         where
>           actuals = V.map fst vals
>           preds   = V.map (\x -> fromIntegral $ fromEnum (x > 0.5))
>                           predProbs
>   let lActuals, lMisMatches :: Double
>       lActuals    = fromIntegral $ V.length vals
>       lMisMatches = fromIntegral $ V.length mismatches
>   printf "%5.2f%% correct\n" $
>          100.0 * (lActuals - lMisMatches) / lActuals


And we get quite reasonable estimates:

ghci> main
Hand crafted descent: theta_0 = -2.071, theta_1 = 4.318
Library descent: theta_0 = -2.070, theta_1 = 4.319
97.00% correct

## Regression and Automated Differentiation

April 26, 2013

## Introduction

Automated differentiation was developed in the 1960s but even now does not seem to be that widely used. Even experienced and knowledgeable practitioners often assume it is either a finite difference method or symbolic computation when it is neither. This article gives a very simple application of it in a machine learning / statistics context.

## Multivariate Linear Regression

We model a dependent variable as linearly dependent on some set of independent variables in a noisy environment.

$\displaystyle y^{(i)} = \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)} + \epsilon^{(i)}$

where

- $i$ runs from 1 to $n$, the number of observations;
- $\epsilon^{(i)}$ are i.i.d. normal with mean $0$ and the same variance $\sigma^2$: $\epsilon^{(i)} \sim \mathcal{N}(0,\sigma^2)$;
- for each $i$, $\boldsymbol{x}^{(i)}$ is a column vector of size $m$; and
- $\boldsymbol{\theta}$ is a column vector also of size $m$.

In other words:

$\displaystyle p(y^{(i)} \mid \boldsymbol{x}^{(i)}; \boldsymbol{\theta}) = \frac{1}{\sqrt{2\pi}\sigma}\exp\big(\frac{-(y^{(i)} - \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)})^2}{2\sigma^2}\big)$

We can therefore write the likelihood function given all the observations as:

$\displaystyle \mathcal{L}(\boldsymbol{\theta}; X, \boldsymbol{y}) = \prod_{i = 1}^n \frac{1}{\sqrt{2\pi}\sigma}\exp\big(\frac{-(y^{(i)} - \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)})^2}{2\sigma^2}\big)$

In order to find the best-fitting parameters $\boldsymbol{\theta}$ we therefore need to maximize this function with respect to $\boldsymbol{\theta}$. The standard approach is to maximize the log likelihood which, since log is monotonic, will give the same result.
\displaystyle \begin{aligned} \lambda(\boldsymbol{\theta}) &= \log \mathcal{L}(\boldsymbol{\theta}) \\ &= \sum_{i=1}^n \log \frac{1}{\sqrt{2\pi}\sigma}\exp\big(\frac{-(y^{(i)} - \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)})^2}{2\sigma^2}\big) \\ &= n\log \frac{1}{\sqrt{2\pi}\sigma} - \frac{1}{2\sigma^2}\sum_{i=1}^n (y^{(i)} - \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)})^2 \end{aligned}

Hence maximizing the likelihood is the same as minimizing the (biased) estimate of the variance:

$\displaystyle \frac{1}{n}\sum_{i=1}^n (y^{(i)} - \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)})^2$

We can define a cost function:

$\displaystyle \mathcal{J}(\boldsymbol{\theta}) = \frac{1}{2n}\sum_{i=1}^n (y^{(i)} - \boldsymbol{\theta}^{T}\boldsymbol{x}^{(i)})^2$

Clearly minimizing this will give the same result. The constant $1/2$ is to make the manipulation of the derivative easier. In our case this is irrelevant, as we are not going to derive the derivative explicitly but use automated differentiation.

In order to minimize the cost function, we use the method of steepest ascent (or in this case descent): if $\boldsymbol{\theta}^i$ is a guess for the parameters of the model then we can improve the guess by stepping a small distance in the direction of greatest change.

$\displaystyle \boldsymbol{\theta}^{i+1} = \boldsymbol{\theta}^{i} - \gamma \nabla\mathcal{J}(\boldsymbol{\theta}^{i})$

$\gamma$ is some constant known in machine learning as the learning rate. It must be chosen to be large enough to ensure convergence within a reasonable number of steps but not so large that the algorithm fails to converge.

When the number of observations is high the cost of evaluating the cost function can also be high; as a cheaper alternative we can use stochastic gradient descent. Instead of taking the gradient over all the observations at each step, we take the gradient with respect to each observation in our data set in turn.
Of course if our data set is small we may have to use the data set several times to achieve convergence.

## Implementation

Some pragmas to warn us about potentially dangerous situations.

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing  #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}

> module Linear where


Modules from the automatic differentiation library.

> import Numeric.AD
> import Numeric.AD.Types

> import qualified Data.Vector as V


Some modules from a random number generator library as we will want to generate some test data.

> import Data.Random ()
> import Data.Random.Distribution.Normal
> import Data.Random.Distribution.Uniform
> import Data.RVar


Our model: the predicted value of $y$ is $\hat{y}$ given the observations $\boldsymbol{x}$.

> yhat :: Floating a =>
>         V.Vector a ->
>         V.Vector a -> a
> yhat x theta = V.sum $ V.zipWith (*) theta x


For each observation, the “cost” of the difference between the actual value of $y$ and its predicted value.

> cost :: Floating a =>
>         V.Vector a ->
>         a ->
>         V.Vector a
>         -> a
> cost theta y x = 0.5 * (y - yhat x theta)^2


To find its gradient we merely apply the operator grad.

> delCost :: Floating a =>
>            a ->
>            V.Vector a ->
>            V.Vector a ->
>            V.Vector a
> delCost y x = grad $ \theta -> cost theta (auto y) (V.map auto x)


We can use the single observation cost function to define the total cost function.

> totalCost :: Floating a =>
>              V.Vector a ->
>              V.Vector a ->
>              V.Vector (V.Vector a)
>              -> a
> totalCost theta y x = (/l) $ V.sum $ V.zipWith (cost theta) y x
>   where
>     l = fromIntegral $ V.length y


Again taking the derivative is straightforward.

> delTotalCost :: Floating a =>
>                 V.Vector a ->
>                 V.Vector (V.Vector a) ->
>                 V.Vector a ->
>                 V.Vector a
> delTotalCost y x = grad f
>   where
>     f theta = totalCost theta (V.map auto y) (V.map (V.map auto) x)


Now we can implement steepest descent.

> stepOnce :: Double ->
>             V.Vector Double ->
>             V.Vector (V.Vector Double) ->
>             V.Vector Double ->
>             V.Vector Double
> stepOnce gamma y x theta =
>   V.zipWith (-) theta (V.map (* gamma) $ del theta)
>   where
>     del = delTotalCost y x


> stepOnceStoch :: Double ->
>                  Double ->
>                  V.Vector Double ->
>                  V.Vector Double ->
>                  V.Vector Double
> stepOnceStoch gamma y x theta =
>   V.zipWith (-) theta (V.map (* gamma) $ del theta)
>   where
>     del = delCost y x


Let’s try it out. First we need to generate some data.

> createSample :: Double -> V.Vector Double -> IO (Double, V.Vector Double)
> createSample sigma2 theta = do
>   let l = V.length theta
>   x <- V.sequence $ V.replicate (l - 1) $ sampleRVar stdUniform
>   let mu = (theta V.! 0) + yhat x (V.drop 1 theta)
>   y <- sampleRVar $ normal mu sigma2
>   return (y, x)


We create a model with two independent variables and thus three parameters.

> actualTheta :: V.Vector Double
> actualTheta = V.fromList [0.0, 0.6, 0.7]


We initialise our algorithm with arbitrary values.

> initTheta :: V.Vector Double
> initTheta = V.replicate 3 0.1


We give our model an arbitrary variance.

> sigma2 :: Double
> sigma2 = 0.01


And set the learning rate and the number of iterations.

> nSamples, nIters :: Int
> nSamples = 100
> nIters = 2000
>
> gamma :: Double
> gamma = 0.1


Now we can run our example. For the constant parameter of our model (known in machine learning as the bias) we ensure that the corresponding “independent variable” is always set to $1.0$.

> main :: IO ()
> main = do
>   vals <- V.sequence $
>           V.replicate nSamples $
>           createSample sigma2 actualTheta
>   let y  = V.map fst vals
>       x  = V.map snd vals
>       x' = V.map (V.cons 1.0) x
>       hs = iterate (stepOnce gamma y x') initTheta
>       update theta = V.foldl f theta $ V.zip y x'
>         where
>           f theta (y, x) = stepOnceStoch gamma y x theta
>   putStrLn $ show $ take 1 $ drop nIters hs
>   let f = foldr (.) id $ replicate nSamples update
>   putStrLn $ show $ f initTheta


And we get quite reasonable estimates for the parameters.

ghci> main
[fromList [-8.34526742975572e-4,0.6024033722648041,0.69483650585735]]
fromList [7.095387239884274e-5,0.6017197904731632,0.694335002002961]


## Comonads, Life and Klein Bottles

February 23, 2013

It’s part of Haskell folklore that the archetypal example for comonads is Conway’s game of life. Here’s an implementation using arrays.

> {-# OPTIONS_GHC -Wall                    #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults  #-}

> import Diagrams.Prelude
> import Diagrams.Backend.Cairo.CmdLine
> import Data.Array
> import Data.List


> class Comonad c where
>   coreturn :: c a -> a
>   (=>>) :: c a -> (c a -> b) -> c b


This will become our two dimensional grid when we concretize it.

> data PointedArray i a = PointedArray i (Array i a)
>   deriving Show


As usual we make this into a functor by using the functor instance of the underlying array.

> instance Ix i => Functor (PointedArray i) where
>   fmap f (PointedArray i a) = PointedArray i (fmap f a)


An array with a distinguished element is a comonad in which the cobind updates each element of the array simultaneously.

> instance Ix i => Comonad (PointedArray i) where
>   coreturn (PointedArray i a) = a!i
>   (PointedArray i a) =>> f =
>     PointedArray i (listArray (bounds a)
>                    (map (f . flip PointedArray a) (range $ bounds a)))


Let's have a small grid size to demonstrate the so-called glider.

> mMax, nMax :: Int
> mMax = 5
> nMax = 5


A cell is either dead or alive.

> data Liveness =
>     Dead | Alive
>   deriving (Eq, Show)


Let's be explicit about our neighbours.

> data Neighbours a = Neighbours { north     :: a
>                                , northEast :: a
>                                , east      :: a
>                                , southEast :: a
>                                , south     :: a
>                                , southWest :: a
>                                , west      :: a
>                                , northWest :: a
>                                }
>   deriving Show
>
> toList :: Neighbours a -> [a]
> toList (Neighbours x1 x2 x3 x4 x5 x6 x7 x8) = x1:x2:x3:x4:x5:x6:x7:x8:[]
>
> type NumNeighbours a = Int -> Int -> PointedArray (Int, Int) a
>                        -> Neighbours a
>
> numNeighbours :: NumNeighbours Liveness
>               -> PointedArray (Int, Int) Liveness
>               -> Int
> numNeighbours ns p = length $
>                      filter (== Alive) $
>                      toList $
>                      ns mMax nMax p


Now we can implement the rules.

> f :: NumNeighbours Liveness ->
>      PointedArray (Int, Int) Liveness ->
>      Liveness
> f ns p@(PointedArray (i, j) x)
>   |  x!(i, j) == Alive && (numNeighbours ns p < 2)
>   = Dead
> f ns p@(PointedArray (i, j) x)
>   |  x!(i, j) == Alive && (numNeighbours ns p `elem` [2, 3])
>   = Alive
> f ns p@(PointedArray (i, j) x)
>   |  x!(i, j) == Alive && (numNeighbours ns p > 3)
>   = Dead
> f ns p@(PointedArray (i, j) x)
>   |  x!(i, j) == Dead && (numNeighbours ns p == 3)
>   = Alive
> f _  (PointedArray (i, j) x)
>   = x!(i, j)


Let’s create a glider which will move around our manifold.

> glider :: PointedArray (Int, Int) Liveness
> glider = PointedArray (0, 0) xs
>   where
>     ys = listArray ((0, 0), (mMax - 1, nMax - 1)) $ repeat Dead
>     xs = ys // [ ((2, 4), Alive)
>                , ((3, 3), Alive)
>                , ((1, 2), Alive)
>                , ((2, 2), Alive)
>                , ((3, 2), Alive)
>                ]


We can't have an infinite grid with an array, but we can make our game of life take place on a torus rather than the plane. This way we don't have problems with boundary conditions.

> neighbours :: Int -> Int -> PointedArray (Int, Int) a -> Neighbours a
> neighbours mMax nMax (PointedArray (i, j) x) =
>   Neighbours
>   { north     = x!(i,                  (j + 1) `mod` nMax)
>   , northEast = x!((i + 1) `mod` mMax, (j + 1) `mod` nMax)
>   , east      = x!((i + 1) `mod` mMax,  j)
>   , southEast = x!((i + 1) `mod` mMax, (j - 1) `mod` nMax)
>   , south     = x!(i,                  (j - 1) `mod` nMax)
>   , southWest = x!((i - 1) `mod` mMax, (j - 1) `mod` nMax)
>   , west      = x!((i - 1) `mod` mMax,  j)
>   , northWest = x!((i - 1) `mod` mMax, (j + 1) `mod` nMax)
>   }


We can see that the glider reappears at the same place at iterations 21 and 41.

We don't have to use a torus. For example, we can use a Klein bottle, which is a non-orientable surface.
> neighboursKlein :: Int -> Int -> PointedArray (Int, Int) a
>                 -> Neighbours a
> neighboursKlein mMax nMax (PointedArray (i, j) x) =
>   Neighbours
>   { north     = north' i j
>   , northEast = northEast' i j
>   , east      = x!((i + 1) `mod` mMax, j)
>   , southEast = southEast' i j
>   , south     = south' i j
>   , southWest = southWest' i j
>   , west      = x!((i - 1) `mod` mMax, j)
>   , northWest = northWest' i j
>   }
>   where
>     north' i j
>       | j < nMax - 1 = x!(i, j + 1)
>       | otherwise    = x!(mMax - 1 - i, 0)
>     northEast' i j
>       | j < nMax - 1 = x!((i + 1) `mod` mMax, j + 1)
>       | otherwise    = x!(mMax - 1 - (i + 1) `mod` mMax, 0)
>     southEast' i j
>       | j > 0        = x!((i + 1) `mod` mMax, j - 1)
>       | otherwise    = x!(mMax - 1 - (i + 1) `mod` mMax, nMax - 1)
>     south' i j
>       | j > 0        = x!(i, j - 1)
>       | otherwise    = x!(mMax - 1 - i, nMax - 1)
>     southWest' i j
>       | j > 0        = x!((i - 1) `mod` mMax, j - 1)
>       | otherwise    = x!(mMax - 1 - (i - 1) `mod` mMax, nMax - 1)
>     northWest' i j
>       | j < nMax - 1 = x!((i - 1) `mod` mMax, j + 1)
>       | otherwise    = x!(mMax - 1 - (i - 1) `mod` mMax, 0)


We can see that the glider reappears in the same place at iteration 21 with its left and right sides swapped, and that it reappears in the same place at iteration 41 with its original handedness.

If you wish to run the code yourself, you will need to do something like this:

> testGrids :: [PointedArray (Int, Int) Liveness]
> testGrids = take 10 $ iterate (=>> (f neighbours)) glider
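The wrap-around indexing in both neighbour functions relies on Haskell's `mod` returning a non-negative result for a positive modulus, so stepping west from column 0 lands in the last column. A quick standalone check:

```haskell
-- Haskell's `mod` rounds towards negative infinity, so
-- (-1) `mod` 5 == 4: exactly the torus wrap-around we need.
wrap :: Int -> Int -> Int
wrap mMax i = i `mod` mMax
```

This is why no special case is needed for cells on the boundary of the grid.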


It is somewhat difficult to determine what is going on from the raw data structures themselves; it is easier to see what is happening using some form of diagram. If you wish to do this, the full code and text are available here.

## Parallelising Path Dependent Options in Haskell

February 10, 2013

## Path Dependency

Now that we are able to price options using parallelism, let us consider a more exotic financial option.

Let us suppose that we wish to price an Asian call option. The payoff at time $T$ is

$\displaystyle {\Bigg(\frac{\Sigma_{i=1}^n x_{t_i}}{n} -k\Bigg)}^+$

Thus the payoff depends not just on the value of the underlying $x$ at time $T$ but also on the path taken. We do not need to know the entire path, just the average at discrete points in time. Thus the payoff is a function of time, the value of the underlying and the average of the underlying sampled at discrete points in time $z(x,a,t)$.
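The payoff itself is easy to state in code; a standalone sketch (not part of the grid pricer developed below):

```haskell
-- Arithmetic-average Asian call payoff: (sum x_i / n - k)^+.
asianPayoff :: Double -> [Double] -> Double
asianPayoff k xs = max 0 (sum xs / fromIntegral (length xs) - k)
```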

We can rewrite this equation as a recursive relation:

$\displaystyle a_n = a_{n-1} + \frac{x_n -a_{n-1}}{n}$

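The recursion can be checked directly with a running-average function (a standalone sketch, not used by the pricer):

```haskell
-- Running averages via a_n = a_{n-1} + (x_n - a_{n-1}) / n.
runningAverages :: [Double] -> [Double]
runningAverages = go 0 1
  where
    go _ _ []     = []
    go a n (x:xs) = a' : go a' (n + 1) xs
      where a' = a + (x - a) / n
```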
If $t_n$ is the $n$-th sampling time then for small $\epsilon$ let:

\begin{aligned}t^+ &= t_n + \epsilon \\t^- &= t_n - \epsilon\end{aligned}

Now we can write the equation for the value of the average just before and after the sampling time as:

$\displaystyle a(x, t^+) = a(x, t^-) + \frac{x - a(x, t^-)}{n}$

Since there is no arbitrage, the value of the option cannot change as it crosses the sampling date:

$\displaystyle z(x, a^+, t^+) = z(x, a^-, t^-)$

Substituting in:

$\displaystyle z(x, a + (x - a)/n, t^+) = z(x, a, t^-)$

But we have a problem: we do not know the value of $a$. Away from the sampling times there is no dependence on $a$, as its value cannot change. We already know how to step back in time in this case: we use the pricer we have already developed.

So let us assume a value for $a$. Then we can diffuse backwards from the final payoff but when we reach a sampling date, we reset the values on the grid using the interfacing formula above.

In summary, we assume a set of values for $a$ and then diffuse backwards for each pricer with that $a$; at each sampling time, we reset the values of the pricers and continue until we reach the last sampling time. At this point, $x = a$ so we diffuse one pricer back to the start time which gives us the price of the option for any given $x$ on our grid.
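Schematically, with scalars standing in for whole grids, the procedure is a fold over the inter-observation segments (all names here are illustrative, not from the pricer below):

```haskell
-- Backward induction: within each segment apply the one-step map
-- 'step' the given number of times, then apply the interface
-- 'reset' at the sampling date.
backInduct :: (Double -> Double)  -- one backward time step
           -> (Double -> Double)  -- interface at a sampling date
           -> [Int]               -- steps in each segment
           -> Double -> Double
backInduct step reset segs z0 =
  foldl (\z k -> reset (iterate step z !! k)) z0 segs
```

The real implementation below does the same thing with repa arrays in place of scalars.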

Our usual imports and options.

> {-# LANGUAGE FlexibleContexts          #-}
> {-# LANGUAGE TypeOperators             #-}
> {-# LANGUAGE NoMonomorphismRestriction #-}
>
> {-# OPTIONS_GHC -Wall                    #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults  #-}
>
> import Data.Array.Repa as Repa hiding ((++), map)
> import AsianDiagram
> import Diagrams.Prelude((===))
> import Diagrams.Backend.Cairo.CmdLine
> import Text.Printf
> import Data.List


First some constants for the payoff and the pricer. We make the space grid deliberately coarse so we can draw diagrams showing the grid before and after interfacing.

> r, sigma, k, t, xMax, aMax, deltaX, deltaT, deltaA:: Double
> m, n, p :: Int
> r = 0.05
> sigma = 0.2
> k = 50.0
> t = 3.0
> m = 10
> p = 10
> xMax = 150
> deltaX = xMax / (fromIntegral m)
> aMax = 150
> deltaA = aMax / (fromIntegral p)
> n = 100
> deltaT = t / (fromIntegral n)


We take the times at which we make our Asian observations and calculate the number of steps between each observation, including the initial and terminal times.

> asianTimes :: [Int]
> asianTimes = map (\x -> floor $ x * (fromIntegral n) / t) [1.5, 2.0, 2.5]
>
> numSteps :: [Int]
> numSteps = snd $ mapAccumL (\s x -> (x, x - s)) 0 times
>              where times = asianTimes ++ [n]
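As a worked instance of the `mapAccumL` above: with n = 100 and t = 3 the observation times come out as [50, 66, 83], and with the terminal time appended the gaps are as follows (a standalone check, assuming those values):

```haskell
import Data.List (mapAccumL)

-- Differences between successive times, as in numSteps above.
gaps :: [Int] -> [Int]
gaps = snd . mapAccumL (\s x -> (x, x - s)) 0
```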


As before we can define a single pricer that updates the grid over one time step and at multiple points in space using the Explicit Euler method. We parameterize the upper and lower boundaries $z(0,t)$ and $\lim_{x \to \infty}z(x,t)$ as we will want to use our pricer not just for a vanilla call option.

> type BoundaryCondition = Array D DIM1 Double -> Double
>
> singleUpdater :: BoundaryCondition ->
>                   BoundaryCondition ->
>                   Array D DIM1 Double ->
>                   Array D DIM1 Double
> singleUpdater lb ub arr = traverse arr id f
>   where
>     Z :. m = extent arr
>     f _get (Z :. ix) | ix == 0   = lb arr
>     f _get (Z :. ix) | ix == m-1 = ub arr
>     f  get (Z :. ix)             = a * get (Z :. ix-1) +
>                                    b * get (Z :. ix) +
>                                    c * get (Z :. ix+1)
>       where
>         a = deltaT * (sigma^2 * x^2 - r * x) / 2
>         b = 1 - deltaT * (r  + sigma^2 * x^2)
>         c = deltaT * (sigma^2 * x^2 + r * x) / 2
>         x = fromIntegral ix


Again we can extend this to update many pricers on one time step and multiple points in space.

> multiUpdater :: Source r Double =>
>                 BoundaryCondition ->
>                 BoundaryCondition ->
>                 Array r DIM2 Double ->
>                 Array D DIM2 Double
> multiUpdater lb ub arr = fromFunction (extent arr) f
>      where
>        f :: DIM2 -> Double
>        f (Z :. ix :. jx) = (singleUpdater lb ub x)!(Z :. ix)
>          where
>            x :: Array D DIM1 Double
>            x = slice arr (Any :. jx)


We define the boundary condition at the maturity date of our Asian option. Notice that for each pricer the boundary condition is constant, being determined by the value of the Asian payoff.

> priceAtTAsian :: Array U DIM2 Double
> priceAtTAsian = fromListUnboxed (Z :. m+1 :. p+1)
>                 [ max 0 (deltaA * (fromIntegral l) - k)
>                 | _j <- [0..m],
>                    l <- [0..p]
>                 ]


With this we can step backwards in time for any number of timesteps.

> stepMulti :: Monad m =>
>              Int ->
>              BoundaryCondition ->
>              BoundaryCondition ->
>              Array U DIM2 Double ->
>              m (Array U DIM2 Double)
> stepMulti n lb ub = updaterM
>   where
>     updaterM :: Monad m => Array U DIM2 Double ->
>                            m (Array U DIM2 Double)
>     updaterM = foldr (>=>) return updaters
>       where
>         updaters = replicate n (computeP . multiUpdater lb ub)


## Boundary Conditions

But now we are stuck. What are the boundary conditions for each of the pricers after we have done our interfacing? We follow many practitioners and assume that the option value is linear in the underlying in this region. Thus in the Black-Scholes equation

$\displaystyle \frac{\partial z}{\partial t} + \frac{1}{2}\sigma^2 x^2 \frac{\partial^2 z}{\partial x^2} + r x\frac{\partial z}{\partial x} - rz = 0$

we set the second derivative to $0$. Thus we can use a forward method at the boundary to solve the equation below.

$\displaystyle \frac{\partial z}{\partial t} + r x\frac{\partial z}{\partial x} - rz = 0$

When $x=0$, we have a simpler situation.

$\displaystyle \frac{\partial z(0,t)}{\partial t} - rz = 0$

The corresponding difference equation for the lower boundary is:

$\displaystyle \frac{z_0^{n+1} - z_0^n}{\Delta t} -rz_0^n = 0$

This is easily represented in Haskell (remembering that we are stepping backwards in time).

> lBoundaryUpdater :: BoundaryCondition
> lBoundaryUpdater arr = x - deltaT * r * x
>   where
>     x = arr!(Z :. (0 :: Int))


The corresponding difference equation for the upper boundary is:

$\displaystyle \frac{z^{n+1}_m - z^n_m}{\Delta t} + r(m\Delta x)\frac{z^n_m - z^n_{m-1}}{\Delta x} - rz^n_m = 0$

Re-arranging

$\displaystyle \frac{z^{n+1}_m - z^n_m}{\Delta t} + r (m-1) z^n_m - r m z^n_{m-1} = 0$

We can write this in Haskell as follows (again remembering we are stepping backwards in time).

> uBoundaryUpdater :: BoundaryCondition
> uBoundaryUpdater arr = x - deltaT * r * (a - b)
>   where
>     Z :. m = extent arr
>     x = arr!(Z :. m - 1)
>     y = arr!(Z :. m - 2)
>     a = x * fromIntegral (m - 1)
>     b = y * fromIntegral m


## Interfacing

We interface by linear interpolation if the value we require lies between two points on our grid; otherwise we use linear extrapolation.
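Both cases are the same formula; when the proportion lies outside $[0, 1]$ it extrapolates. A minimal sketch, separate from the `interface` function below:

```haskell
-- Linear interpolation between vLower and vUpper; a prptn outside
-- [0, 1] gives linear extrapolation along the same line.
lerp :: Double -> Double -> Double -> Double
lerp vLower vUpper prptn = vLower + prptn * (vUpper - vLower)
```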

> interface :: Int -> Array U DIM2 Double ->
>              Array D DIM2 Double
> interface n grid = traverse grid id (\_ sh -> f sh)
>   where
>     (Z :. _iMax :. jMax) = extent grid
>     f (Z :. i :. j) = inter
>       where
>         x       = deltaX * (fromIntegral i)
>         aPlus   = deltaA * (fromIntegral j)
>         aMinus  = aPlus + (x - aPlus) / (fromIntegral n)
>         jLower  = if k > jMax - 1 then jMax - 1 else k
>                     where k = floor $ aMinus / deltaA
>         jUpper  = if jLower == jMax - 1
>                   then jMax - 1
>                   else jLower + 1
>         aLower  = deltaA * (fromIntegral jLower)
>         prptn   = (aMinus - aLower) / deltaA
>         vLower  = grid!(Z :. i :. jLower)
>         vUpper  = grid!(Z :. i :. jUpper)
>         inter   = vLower + prptn * (vUpper - vLower)


## Example

Now we are in a position to give an example of pricing our option, with a few helper functions.

> showD :: Double -> String
> showD = printf "%.2f"
>
> showArrD1 :: Array U DIM1 Double -> String
> showArrD1 = intercalate ", " . map showD . toList
>
> showSlices :: String -> Array U DIM2 Double -> IO ()
> showSlices message prices = do
>   putStrLn ('\n' : message)
>
>   let (Z :. _i :. j) = extent prices
>       slicesD = map (\m -> slice prices (Any :. m)) [0..j-1]
>
>   slices <- mapM computeP slicesD
>   mapM_ (putStrLn . showArrD1) slices
>
> diagonal :: Source a Double =>
>             Array a DIM2 Double ->
>             Array D DIM2 Double
> diagonal arr = traverse arr g f
>   where
>     f :: (DIM2 -> Double) -> DIM2 -> Double
>     f get (Z :. ix :. _) = get (Z :. ix :. ix)
>
>     g :: DIM2 -> DIM2
>     g (Z :. ix :. iy) = Z :. (min ix iy) :. (1 :: Int)
>
> main :: IO ()
> main = do putStrLn "\nAsianing times"
>           putStrLn $ show asianTimes
>
>           let lb = lBoundaryUpdater
>               ub = uBoundaryUpdater
>
>           showSlices "Initial pricers" priceAtTAsian


Now instead of stepping all the way back to the initial time of the option, we only step back to just before the last observation.

>           grid3b <- stepMulti (numSteps!!3) lb ub priceAtTAsian
>           showSlices ("Just before 3") grid3b


The diagram below shows our grid just before we make the last observation. The $x$-axis is the value of the underlying and the $y$-axis is the Asian / auxiliary value. Both have been normalised to 1.0; that is, the value 1.0 represents 150.0 (our maximum value on the grid for both the underlying and the Asian / auxiliary variable).

>           grid3a <- computeP $ interface 3 grid3b
>           showSlices ("Just after 3") grid3a


The diagram below shows our grid just after we make the last observation, i.e., after interfacing. Again, the $x$-axis is the value of the underlying and the $y$-axis is the Asian / auxiliary value. Both have been normalised to 1.0; that is, the value 1.0 represents 150.0 (our maximum value on the grid for both the underlying and the Asian / auxiliary variable).

And then we step backwards with the new final boundary condition.

>           grid2b <- stepMulti (numSteps!!2) lb ub grid3a
>           showSlices ("Just before 2") grid2b
>
>           grid2a <- computeP $ interface 2 grid2b
>           showSlices ("Just after 2") grid2a


Again we step backwards in time with a new final boundary condition.

>           grid1b <- stepMulti (numSteps!!1) lb ub grid2a
>           showSlices ("Just before 1") grid1b


Finally we step all the way back to the time at which we wish to know the price. Here the interface condition is just $x=a$ so we can take the diagonal of our grid and diffuse backwards using a single pricer.

>           grid1a <- computeP $ diagonal grid1b
>           showSlices ("Just after 1") grid1a
>
>           grid1b <- stepMulti (numSteps!!0) lb ub grid1a
>           showSlices "Final pricer" grid1b


## Option Pricing Using Haskell Parallel Arrays

January 5, 2013

Can we get better performance for pricing financial options in Haskell? First let us translate our pricer using the Explicit Euler Method to use repa.

{-# LANGUAGE FlexibleContexts, TypeOperators #-}

{-# OPTIONS_GHC -Wall -fno-warn-name-shadowing -fno-warn-type-defaults #-}

import Data.Array.Repa as Repa
import Data.Array.Repa.Eval
import Control.Monad

r, sigma, k, t, xMax, deltaX, deltaT :: Double
m, n, p :: Int
r = 0.05
sigma = 0.2
k = 50.0
t = 3.0
m = 80
p = 2
xMax = 150
deltaX = xMax / (fromIntegral m)
n = 800
deltaT = t / (fromIntegral n)

data PointedArrayU a = PointedArrayU Int (Array U DIM1 a)
  deriving Show

singleUpdaterP :: PointedArrayU Double -> Double
singleUpdaterP (PointedArrayU j _x) | j == 0 = 0.0
singleUpdaterP (PointedArrayU j _x) | j == m = xMax - k
singleUpdaterP (PointedArrayU j x) =
  a * x!(Z :. j-1) +
  b * x!(Z :. j) +
  c * x!(Z :. j+1)
  where
    a = deltaT * (sigma^2 * (fromIntegral j)^2 - r * (fromIntegral j)) / 2
    b = 1 - deltaT * (r + sigma^2 * (fromIntegral j)^2)
    c = deltaT * (sigma^2 * (fromIntegral j)^2 + r * (fromIntegral j)) / 2

priceAtTP :: PointedArrayU Double
priceAtTP = PointedArrayU 0
            (fromListUnboxed (Z :. m+1)
             [ max 0 (deltaX * (fromIntegral j) - k) | j <- [0..m] ])

coBindU :: (Source U a, Source U b, Target U b, Monad m) =>
           PointedArrayU a -> (PointedArrayU a -> b) -> m (PointedArrayU b)
coBindU (PointedArrayU i a) f = computeP newArr >>= return . PointedArrayU i
  where
    newArr = traverse a id g
      where
        g _get (Z :. j) = f $ PointedArrayU j a

testN :: Int -> IO (PointedArrayU Double)
testN n = h priceAtTP
  where
    h = foldr (>=>) return
        (take n $ Prelude.zipWith flip (repeat coBindU) (repeat singleUpdaterP))

So far so good, but this has not bought us very much over using Data.Array or Data.Vector. Moreover, we have not been able to use the costate comonad because of the restrictions on types imposed by repa.

Let us re-write the above without even attempting to mimic the cobind operator. Thus we no longer need our pointed array.

singleUpdater :: Array D DIM1 Double -> Array D DIM1 Double
singleUpdater a = traverse a id f
  where
    Z :. m = extent a
    f _get (Z :. ix) | ix == 0   = 0.0
    f _get (Z :. ix) | ix == m-1 = xMax - k
    f  get (Z :. ix)             = a * get (Z :. ix-1) +
                                   b * get (Z :. ix) +
                                   c * get (Z :. ix+1)
      where
        a = deltaT * (sigma^2 * (fromIntegral ix)^2 - r * (fromIntegral ix)) / 2
        b = 1 - deltaT * (r + sigma^2 * (fromIntegral ix)^2)
        c = deltaT * (sigma^2 * (fromIntegral ix)^2 + r * (fromIntegral ix)) / 2

priceAtT :: Array U DIM1 Double
priceAtT = fromListUnboxed (Z :. m+1)
           [max 0 (deltaX * (fromIntegral j) - k) | j <- [0..m]]

Now suppose we want to price a spot ladder; then we would like to apply the same update function to slightly modified payoffs (the terminal boundary condition). First let's utilise our single update function to work over a two dimensional grid.

multiUpdater :: Source r Double =>
                Array r DIM2 Double ->
                Array D DIM2 Double
multiUpdater a = fromFunction (extent a) f
  where
    f :: DIM2 -> Double
    f (Z :. ix :. jx) = (singleUpdater x)!(Z :. ix)
      where
        x :: Array D DIM1 Double
        x = slice a (Any :. jx)

We define our spot ladder on a call to be many calls, each with a strike differing by 10 basis points (a basis point is one hundredth of one percent).

priceAtTMulti :: Array U DIM2 Double
priceAtTMulti = fromListUnboxed (Z :. m+1 :. p+1)
                [ max 0 (deltaX * (fromIntegral j) -
                         (k + (fromIntegral l) / 1000.0))
                | j <- [0..m]
                , l <- [0..p]
                ]

Now we can test our spot ladder pricer. Note that we force the evaluation of our two dimensional array at each time step.
If we do not do that then we end up with a leak, as presumably repa tries to fuse many updates. Moreover, fusing updates across time steps does not really buy us anything.

testMulti :: IO (Array U DIM2 Double)
testMulti = updaterM priceAtTMulti
  where
    updaterM :: Monad m => Array U DIM2 Double -> m (Array U DIM2 Double)
    updaterM = foldr (>=>) return (replicate n (computeP . multiUpdater))

pickAtStrike :: Monad m => Int -> Array U DIM2 Double -> m (Array U DIM1 Double)
pickAtStrike n t = computeP $ slice t (Any :. n :. All)

main :: IO ()
main = do t <- testMulti
          vStrikes <- pickAtStrike 27 t
          putStrLn $ show vStrikes

I am planning to write a blog post on performance, but for now you can get some idea of how well this performs by reading this.

## Plotting a Distribution

November 4, 2012

One way of visualizing a distribution is to take lots of samples from it and plot the resulting histogram. To do this we use the diagrams package.

{-# LANGUAGE TupleSections #-}

import Diagrams.Prelude
import Diagrams.Backend.Cairo.CmdLine
import Data.Colour (withOpacity)

To generate the actual samples (in this case from the beta distribution) we use the random-fu and rvar packages.

import Data.Random ()
import Data.Random.Distribution.Beta
import Data.RVar

We import some other standard modules and define some constants.

import System.Random
import Data.List
import qualified Data.IntMap as IntMap
import Control.Monad.State
import Text.Printf

tickSize = 0.1
nCells = 100
a = 15
b = 6
nSamples = 100000
cellColour = blue `withOpacity` 0.5

First we define a background on which we are going to plot our histogram and also a function to draw the x-axis.

background = rect 1.1 1.1 # translate (r2 (0.5, 0.5))

ticks xs = (mconcat $ map tick xs) <> line
  where
    maxX   = maximum xs
    line   = fromOffsets [r2 (maxX, 0)]
    tSize  = maxX / 100
    tick x = endpt # translate tickShift
      where
        tickShift = r2 (x, 0)
        endpt     = topLeftText (printf "%.2f" x) # fontSize (tSize * 2) <>
                    circle tSize # fc red # lw 0

We sample from the beta distribution with parameters $a$ and $b$ and put these into a histogram.

betas :: Int -> Double -> Double -> [Double]
betas n a b =
  fst $ runState (replicateM n (sampleRVar (beta a b))) (mkStdGen seed)
  where
    seed = 0

histogram :: Int -> [Double] -> IntMap.IntMap Int
histogram nCells xs = foldl' g emptyHistogram xs
  where
    g h x          = IntMap.insertWith (+) (makeCell nCells x) 1 h
    emptyHistogram = IntMap.fromList $ zip [0 .. nCells - 1] (repeat 0)
    makeCell m     = floor . (* (fromIntegral m))

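A standalone copy of the histogram function makes it easy to check in ghci how samples fall into cells (duplicated here only so the snippet runs on its own):

```haskell
import qualified Data.IntMap as IntMap
import Data.List (foldl')

-- Bin values from [0, 1) into nCells equal-width cells.
histogram :: Int -> [Double] -> IntMap.IntMap Int
histogram nCells xs = foldl' g empty xs
  where
    g h x = IntMap.insertWith (+) (floor (x * fromIntegral nCells)) 1 h
    empty = IntMap.fromList $ zip [0 .. nCells - 1] (repeat 0)
```

For example, with three cells, 0.1 and 0.15 land in cell 0 and 0.5 lands in cell 1.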
Finally we can render the histogram by drawing blue rectangles with a white border.

hist xs = scaleX sX . scaleY sY . position $ hist'
  where
    ysmax = fromInteger . ceiling $ maximum xs
    ysmin = fromInteger . floor   $ minimum xs
    xsmax = fromIntegral $ length xs
    xsmin = 0.0
    sX = 1 / (xsmax - xsmin)
    sY = 1 / (ysmax - ysmin)
    hist' = zip (map p2 $ map (,0) $ map fromInteger [0..])
                (map (cell 1) xs)
    cell w h = alignB $ rect w h # fcA cellColour # lc white # lw 0.001

And now we can draw our histogram.

test tickSize nCells a b nSamples = ticks [0.0, tickSize..1.0] <>
                                    hist xs <>
                                    background
  where
    xs = IntMap.elems $
         IntMap.map fromIntegral $ histogram nCells $
         betas nSamples a b

main :: IO ()
main = defaultMain $ test tickSize nCells a b nSamples

## The Metropolis Algorithm

August 15, 2012

## A Simple Example

In Section 7.1 of Doing Bayesian Data Analysis, John Kruschke gives a simple example of the Metropolis algorithm, in which we generate samples from a distribution without knowing the distribution itself. Of course the example is contrived, as we really do know the distribution. In the particular example, ${\cal P}(X = i) = i/k$ for $i = 1 \ldots n$, where $n$ is some fixed natural number and $k$ is a normalising constant.

Here's the algorithm in Haskell. We use the random-fu package and the rvar package for random variables and the random package to supply random numbers.

{-# LANGUAGE ScopedTypeVariables, NoMonomorphismRestriction #-}

import Data.Random
import Data.RVar
import System.Random
import Control.Monad.State
import Data.List

We pick an explicit seed and set the number of islands to 7 (in Kruschke's example this is the number of islands that a politician visits).

seed :: Int
seed = 2

numIslands :: Int
numIslands = 7

And we pick an arbitrary number of jumps to try in the Metropolis algorithm.

n = 11112

We generate proposed jumps which are either one step up or one step down.

proposedJumps :: Int -> [Int]
proposedJumps seed =
  map f $ fst $
  runState (replicateM n $ sampleRVar $ uniform False True) (mkStdGen seed)
  where
    f False = negate 1
    f True  = 1

And we generate samples from the uniform distribution on $[0, 1]$ which will allow us to determine whether to accept or reject a proposed jump.

acceptOrRejects :: Int -> [Double]
acceptOrRejects seed =
  fst $ runState (replicateM n $ sampleRVar $ uniform 0 1) (mkStdGen seed)

We pretend we only know a measure of how often we pick a given number but not the normalising constant to make this measure a probability measure.

p n | n >= 1 && n <= numIslands = n
    | otherwise                 = 0
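For reference (the sampler never uses it), the normalised target distribution that this measure corresponds to is easy to write down; with numIslands = 7 the normalising constant is $k = 1 + 2 + \cdots + 7 = 28$. A standalone sketch:

```haskell
-- P(X = i) = i / k with k = n(n+1)/2; not used by the sampler,
-- just the distribution it should converge to.
targetProbs :: Int -> [Double]
targetProbs n = map (\i -> fromIntegral i / k) [1 .. n]
  where k = fromIntegral (n * (n + 1) `div` 2)
```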

We define a function which performs one step of the Metropolis algorithm.

f currentPosition (proposedJump, acceptOrReject) =
  if acceptOrReject < probAccept
  then currentPosition + proposedJump
  else currentPosition
  where
    probAccept = min 1 (pNew / pCur)
    pNew = fromIntegral $ p $ currentPosition + proposedJump
    pCur = fromIntegral $ p currentPosition

Let's try this out with a somewhat arbitrary burn-in period, starting at position 3.

runMC seed = map (/ total) numVisits
  where
    total     = sum numVisits
    numVisits = map (\j -> fromIntegral $ length $ filter (== j) xs)
                    [1 .. numIslands]
    xs        = drop (n `div` 10) $
                scanl f 3 (zip (proposedJumps seed) (acceptOrRejects seed))

We can then compute the root mean square error for this particular sample size.

actual = map (/ s) xs
  where
    xs = map fromIntegral [1 .. numIslands]
    s  = sum xs

rmsError n = sqrt $ (/ (fromIntegral numIslands)) $
             sum $ map (^2) $ zipWith (-) actual (runMC n)

## A Bernoulli Example

We can also code the Metropolis algorithm for the example in which samples are drawn from a Bernoulli distribution. First we define the likelihood of drawing a specific sequence of 0's and 1's from a binomial distribution with parameter $\theta$. We also define the example sample of data given in Doing Bayesian Data Analysis.

likelihood :: Int -> Int -> Double -> Double
likelihood z n theta = theta^z * (1 - theta)^(n - z)

myData = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

We define a function which performs one step of the Metropolis algorithm.

oneStep currPosition (propJump, acceptOrReject) =
  if acceptOrReject < probAccept
  then currPosition + propJump
  else currPosition
  where
    probAccept = min 1 (p (currPosition + propJump) / p currPosition)
    p x | x < 0     = 0
        | x > 1     = 0
        | otherwise = pAux myData x
    pAux :: [Double] -> Double -> Double
    pAux xs position = likelihood z n position
      where
        n = length xs
        z = length $ filter (== 1) xs

Finally we need some proposed jumps; following the example we generate these from a normal distribution ${\cal N}(0, 0.1)$.

normals :: [Double]
normals =
  fst $ runState (replicateM n (sampleRVar (normal 0 0.1))) (mkStdGen seed)

And now we can run the Metropolis algorithm and, for example, find the mean of the posterior.

accepteds = drop (n `div` 10) $
            scanl oneStep 0.5 (zip normals (acceptOrRejects seed))

mean = total / count
  where
    (total, count) = foldl' f (0.0, 0) accepteds
    f (s, c) x     = (s + x, c + 1)

## Functional R

We can now re-write the example in R in a more functional way.
## An arbitrary number of jumps to try in Metropolis
## algorithm.
trajLength = 11112

## The data represented as a vector.
myData = c( 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0 )

## An arbitrary seed presumably used in generating random
## values from the various distributions.
set.seed(47405)

## We need some proposed jumps; following the example we
## generate these from a normal distribution.
normals = rep (0, trajLength)
for (j in 1:trajLength) {
  normals[j] = rnorm( 1, mean = 0, sd = 0.1 )
}

## We generate proposed jumps which are either one step up
## or one step down.
proposedJumps = rep (0, trajLength)
for (j in 1:trajLength) {
  proposedJumps[j] = if ( runif( 1 ) < 0.5 ) -1 else 1
}

## We generate samples from the uniform distribution on
## [0, 1] which will allow us to determine whether to accept
## or reject a proposed jump.
acceptOrRejects = rep (0, trajLength)
for (j in 1:trajLength) {
  acceptOrRejects[j] = runif( 1 )
}

## The likelihood of drawing a specific sequence of 0's and
## 1's from a binomial distribution with parameter theta,
## zeroed outside [0, 1].
likelihood = function( z, n, theta ) {
  x = theta^z * (1 - theta)^(n - z)
  x[ theta > 1 | theta < 0 ] = 0
  return ( x )
}

prior = function( position, xs ) {
  n = length( xs )
  z = sum ( xs == 1 )
  return ( likelihood ( z, n, position ) )
}

## We define a function which performs one step of the
## Metropolis algorithm.
oneStep = function ( currPosition, propJumpAorR ) {
  proposedJump   = propJumpAorR [1]
  acceptOrReject = propJumpAorR [2]
  probAccept = min( 1,
                    prior( currPosition + proposedJump, myData ) /
                    prior( currPosition, myData ) )
  if ( acceptOrReject < probAccept ) {
    trajectory = currPosition + proposedJump
  } else {
    trajectory = currPosition
  }
  return ( trajectory )
}

## Pair together the proposed jumps and the uniform draws
## that decide whether a given proposal will be accepted or
## rejected.
nsAorRs <- list ()
for (i in 1:trajLength) nsAorRs[[i]] <- c(normals[i], acceptOrRejects[i])

## Fold (well, really scanl) over the pairs, returning all
## the intermediate results.
trajectory = Reduce( function(a, b) oneStep (a, b), nsAorRs,
                     accumulate = T, init = 0.5 )

## Drop the values for the burn in period.
burnIn = ceiling( .1 * trajLength )
acceptedTraj = trajectory[burnIn:trajLength]

## Finally return the mean of the posterior.
result = mean ( acceptedTraj )

May 7, 2012

A few weeks ago, two of my colleagues and I had a birthday in the same week. What are the odds of that? There are about 25 people in the department in which I work.

We consider every outcome where pairs of people share birthdays but no 3 or more people share birthdays. Then we can use the birthday paradox trick and subtract the proportion of these from 1.

To be absolutely precise we ought to consider outcomes where no one shares a birthday but this is sufficiently small in comparison in the case we are interested in that we can ignore it.

We begin with some imports and some dumb functions to calculate binomial and multinomial coefficients.


import qualified Data.Map as Map
import System.Random
import Prelude hiding (replicate, zipWith, length)

binomial n 0 = 1
binomial 0 k = 0
binomial n k = n * binomial (n - 1) (k - 1) `div` k

multinomial xs =
  (factorial (sum xs)) `div` (product [ factorial x | x <- xs ])
  where
    factorial n = product [1..n]
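Quick sanity checks on the two helpers (standalone copies, so this snippet runs on its own): $\binom{5}{2} = 10$ and $\binom{4}{2,1,1} = 4!/2! = 12$.

```haskell
-- Standalone copies of the combinatorial helpers above.
binomial :: Integer -> Integer -> Integer
binomial _ 0 = 1
binomial 0 _ = 0
binomial n k = n * binomial (n - 1) (k - 1) `div` k

multinomial :: [Integer] -> Integer
multinomial xs = factorial (sum xs) `div` product (map factorial xs)
  where factorial n = product [1..n]
```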

Let’s take a smaller example. Suppose we have 9 people in a department and we have divided the year up into 11 periods.

We want the cases where there are exactly two birthdays in a period. For example, 2 people might have a birthday in period 1, 7 people might have birthdays in periods 2–8, and no-one has a birthday in periods 9–11. This gives us $\binom{11}{1}\binom{10}{7}$ ways of choosing the periods. For each such choice of periods there are $\binom{9}{2,1,1,1,1,1,1,1,0}$ ways of assigning the 9 people to them. Letting $x$ stand for the total number of ways in which pairs of people, but no 3 or more people, can share birthdays, we have:

\begin{aligned} x& = \binom{11}{1}\binom{10}{7}\binom{9}{2,1,1,1,1,1,1,1,0} \\ & + \binom{11}{2}\binom{9}{5} \binom{9}{2,2,1,1,1,1,1,0,0} \\ & + \binom{11}{3}\binom{8}{3} \binom{9}{2,2,2,1,1,1,0,0,0} \\ & + \binom{11}{4}\binom{7}{1} \binom{9}{2,2,2,2,1,0,0,0,0}\end{aligned}

In Haskell, we can write this as:

nPeople :: Integer
nPeople = 9
nPeriods :: Integer
nPeriods = 11

x = binomial 11 1 * binomial 10 7 * multinomial [2,1,1,1,1,1,1,1,0] +
binomial 11 2 * binomial 9  5 * multinomial [2,2,1,1,1,1,1,0,0] +
binomial 11 3 * binomial 8  3 * multinomial [2,2,2,1,1,1,0,0,0] +
binomial 11 4 * binomial 7  1 * multinomial [2,2,2,2,1,0,0,0,0]

And therefore we expect this proportion of outcomes with exactly 2 shared birthdays:

y = (fromInteger x) / (fromInteger nPeriods^nPeople)
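Counts like these are easy to get wrong, so a brute-force enumeration over a tiny instance is a useful cross-check. The helper `exactlyTwo` below is hypothetical, not from the post: for 4 people and 5 periods the formula above gives $\binom{5}{1}\binom{4}{2}\cdot\frac{4!}{2!} + \binom{5}{2}\cdot\frac{4!}{2!2!} = 360 + 60 = 420$, and direct enumeration of all $5^4 = 625$ assignments should agree.

import Control.Monad (replicateM)
import Data.List (group, sort)

-- Brute-force count of assignments of p people to d periods in which
-- the most crowded period contains exactly 2 birthdays.  Only feasible
-- for tiny p and d, but enough to validate the combinatorics.
exactlyTwo :: Int -> Int -> Int
exactlyTwo p d =
  length [ bs | bs <- replicateM p [1..d]
              , maximum (map length (group (sort bs))) == 2 ]

main :: IO ()
main = print (exactlyTwo 4 5)  -- should agree with the formula's 420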

Generalizing, we first need to be able to generate the desired multinomial coefficient.

coeff x' y' = take y (ns 2) ++ take (x - 2*y) (ns 1) ++ take y (ns 0)
  where
    x    = fromIntegral x'
    y    = fromIntegral y'
    ns n = n:(ns n)

For example:

*Main> coeff 11 1
[2,1,1,1,1,1,1,1,1,1,0]

Now we can calculate the number of ways in which the maximum number of birthdays in any single period is exactly 2.

terms x y = sum $ map (oneTerm x y) [1..(x `div` 2)]
  where
    oneTerm x y n = binomial y n * binomial (y - n) (x - (2 * n)) *
                    multinomial (coeff x n)

And finally we can calculate the probability that at most 2 birthdays fall in any single period.

atMostTwoShared x y = fromInteger (terms x y) / fromInteger (y^x)

Because it’s easy to double count or to forget a particular case, it’s best to check via Monte Carlo:

nSims = 10000

monteCarlo = runs >>= return . Map.map (/(fromInteger nSims)) . counts
  where
    counts rs = foldr (\b -> Map.insertWith (+) b 1) Map.empty rs
    runs = sequence $
           take (fromIntegral nSims) $ repeat (bDays >>= return . maxOnOneDay)
    bDays = sequence $
            take (fromIntegral nPeople) $ repeat $ pickBirthday
    maxOnOneDay bs = maximum $ Map.elems $
                     foldr (\b -> Map.insertWith (+) b 1) Map.empty bs
    pickBirthday   = getStdRandom (randomR (1,nPeriods))

main = do
  ps <- monteCarlo
  putStrLn $ show ps
  putStrLn $ show $ atMostTwoShared nPeople nPeriods

Surprisingly, the answer for 3 or more people sharing a birthday in some period, when there are 9 people and 11 periods, is 43%. The answer we are after is for 25 people and 52 periods (weeks in the year), and this is 50%.

## The Implicit Euler Method

April 22, 2012

We can solve our differential equation using the implicit Euler method, which is unconditionally stable. We can also take this opportunity to use the vector package rather than arrays, as it has a richer set of combinators, and to tidy up the code to make the payoff explicit (thanks to suggestions by Ben Moseley).

First let’s get our imports arranged so the code isn’t too cluttered with qualified names.

{-# LANGUAGE RecordWildCards #-}

import Prelude hiding ( length
                      , scanl
                      , scanr
                      , zip
                      , tail
                      , init
                      , last
                      )

import qualified Data.Vector as V
import Data.Vector ( (!)
                   , generate
                   , length
                   , Vector
                   , prescanl
                   , scanl
                   , scanr
                   , zip
                   , tail
                   , init
                   , last
                   , unfoldr
                   )

Define a type synonym for the payoff function and put all the parameters for the trade in one place.

type Payoff = Double -> Double

data Config = Config {
  r      :: Double,
  sigma  :: Double,
  t      :: Double,
  m      :: Int,    -- Space steps
  n      :: Int,    -- Time steps
  xMax   :: Double,
  deltaX :: Double,
  deltaT :: Double
  }

defaultConf :: Config
defaultConf = Config {
  r      = 0.05,
  sigma  = 0.20,
  t      = 3.0,
  m      = 80,
  n      = 800,
  xMax   = 150,
  deltaX = xMax defaultConf / fromIntegral (m defaultConf),
  deltaT = t defaultConf / fromIntegral (n defaultConf)
  }

The usual(!) Comonad class:

class Comonad c where
  coreturn :: c a -> a
  (=>>)    :: c a -> (c a -> b) -> c b

Instead of using an Array to represent a “time slice” we use a Vector. Notice we can use the Vector library functions to implement the cobind.
data PointedArray a = PointedArray Int (Vector a)
  deriving Show

instance Functor PointedArray where
  fmap f (PointedArray i x) = PointedArray i (fmap f x)

instance Comonad PointedArray where
  coreturn (PointedArray i x) = x!i
  (PointedArray i x) =>> f =
    PointedArray i (V.map (f . flip PointedArray x) (generate (length x) id))

We start with our partial differential equation:

$\frac{\partial V}{\partial t} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S\frac{\partial V}{\partial S} - rV = 0$

Again we can approximate this by a difference equation, but in this case we approximate the time derivative by $\frac{z^n_i - z^{n-1}_i}{\Delta t}$:

\begin{aligned} \frac{z^{n-1}_i - z^n_i}{\Delta t} + \frac{1}{2} \sigma^2 (i\Delta x)^2\frac{z^n_{i+1} - 2z^n_i + z^n_{i-1}}{\Delta x^2} + r(i\Delta x)\frac{z^n_{i+1} - z^n_{i-1}}{2\Delta x} - rz^n_i = 0\end{aligned}

Re-arranging (and not forgetting that we are going backwards in time, so that $n$ is earlier than $n-1$), we obtain an implicit equation for $z^n$ in terms of $z^{n-1}$:

\begin{aligned} z^{n-1}_i = A_i z^n_{i-1} + B_i z^n_i + C_i z^n_{i+1}\end{aligned}

where

\begin{aligned} A_i &= \frac{\Delta t}{2}(ri - \sigma^2 i^2) \\ B_i &= 1 + \Delta t(r + \sigma^2 i^2) \\ C_i &= -\frac{\Delta t}{2}(ri + \sigma^2 i^2)\end{aligned}

Thus we have a matrix equation which we need to invert. Fortunately the matrix is tridiagonal, so we can use the Thomas algorithm to invert it. Let’s represent our tridiagonal matrix as a Vector of triples:

data TriDiMatrix a = TriDiMatrix (Vector (a, a, a))
  deriving Show

Here’s our tridiagonal matrix. Note that we have encoded two of the boundary conditions in it (the case for $S(t, 0)$ and for $S(t, x)$ as $x \rightarrow \infty$).

triDi = TriDiMatrix $ unfoldr h 0
  where
    h :: Int -> Maybe ((Double, Double, Double), Int)
    h k | k == m' + 1 = Nothing
    h k | k == m'     = Just (( 0.0,    1.0,   0.0), k + 1)
    h k | k == 0      = Just (( 0.0,    1.0,   0.0), k + 1)
    h k               = Just ((afn k, bfn k, cfn k), k + 1)

    afn i =          deltaT' * (r' * (fromIntegral i) - sigma'^2 * (fromIntegral i)^2) / 2
    bfn i = 1      + deltaT' * (r' + sigma'^2 * (fromIntegral i)^2)
    cfn i = negate $ deltaT' * (r' * (fromIntegral i) + sigma'^2 * (fromIntegral i)^2) / 2

    m'      = m defaultConf
    deltaT' = deltaT defaultConf
    r'      = r defaultConf
    sigma'  = sigma defaultConf

Next we implement the Thomas algorithm: solveTriDi takes a (tridiagonal) matrix and a vector and returns the application of the inverse of the matrix to the vector.

solveTriDi :: TriDiMatrix Double -> Vector Double -> Vector Double
solveTriDi (TriDiMatrix m) d = z
  where
    -- Forward sweep
    c' = prescanl f (let (a1, b1, c1) = m!0 in c1 / b1) (tail m)
      where f ci' (ai, bi, ci) = ci / (bi - ci'*ai)

    d' = scanl g d1' $ zip (tail m) (zip c' (tail d))
      where g di' ((ai, bi, ci), (ci', di)) = (di - di'*ai) / (bi - ci'*ai)

    d1         = d!0
    (_, b1, _) = m!0
    d1'        = d1 / b1

    -- Backward substitution
    z = scanr h xn $ zip c' (init d')
      where h (ci', di') xi1 = di' - ci'*xi1
            xn = last d'

Now we set up the boundary condition for our equation, which is now a function of the payoff.

pricePArrAtT :: Config -> Payoff -> Int -> PointedArray Double
pricePArrAtT cfg@(Config {..}) fn i =
  PointedArray i (V.fromList [ fn (deltaX * (fromIntegral j)) | j <- [0..m] ])

Finally we can price our call option without having to worry about the stability of the solution.

callPrices :: Int -> [Double]
callPrices i = map coreturn $ iterate (=>> oneStep) $ pricePArrAtT defaultConf fn i
  where
    oneStep (PointedArray j x) = (solveTriDi triDi x)!j
    fn x                       = max 0 (x - k) -- A call
    k                          = 50.0          -- The strike

callPrice i = (callPrices i)!!(n defaultConf - 1)
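As a sanity check on the forward sweep and back substitution inside solveTriDi, the Thomas algorithm can also be sketched stand-alone on plain lists and run against a small system with a known solution. The helper `thomas` below is hypothetical, not part of the post's code; each row is an $(a_i, b_i, c_i)$ triple, as in TriDiMatrix.

-- Stand-alone Thomas algorithm on plain lists.  For the system
-- [2 1 0; 1 2 1; 0 1 2] x = [4, 8, 8] the solution is x = [1, 2, 3].
thomas :: [(Double, Double, Double)] -> [Double] -> [Double]
thomas rows ds = foldr step [] (fwd 0 0 rows ds)
  where
    -- Forward sweep: eliminate the sub-diagonal, carrying the previous
    -- modified coefficients cp and dp.
    fwd _  _  []               _      = []
    fwd cp dp ((a, b, c) : rs) (d : dt) =
      let denom = b - a * cp
          c'    = c / denom
          d'    = (d - a * dp) / denom
      in (c', d') : fwd c' d' rs dt
    -- Back substitution from the last row upwards.
    step (_,  d') []          = [d']
    step (c', d') xs@(x : _)  = (d' - c' * x) : xs

main :: IO ()
main = print (thomas [(0,2,1), (1,2,1), (1,2,0)] [4,8,8])  -- approximately [1,2,3]

The boundary rows (0, 1, 0) in triDi pass through this machinery unchanged, which is exactly how the boundary conditions are enforced.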