Introduction
Simple models, for example those used for pricing financial options, assume that the volatility of an index or a stock is constant. However, simple observation of time series shows that this is not the case; if it were, the log returns would be white noise.
One approach which addresses this, GARCH (Generalised AutoRegressive Conditional Heteroskedasticity), models the evolution of volatility deterministically.
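As a rough illustration of what "deterministically" means here, the sketch below computes the conditional variances of a GARCH(1,1) model, $\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2$, from a given series of returns. The function name `garchVariances` and its argument order are assumptions made for this illustration; they do not come from any package mentioned in this article.

```haskell
-- A minimal sketch of the GARCH(1,1) variance recursion
--   sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}
-- Given the past returns, each variance is a deterministic function of the
-- previous variance and the previous return; no extra noise term enters.
-- All names here are hypothetical.
garchVariances
  :: Double    -- ^ omega, the constant term
  -> Double    -- ^ alpha, the weight on the previous squared return
  -> Double    -- ^ beta, the weight on the previous variance
  -> Double    -- ^ initial variance sigma2_0
  -> [Double]  -- ^ returns eps_0, eps_1, ...
  -> [Double]  -- ^ variances sigma2_0, sigma2_1, ...
garchVariances omega alpha beta = scanl step
  where
    step sigma2 eps = omega + alpha * eps * eps + beta * sigma2
```

In a stochastic volatility model, by contrast, the volatility has its own source of randomness and cannot be recovered exactly from past returns.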
Stochastic volatility models treat the volatility of a return on an asset, such as an option to buy a security, as a Hidden Markov Model (HMM). Typically, the observable data consist of noisy mean-corrected returns on an underlying asset at equally spaced time points.
There is evidence that Stochastic Volatility models (Kim, Shephard, and Chib (1998)) offer increased flexibility over the GARCH family, e.g. see Geweke (1994), Fridman and Harris (1998) and Jacquier, Polson, and Rossi (1994). Despite this, and judging by the number of questions on the R Special Interest Group on Finance mailing list, the use of GARCH in practice far outweighs that of Stochastic Volatility. Reasons cited are the multiplicity of estimation methods for the latter and the lack of packages (although the range of available packages has recently improved).
In their tutorial on particle filtering, Doucet and Johansen (2011) give an example of stochastic volatility. We save this approach for future blog posts and follow Lopes and Polson and the excellent lecture notes by Hedibert Lopes.
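Here’s the model.
$$
\begin{aligned}
H_0 &\sim \mathcal{N}(m_0, C_0) \\
H_t &= \mu + \phi H_{t-1} + \tau \eta_t \\
Y_t &= \exp(H_t / 2)\, \epsilon_t
\end{aligned}
$$
where the $\eta_t$ and $\epsilon_t$ are independent standard normal random variables.
We wish to estimate $\mu$, $\phi$, $\tau^2$ and the hidden states $\boldsymbol{h}$. To do this via a Gibbs sampler we need to sample from
$$
p(\mu, \phi, \tau^2 \mid \boldsymbol{h}, \boldsymbol{y}) \quad \text{and} \quad p(\boldsymbol{h} \mid \mu, \phi, \tau^2, \boldsymbol{y})
$$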
Haskell Preamble
> {-# OPTIONS_GHC -Wall #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans #-}

> {-# LANGUAGE RecursiveDo #-}
> {-# LANGUAGE ExplicitForAll #-}
> {-# LANGUAGE TypeOperators #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE FlexibleContexts #-}
> module StochVol (
>     bigM
>   , bigM0
>   , runMC
>   , ys
>   , vols
>   , expectationTau2
>   , varianceTau2
>   ) where

> import Numeric.LinearAlgebra.HMatrix hiding ( (===), (|||), Element,
>                                               (<>), (#>), inv )
> import qualified Numeric.LinearAlgebra.Static as S
> import Numeric.LinearAlgebra.Static ( (<>) )
> import GHC.TypeLits
> import Data.Proxy
> import Data.Maybe ( fromJust )

> import Data.Random
> import Data.Random.Source.PureMT
> import Control.Monad.Fix
> import Control.Monad.State.Lazy
> import Control.Monad.Writer hiding ( (<>) )
> import Control.Monad.Loops
> import Control.Applicative

> import qualified Data.Vector as V
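> -- Invert a statically sized square matrix by solving m x = I.
> inv :: (KnownNat n, (1 <=? n) ~ 'True) => S.Sq n -> S.Sq n
> inv m = fromJust $ S.linSolve m S.eye

> -- Statically sized matrix-vector product.
> infixr 8 #>
> (#>) :: (KnownNat m, KnownNat n) => S.L m n -> S.R n -> S.R m
> (#>) = (S.#>)

> -- Each step of the chain logs ((mu, phi), tau2).
> type StatsM a = RVarT (Writer [((Double, Double), Double)]) a

> -- Horizontal concatenation of statically sized matrices.
> (|||) :: (KnownNat ((+) r1 r2), KnownNat r2, KnownNat c, KnownNat r1) =>
>          S.L c r1 -> S.L c r2 -> S.L c ((+) r1 r2)
> (|||) = (S.¦)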
Marginal Distribution for Parameters
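Let us take a prior that is standard for linear regression
$$
(\boldsymbol{\theta}, \tau^2) \sim \mathcal{NIG}(\boldsymbol{\theta}_0, V_0, \nu_0, s_0^2)
$$
where $\boldsymbol{\theta} = (\mu, \phi)^\top$ and use standard results for linear regression to obtain the required marginal distribution.
That the prior is Normal Inverse Gamma ($\mathcal{NIG}$) means
$$
\boldsymbol{\theta} \mid \tau^2 \sim \mathcal{N}(\boldsymbol{\theta}_0, \tau^2 V_0), \qquad
\tau^2 \sim \mathcal{IG}\left(\frac{\nu_0}{2}, \frac{\nu_0 s_0^2}{2}\right)
$$
Standard Bayesian analysis for regression tells us that the (conditional) posterior distribution for
$$
y_i = \boldsymbol{x}_i^\top \boldsymbol{\theta} + \epsilon_i
$$
where the $\epsilon_i$ are IID normal with variance $\sigma^2$ is given by
$$
\boldsymbol{\theta} \mid \sigma^2, \boldsymbol{y}, X \sim \mathcal{N}(\boldsymbol{\theta}_n, \sigma^2 \Lambda_n^{-1}), \qquad
\sigma^2 \mid \boldsymbol{y}, X \sim \mathcal{IG}(a_n, b_n)
$$
with
$$
\begin{aligned}
\Lambda_0 &= V_0^{-1} \\
\Lambda_n &= X^\top X + \Lambda_0 \\
\boldsymbol{\theta}_n &= \Lambda_n^{-1}\left(X^\top \boldsymbol{y} + \Lambda_0 \boldsymbol{\theta}_0\right) \\
a_n &= \frac{\nu_0 + n}{2} \\
b_n &= \frac{\nu_0 s_0^2}{2} + \frac{1}{2}\left(\boldsymbol{y}^\top \boldsymbol{y} + \boldsymbol{\theta}_0^\top \Lambda_0 \boldsymbol{\theta}_0 - \boldsymbol{\theta}_n^\top \Lambda_n \boldsymbol{\theta}_n\right)
\end{aligned}
$$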
We can rewrite the above recursively. We do not need to for this blog article but it will be required in any future blog article which uses Sequential Monte Carlo techniques.
Furthermore
so we can write
and
Specialising
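In the case of our model we can specialise the non-recursive equations by regressing each state on its predecessor, so that the response vector and the design matrix are
$$
\boldsymbol{y} = \begin{pmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{pmatrix}, \qquad
X = \begin{pmatrix} 1 & h_0 \\ 1 & h_1 \\ \vdots & \vdots \\ 1 & h_{n-1} \end{pmatrix}
$$
Let’s rewrite the notation to fit our model: the noise variance is $\tau^2$ and the prior precision is $V_0^{-1}$, giving
$$
\begin{aligned}
\Lambda_n &= X^\top X + V_0^{-1} \\
\boldsymbol{\theta}_n &= \Lambda_n^{-1}\left(X^\top \boldsymbol{y} + V_0^{-1} \boldsymbol{\theta}_0\right) \\
a_n &= \frac{\nu_0 + n}{2} \\
b_n &= \frac{\nu_0 s_0^2}{2} + \frac{1}{2}\left(\boldsymbol{y}^\top \boldsymbol{y} + \boldsymbol{\theta}_0^\top V_0^{-1} \boldsymbol{\theta}_0 - \boldsymbol{\theta}_n^\top \Lambda_n \boldsymbol{\theta}_n\right)
\end{aligned}
$$
Sample from
$$
\tau^2 \mid \boldsymbol{h} \sim \mathcal{IG}(a_n, b_n), \qquad
\boldsymbol{\theta} \mid \tau^2, \boldsymbol{h} \sim \mathcal{N}(\boldsymbol{\theta}_n, \tau^2 \Lambda_n^{-1})
$$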
We can implement this in Haskell as
> sampleParms ::
>   forall n m .
>   (KnownNat n, (1 <=? n) ~ 'True) =>
>   S.R n -> S.L n 2 -> S.R 2 -> S.Sq 2 -> Double -> Double ->
>   RVarT m (S.R 2, Double)
> sampleParms y bigX theta_0 bigLambda_0 a_0 s_02 = do
>   -- Posterior hyperparameters of the conjugate regression update
>   let n = natVal (Proxy :: Proxy n)
>       a_n = 0.5 * (a_0 + fromIntegral n)
>       bigLambda_n = bigLambda_0 + (tr bigX) <> bigX
>       invBigLambda_n = inv bigLambda_n
>       theta_n = invBigLambda_n #> ((tr bigX) #> y + (tr bigLambda_0) #> theta_0)
>       b_0 = 0.5 * a_0 * s_02
>       b_n = b_0 +
>             0.5 * (S.extract (S.row y <> S.col y)!0!0) +
>             0.5 * (S.extract (S.row theta_0 <> bigLambda_0 <> S.col theta_0)!0!0) -
>             0.5 * (S.extract (S.row theta_n <> bigLambda_n <> S.col theta_n)!0!0)
>   -- Draw the variance: its reciprocal is Gamma distributed
>   g <- rvarT (Gamma a_n (recip b_n))
>   let s2 = recip g
>       invBigLambda_n' = m <> invBigLambda_n
>         where
>           m = S.diag $ S.vector (replicate 2 s2)
>   -- Draw (mu, phi) given the sampled variance
>   m1 <- rvarT StdNormal
>   m2 <- rvarT StdNormal
>   let theta_n' :: S.R 2
>       theta_n' = theta_n + S.chol (S.sym invBigLambda_n') #> (S.vector [m1, m2])
>   return (theta_n', s2)
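To make the deterministic part of this update easier to check by hand, here is a sketch that uses only the Prelude and writes out the 2×2 algebra explicitly. It assumes a diagonal prior precision and a non-empty list of states; the name `nigUpdate` and the tuple layout of its result are hypothetical and not part of the module above.

```haskell
-- A minimal sketch of the hyperparameter update for the regression
--   h_t = mu + phi * h_{t-1} + noise
-- All names are hypothetical. The prior precision is assumed diagonal and
-- the list of states is assumed non-empty.

-- | Returns (theta_n, Lambda_n, a_n, b_n), where the symmetric matrix
-- Lambda_n is stored as (l11, l12, l22).
nigUpdate
  :: (Double, Double)   -- ^ prior mean theta_0 = (mu_0, phi_0)
  -> (Double, Double)   -- ^ diagonal of the prior precision
  -> Double             -- ^ nu_0
  -> Double             -- ^ s_0^2
  -> Double             -- ^ initial state h_0
  -> [Double]           -- ^ states h_1 .. h_n
  -> ((Double, Double), (Double, Double, Double), Double, Double)
nigUpdate (m0, p0) (d1, d2) nu0 s02 h0 hs =
  ((t1, t2), (l11, l12, l22), aN, bN)
  where
    xs    = h0 : init hs                 -- regressors h_0 .. h_{n-1}
    n     = fromIntegral (length hs)
    sq x  = x * x
    -- Lambda_n = X^T X + Lambda_0, where the rows of X are (1, h_{t-1})
    l11   = n + d1
    l12   = sum xs
    l22   = sum (map sq xs) + d2
    -- r = X^T y + Lambda_0 theta_0
    r1    = sum hs + d1 * m0
    r2    = sum (zipWith (*) xs hs) + d2 * p0
    -- theta_n = Lambda_n^{-1} r, using the explicit 2x2 inverse
    det   = l11 * l22 - l12 * l12
    t1    = (l22 * r1 - l12 * r2) / det
    t2    = (l11 * r2 - l12 * r1) / det
    aN    = 0.5 * (nu0 + n)
    quad0 = d1 * sq m0 + d2 * sq p0
    quadN = t1 * (l11 * t1 + l12 * t2) + t2 * (l12 * t1 + l22 * t2)
    bN    = 0.5 * nu0 * s02 + 0.5 * (sum (map sq hs) + quad0 - quadN)
```

Comparing these values with `theta_n`, `bigLambda_n`, `a_n` and `b_n` inside `sampleParms` on a short vector of states is a cheap sanity check on the matrix version.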
Marginal Distribution for State
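Marginal for $h_0$
Using a standard result about conjugate priors and since we have
$$
h_0 \sim \mathcal{N}(m_0, C_0), \qquad h_1 \mid h_0 \sim \mathcal{N}(\mu + \phi h_0, \tau^2)
$$
we can deduce
$$
h_0 \mid h_1 \sim \mathcal{N}(m_1, C_1)
$$
where
$$
\begin{aligned}
\frac{1}{C_1} &= \frac{1}{C_0} + \frac{\phi^2}{\tau^2} \\
m_1 &= C_1\left(\frac{m_0}{C_0} + \frac{\phi (h_1 - \mu)}{\tau^2}\right)
\end{aligned}
$$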
> sampleH0 :: Double ->
>             Double ->
>             V.Vector Double ->
>             Double ->
>             Double ->
>             Double ->
>             RVarT m Double
> sampleH0 iC0 iC0m0 hs mu phi tau2 = do
>   -- Posterior precision is the prior precision plus phi^2 / tau2
>   let var = recip $ (iC0 + phi^2 / tau2)
>       mean = var * (iC0m0 + phi * ((hs V.! 0) - mu) / tau2)
>   rvarT (Normal mean (sqrt var))
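The same update can be written as a pure function, which makes it easy to try a few numbers. This is only a sketch: `h0Posterior` is a hypothetical helper that takes the prior mean and variance directly rather than the precomputed `iC0` and `iC0m0`.

```haskell
-- A minimal sketch of the conjugate update for h_0 given h_1.
-- Hypothetical helper; returns (posterior mean, posterior variance).
h0Posterior
  :: Double  -- ^ prior mean m_0
  -> Double  -- ^ prior variance C_0
  -> Double  -- ^ mu
  -> Double  -- ^ phi
  -> Double  -- ^ tau^2
  -> Double  -- ^ h_1
  -> (Double, Double)
h0Posterior m0 c0 mu phi tau2 h1 = (mean, var)
  where
    -- precisions add: prior precision plus the precision contributed by h_1
    var  = recip (recip c0 + phi * phi / tau2)
    mean = var * (m0 / c0 + phi * (h1 - mu) / tau2)
```

With a vague prior (large `c0`) the prior term is negligible and the posterior mean is close to `(h1 - mu) / phi`, which is what inverting the state equation would suggest.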
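Marginal for $h_1, \ldots, h_n$
From the state equation, we have
$$
h_{t+1} = \mu + \phi h_t + \tau \eta_{t+1}
\quad \text{so that} \quad
\phi^2 h_t = -\phi\mu + \phi h_{t+1} - \phi\tau\eta_{t+1}
$$
We also have
$$
h_t = \mu + \phi h_{t-1} + \tau \eta_t
$$
Adding the two expressions together gives
$$
\begin{aligned}
(1 + \phi^2) h_t &= \phi (h_{t-1} + h_{t+1}) + \mu (1 - \phi) + \tau(\eta_t - \phi\eta_{t+1}) \\
h_t &= \frac{\phi}{1 + \phi^2}(h_{t-1} + h_{t+1}) + \frac{1 - \phi}{1 + \phi^2}\mu + \frac{\tau}{1 + \phi^2}(\eta_t - \phi\eta_{t+1})
\end{aligned}
$$
Since the $\eta_t$ are standard normal, then conditional on $h_{t-1}$ and $h_{t+1}$, $h_t$ is normally distributed, and
$$
\begin{aligned}
\mathbb{E}(h_t \mid h_{t-1}, h_{t+1}) &= \frac{1 - \phi}{1 + \phi^2}\mu + \frac{\phi}{1 + \phi^2}(h_{t-1} + h_{t+1}) \\
\mathbb{V}(h_t \mid h_{t-1}, h_{t+1}) &= \frac{\tau^2}{1 + \phi^2}
\end{aligned}
$$
We also have, for the final state, which has no successor,
$$
h_n \mid h_{n-1} \sim \mathcal{N}(\mu + \phi h_{n-1}, \tau^2)
$$
Writing
$$
\begin{aligned}
\mu_t &= \frac{1 - \phi}{1 + \phi^2}\mu + \frac{\phi}{1 + \phi^2}(h_{t-1} + h_{t+1}), & \nu_t^2 &= \frac{\tau^2}{1 + \phi^2}, & t &= 1, \ldots, n-1 \\
\mu_n &= \mu + \phi h_{n-1}, & \nu_n^2 &= \tau^2
\end{aligned}
$$
by Bayes’ Theorem we have
$$
p(h_t \mid h_{t-1}, h_{t+1}, \mu, \phi, \tau^2, y_t) \propto f_{\mathcal{N}}(h_t; \mu_t, \nu_t^2)\, f_{\mathcal{N}}(y_t; 0, e^{h_t})
$$
where $f_{\mathcal{N}}(x; \mu, \sigma^2)$ is the probability density function of a normal distribution.
We can sample from this using Metropolis
1. For each $t$, sample $h_t^\ast$ from $\mathcal{N}(h_t^{(i-1)}, \gamma^2)$ where $\gamma^2$ is the tuning variance.
2. For each $t$, compute the acceptance probability
$$
p_t = \min\left\{1, \frac{f_{\mathcal{N}}(h_t^\ast; \mu_t, \nu_t^2)\, f_{\mathcal{N}}(y_t; 0, e^{h_t^\ast})}{f_{\mathcal{N}}(h_t^{(i-1)}; \mu_t, \nu_t^2)\, f_{\mathcal{N}}(y_t; 0, e^{h_t^{(i-1)}})}\right\}
$$
3. For each $t$, compute a new value of $h_t$
$$
h_t^{(i)} = \begin{cases} h_t^\ast & \text{with probability } p_t \\ h_t^{(i-1)} & \text{with probability } 1 - p_t \end{cases}
$$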
> metropolis :: V.Vector Double ->
>               Double ->
>               Double ->
>               Double ->
>               Double ->
>               V.Vector Double ->
>               Double ->
>               RVarT m (V.Vector Double)
> metropolis ys mu phi tau2 h0 hs vh = do
>   -- Conditional prior variances: tau2 / (1 + phi^2) for interior states,
>   -- tau2 for the final state
>   let eta2s = V.replicate (n-1) (tau2 / (1 + phi^2)) `V.snoc` tau2
>       etas = V.map sqrt eta2s
>       coef1 = (1 - phi) / (1 + phi^2) * mu
>       coef2 = phi / (1 + phi^2)
>       -- The final state has no successor, so its prior mean depends only on
>       -- its predecessor h_{n-1}, which sits at index n-2 of this
>       -- zero-based vector
>       mu_n = mu + phi * (hs V.! (n-2))
>       mu_1 = coef1 + coef2 * ((hs V.! 1) + h0)
>       innerMus = V.zipWith (\hp1 hm1 -> coef1 + coef2 * (hp1 + hm1)) (V.tail (V.tail hs)) hs
>       mus = mu_1 `V.cons` innerMus `V.snoc` mu_n
>   -- Random-walk proposals; Normal takes a standard deviation, so vh acts
>   -- as the proposal standard deviation
>   hs' <- V.mapM (\mu -> rvarT (Normal mu vh)) hs
>   -- Log target density at the proposed (nums) and current (dens) states
>   let num1s = V.zipWith3 (\mu eta h -> logPdf (Normal mu eta) h) mus etas hs'
>       num2s = V.zipWith (\y h -> logPdf (Normal 0.0 (exp (0.5 * h))) y) ys hs'
>       nums = V.zipWith (+) num1s num2s
>       den1s = V.zipWith3 (\mu eta h -> logPdf (Normal mu eta) h) mus etas hs
>       den2s = V.zipWith (\y h -> logPdf (Normal 0.0 (exp (0.5 * h))) y) ys hs
>       dens = V.zipWith (+) den1s den2s
>   -- One independent uniform draw per state
>   us <- V.replicateM n (rvarT StdUniform)
>   let ls = V.zipWith (\n d -> min 0.0 (n - d)) nums dens
>   return $ V.zipWith4 (\u l h h' -> if log u < l then h' else h) us ls hs hs'
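The accept/reject logic for a single state can be isolated as a few pure functions, which is convenient for checking the log-space arithmetic by hand. The sketch below uses only the Prelude; `logNormalPdf`, `logTarget`, `logAcceptance` and `metropolisStep` are hypothetical names, and the uniform draw is passed in explicitly rather than sampled.

```haskell
-- A minimal sketch of one-site Metropolis acceptance in log space.
-- All names are hypothetical.

-- | Log density at x of a normal distribution with the given mean and
-- standard deviation.
logNormalPdf :: Double -> Double -> Double -> Double
logNormalPdf mean sd x =
  negate (log sd) - 0.5 * log (2 * pi) - 0.5 * z * z
  where
    z = (x - mean) / sd

-- | Unnormalised log full conditional of a state h, given the observation y
-- and the conditional prior mean muT and standard deviation nuT.
logTarget :: Double -> Double -> Double -> Double -> Double
logTarget y muT nuT h =
  logNormalPdf muT nuT h + logNormalPdf 0 (exp (0.5 * h)) y

-- | Log acceptance probability for a symmetric random-walk proposal.
logAcceptance :: Double -> Double -> Double -> Double -> Double -> Double
logAcceptance y muT nuT hOld hNew =
  min 0 (logTarget y muT nuT hNew - logTarget y muT nuT hOld)

-- | Keep the proposal hNew if log u falls below the log acceptance
-- probability, where u is a uniform draw on (0, 1); otherwise keep hOld.
metropolisStep :: Double -> Double -> Double -> Double -> Double -> Double -> Double
metropolisStep u y muT nuT hOld hNew
  | log u < logAcceptance y muT nuT hOld hNew = hNew
  | otherwise                                  = hOld
```

Working with log densities avoids underflow when the two densities are very small, and because the proposal is symmetric its density cancels from the ratio.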
Markov Chain Monte Carlo
Now we can write down a single step for our Gibbs sampler, sampling from each marginal in turn.
> singleStep :: Double -> V.Vector Double ->
>               (Double, Double, Double, Double, V.Vector Double) ->
>               StatsM (Double, Double, Double, Double, V.Vector Double)
> singleStep vh y (mu, phi, tau2, h0, h) = do
>   -- Log the current parameter values
>   lift $ tell [((mu, phi),tau2)]
>   hNew <- metropolis y mu phi tau2 h0 h vh
>   h0New <- sampleH0 iC0 iC0m0 hNew mu phi tau2
>   -- Design matrix: a column of ones beside the lagged states
>   let bigX' = (S.col $ S.vector $ replicate n 1.0)
>               |||
>               (S.col $ S.vector $ V.toList $ h0New `V.cons` V.init hNew)
>       bigX = bigX' `asTypeOf` (snd $ valAndType nT)
>   -- Regress the freshly sampled states on their predecessors, so that the
>   -- response matches the design matrix
>   newParms <- sampleParms (S.vector $ V.toList hNew) bigX (S.vector [mu0, phi0]) invBigV0 nu0 s02
>   return ( (S.extract (fst newParms))!0
>          , (S.extract (fst newParms))!1
>          , snd newParms
>          , h0New
>          , hNew
>          )
Testing
Let’s create some test data.
> mu', phi', tau2', tau' :: Double
> mu' = 0.00645
> phi' = 0.99
> tau2' = 0.15^2
> tau' = sqrt tau2'
We need to create a statically typed matrix with one dimension the same size as the data so we tie the data size value to the required type.
> nT :: Proxy 500
> nT = Proxy
> valAndType :: KnownNat n => Proxy n -> (Int, S.L n 2)
> valAndType x = (fromIntegral $ natVal x, undefined)
> n :: Int
> n = fst $ valAndType nT
Arbitrarily let us start the process at $h_0 = 0$
> h0 :: Double
> h0 = 0.0
We define the process as a stream (aka corecursively) using the Haskell recursive do construct. It is not necessary to do this but streams are a natural way to think of stochastic processes.
> hs, vols, sds, ys :: V.Vector Double
> hs = V.fromList $ take n $ fst $ runState hsAux (pureMT 1)
>   where
>     hsAux :: (MonadFix m, MonadRandom m) => m [Double]
>     hsAux = mdo { x0 <- sample (Normal (mu' + phi' * h0) tau')
>                 ; xs <- mapM (\x -> sample (Normal (mu' + phi' * x) tau')) (x0:xs)
>                 ; return xs
>                 }

> vols = V.map exp hs
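The corecursive idea can be seen without any monads by passing the shocks in as an ordinary list. The following is a sketch only; `ar1` and its argument order are hypothetical and it is not used elsewhere in this article.

```haskell
-- A minimal sketch: the state process as a lazily defined, self-referential
-- list. The shocks are supplied explicitly, so the function is pure.
-- `ar1` is a hypothetical name.
ar1
  :: Double    -- ^ mu
  -> Double    -- ^ phi
  -> Double    -- ^ tau, the standard deviation of the state noise
  -> Double    -- ^ initial state h_0
  -> [Double]  -- ^ standard normal shocks eta_1, eta_2, ...
  -> [Double]  -- ^ states h_1, h_2, ...
ar1 mu phi tau h0 shocks = hs
  where
    -- hs is defined in terms of itself: each element uses the previous one
    hs = zipWith step (h0 : hs) shocks
    step hPrev eta = mu + phi * hPrev + tau * eta
```

Because of laziness this also works for an infinite list of shocks:

ghci> take 3 (ar1 0 0.5 1 0 (repeat 1))
[1.0,1.5,1.75]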
We can plot the volatility (which we cannot observe directly).
And we can plot the log returns.
> sds = V.map sqrt vols

> ys = fst $ runState ysAux (pureMT 2)
>   where
>     ysAux = V.mapM (\sd -> sample (Normal 0.0 sd)) sds
We start with a vague prior for $H_0$
> m0, c0 :: Double
> m0 = 0.0
> c0 = 100.0
For convenience
> iC0, iC0m0 :: Double
> iC0 = recip c0
> iC0m0 = iC0 * m0
Rather than really sample from priors for $\mu$, $\phi$ and $\tau^2$, let us cheat and assume we sampled the simulated values!
> mu0, phi0, tau20 :: Double
> mu0 = 0.00645
> phi0 = 0.99
> tau20 = 0.15^2
But that we are still very uncertain about them
> bigV0, invBigV0 :: S.Sq 2
> bigV0 = S.diag $ S.fromList [100.0, 100.0]
> invBigV0 = inv bigV0
> nu0, s02 :: Double
> nu0 = 10.0
> s02 = (nu0 - 2) / nu0 * tau20
Note that for the inverse gamma this gives
> expectationTau2, varianceTau2 :: Double
> expectationTau2 = (nu0 * s02 / 2) / ((nu0 / 2) - 1)
> varianceTau2 = (nu0 * s02 / 2)^2 / (((nu0 / 2) - 1)^2 * ((nu0 / 2) - 2))
ghci> expectationTau2
2.25e-2
ghci> varianceTau2
1.6874999999999998e-4
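As a worked check, these two values follow from the mean and variance of an inverse gamma distribution with shape $a$ and scale $b$, assuming the parameterisation with $a = \nu_0 / 2$ and $b = \nu_0 s_0^2 / 2$ that the two Haskell expressions above encode. With the values of `nu0` and `s02` defined above, $a = 5$ and $b = 10 \times 0.018 / 2 = 0.09$:

$$
\begin{aligned}
\mathbb{E}(\tau^2) &= \frac{b}{a - 1} = \frac{0.09}{4} = 0.0225 \\
\mathbb{V}(\tau^2) &= \frac{b^2}{(a - 1)^2 (a - 2)} = \frac{0.0081}{16 \times 3} = 1.6875 \times 10^{-4}
\end{aligned}
$$

The prior mean therefore coincides with `tau20`, because $b / (a - 1) = \nu_0 s_0^2 / (\nu_0 - 2)$ and `s02` was chosen as $(\nu_0 - 2) / \nu_0$ times `tau20`.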
Running the Markov Chain
Tuning parameter
> vh :: Double
> vh = 0.1
The burn-in and sample sizes may be too low for actual estimation but will suffice for a demonstration.
> bigM, bigM0 :: Int
> bigM0 = 2000
> bigM = 2000
> multiStep :: StatsM (Double, Double, Double, Double, V.Vector Double)
> multiStep = iterateM_ (singleStep vh ys) (mu0, phi0, tau20, h0, vols)
> runMC :: [((Double, Double), Double)]
> runMC = take bigM $ drop bigM0 $
>         execWriter (evalStateT (sample multiStep) (pureMT 42))
And now we can look at the distributions of our estimates
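One simple way to summarise the draws numerically is to compute a sample mean and standard deviation for each parameter. The helpers below are a sketch using only the Prelude; `meanAndSd` and `summarise` are hypothetical names and assume at least two draws.

```haskell
-- A minimal sketch for summarising draws of the form ((mu, phi), tau2).
-- Both helpers are hypothetical and assume at least two draws.

-- | Sample mean and (unbiased) sample standard deviation.
meanAndSd :: [Double] -> (Double, Double)
meanAndSd xs = (m, sqrt v)
  where
    n = fromIntegral (length xs)
    m = sum xs / n
    v = sum [ (x - m) * (x - m) | x <- xs ] / (n - 1)

-- | One (mean, standard deviation) pair each for mu, phi and tau2.
summarise
  :: [((Double, Double), Double)]
  -> ((Double, Double), (Double, Double), (Double, Double))
summarise draws = (meanAndSd mus, meanAndSd phis, meanAndSd tau2s)
  where
    mus   = [ mu   | ((mu, _), _)  <- draws ]
    phis  = [ phi  | ((_, phi), _) <- draws ]
    tau2s = [ tau2 | (_, tau2)     <- draws ]
```

Evaluating `summarise runMC` in GHCi would then give posterior means and standard deviations to compare with the values used to simulate the data.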
Bibliography
Doucet, Arnaud, and Adam M Johansen. 2011. “A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later.” In Handbook of Nonlinear Filtering. Oxford, UK: Oxford University Press.
Fridman, Moshe, and Lawrence Harris. 1998. “A Maximum Likelihood Approach for Non-Gaussian Stochastic Volatility Models.” Journal of Business & Economic Statistics 16 (3): 284–91.
Geweke, John. 1994. “Bayesian Comparison of Econometric Models.”
Jacquier, Eric, Nicholas G. Polson, and Peter E. Rossi. 1994. “Bayesian Analysis of Stochastic Volatility Models.”
Kim, Sangjoon, Neil Shephard, and Siddhartha Chib. 1998. “Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models.” Review of Economic Studies 65 (3): 361–93. http://ideas.repec.org/a/bla/restud/v65y1998i3p361-93.html.