Girsanov’s Theorem


We previously used importance sampling in the case where we did not have a sampler available for the distribution from which we wished to sample. There is an even more compelling case for using importance sampling.

Suppose we wish to estimate the probability of a rare event. For example, suppose we wish to estimate \mathbb{P}(X > 5) where X \sim {\mathcal{N}}(0,1). In this case, we can look up the answer \mathbb{P}(X > 5) \approx 2.867 \times 10^{-7}. But suppose we couldn’t look up the answer. One strategy that might occur to us is to draw samples and estimate the probability as the fraction of them that exceed 5. The flaw in this is obvious but let’s try it anyway.

> module Girsanov where
> import qualified Data.Vector as V
> import Data.Random.Source.PureMT
> import Data.Random
> import Control.Monad.State
> import Data.Histogram.Fill
> import Data.Histogram.Generic ( Histogram )
> import Data.Number.Erf
> import Data.List ( transpose )
> -- Draw n standard normal samples, collected by the replicator repM.
> samples :: (Foldable f, MonadRandom m) =>
>                     (Int -> RVar Double -> RVar (f Double)) ->
>                     Int ->
>                     m (f Double)
> samples repM n = sample $ repM n $ stdNormal
>
> -- Naive estimate: count how many of 100,000 draws are at least 5.
> biggerThan5 :: Int
> biggerThan5 = length (evalState xs (pureMT 42))
>   where
>     xs :: MonadRandom m => m [Double]
>     xs = liftM (filter (>= 5.0)) $ samples replicateM 100000

As we might have expected, even if we draw 100,000 samples, we estimate this probability quite poorly.

ghci> biggerThan5

Using importance sampling we can do a lot better.

Let \xi be a normally distributed random variable with zero mean and unit variance under the Lebesgue measure \mathbb{P}. As usual we can then define a new probability measure, the law of \xi, by

\displaystyle   \begin{aligned}  \mathbb{P}_\xi((-\infty, b])  &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^b e^{-x^2/2}\,\mathrm{d}x  \end{aligned}

Thus, for any constant a and any function f for which the integrals exist,


\displaystyle   \begin{aligned}  \mathbb{E}_\xi(f) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty f(x) e^{-x^2/2}\,\mathrm{d}x \\  &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty f(x) e^{a^2/2}e^{-a x}e^{-(x-a)^2/2}\,\mathrm{d}x \\  &= \mathbb{E}_{\xi + a}(fg) \\  &= \mathbb{\tilde{E}}_{\xi + a}(f)  \end{aligned}

where we have defined

\displaystyle   g(x) \triangleq e^{a^2/2}e^{-a x}  \quad \mathrm{and} \quad  \mathbb{\tilde{P}}((-\infty, b]) \triangleq \int_{-\infty}^b g(x)\,\mathrm{d}\mathbb{P}_{\xi + a}(x)

Thus we can estimate \mathbb{P}(X > 5) either by sampling from a normal distribution with mean 0 and counting the fraction of samples that are above 5, or by sampling y_1, \ldots, y_n from a normal distribution with mean 5 and calculating the appropriately weighted mean (taking a = 5 in g)

\displaystyle   \frac{1}{n}\sum_{i=1}^n \mathbb{I}_{\{y_i > 5\}}g(y_i)

Let’s try this out.

> biggerThan5' :: Double
> biggerThan5' = sum (evalState xs (pureMT 42)) / (fromIntegral n)
>   where
>     xs :: MonadRandom m => m [Double]
>     xs = liftM (map g) $
>          liftM (filter (>= 5.0)) $
>          liftM (map (+5)) $
>          samples replicateM n
>     g x = exp $ (5^2 / 2) - 5 * x
>     n = 100000

And now we get quite a good estimate.

ghci> biggerThan5'
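For readers who want to try the comparison without the random-fu stack, here is a minimal standalone sketch using only base. It is not part of the literate module above: the names nextWord, toUnit, stdNormals, tailNaive and tailIS are made up for this illustration, and a small SplitMix64-style generator with Box-Muller stands in for pureMT, so the numbers will differ from those produced by the code above.

```haskell
import Data.Bits (shiftR, xor)
import Data.List (unfoldr)
import Data.Word (Word64)

-- One step of a SplitMix64-style generator: (output word, next state).
-- This is an assumed stand-in for the Mersenne Twister used in the post.
nextWord :: Word64 -> (Word64, Word64)
nextWord s = (z2 `xor` (z2 `shiftR` 31), s')
  where
    s' = s + 0x9e3779b97f4a7c15
    z1 = (s' `xor` (s' `shiftR` 30)) * 0xbf58476d1ce4e5b9
    z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb

-- Map the top 53 bits of a word to a uniform variate in (0, 1].
toUnit :: Word64 -> Double
toUnit w = (fromIntegral (w `shiftR` 11) + 1) / 9007199254740992  -- 2^53

-- Infinite stream of standard normal variates via Box-Muller.
stdNormals :: Word64 -> [Double]
stdNormals = unfoldr step
  where
    step s0 = Just (r * cos (2 * pi * toUnit w2), s2)
      where
        (w1, s1) = nextWord s0
        (w2, s2) = nextWord s1
        r = sqrt (-2 * log (toUnit w1))

-- Naive estimate of P(X > b): the fraction of N(0,1) draws above b.
tailNaive :: Int -> Double -> [Double] -> Double
tailNaive n b zs =
  fromIntegral (length (filter (> b) (take n zs))) / fromIntegral n

-- Importance-sampling estimate: draw y ~ N(b,1) as z + b and weight each
-- draw that lands above b by g(y) = exp(b^2/2 - b*y).
tailIS :: Int -> Double -> [Double] -> Double
tailIS n b zs = sum weights / fromIntegral n
  where
    weights = [ exp (b * b / 2 - b * y) | z <- take n zs, let y = z + b, y > b ]
```

Loaded into ghci, tailNaive 100000 5 (stdNormals 42) should almost always be 0, since the expected number of hits is only about 0.03, whereas tailIS 100000 5 (stdNormals 42) should land within a few percent of 2.867 \times 10^{-7}.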

Random Paths

Another rare event whose probability we might wish to estimate is that of Brownian Motion crossing a boundary. For example, what is the probability of Brownian Motion crossing the line y = 3.5? Let’s try sampling 100 paths (we restrict the number so the chart is still readable).

> -- Draw n independent normal increments with variance deltaT.
> epsilons :: (Foldable f, MonadRandom m) =>
>                     (Int -> RVar Double -> RVar (f Double)) ->
>                     Double ->
>                     Int ->
>                     m (f Double)
> epsilons repM deltaT n = sample $ repM n $ rvar (Normal 0.0 (sqrt deltaT))
>
> -- Brownian Motion on [0,1] in n steps: the running sum of the increments.
> bM0to1 :: Foldable f =>
>           ((Double -> Double -> Double) -> Double -> f Double -> f Double)
>           -> (Int -> RVar Double -> RVar (f Double))
>           -> Int
>           -> Int
>           -> f Double
> bM0to1 scan repM seed n =
>   scan (+) 0.0 $
>   evalState (epsilons repM (recip $ fromIntegral n) n) (pureMT (fromIntegral seed))

We can see by eye in the chart below that again we do quite poorly.

We know that \mathbb{P}(T_a \leq t) = 2(1 - \Phi (a / \sqrt{t})) where T_a = \inf \{t : W_t = a\}.
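
One way to see where this formula comes from is the reflection principle, sketched here for a > 0 as a reminder rather than a proof. A path that has reached a by time t is equally likely to finish above or below a, and a path that finishes above a must have reached it, so

\displaystyle   \mathbb{P}(T_a \leq t) = \mathbb{P}(T_a \leq t, W_t \geq a) + \mathbb{P}(T_a \leq t, W_t < a) = 2\,\mathbb{P}(W_t \geq a) = 2\bigg(1 - \Phi\bigg(\frac{a}{\sqrt{t}}\bigg)\bigg)

where the last step uses W_t \sim {\mathcal{N}}(0, t).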

> p :: Double -> Double -> Double
> p a t = 2 * (1 - normcdf (a / sqrt t))

ghci> p 1.0 1.0

ghci> p 2.0 1.0

ghci> p 3.0 1.0

But what if we didn’t know this formula? Define

\displaystyle   N(\omega) \triangleq  \begin{cases}  1 & \text{if } \sup_{0 \leq t \leq 1}\tilde W_t \geq a \\  0 & \text{if } \sup_{0 \leq t \leq 1}\tilde W_t < a \\  \end{cases}

where \mathbb{Q} is the measure which makes \tilde W_t Brownian Motion.

We can estimate the expectation of N

\displaystyle   \hat p_{\mathbb{Q}} = \frac{1}{M}\sum_{i=1}^M n_i

where n_i is 1 if Brownian Motion hits the barrier and 0 otherwise and M is the total number of simulations. We know from visual inspection that this gives poor results but let us try some calculations anyway.

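> -- Number of simulated paths and number of time steps per path.
> n :: Int
> n = 500
> m :: Int
> m = 10000
>
> -- Naive estimate: the fraction of paths whose maximum reaches a.
> supAbove :: Double -> Double
> supAbove a = fromIntegral count / fromIntegral n
>   where
>     count = length $
>             filter (>= a) $
>             map (\seed -> maximum $ bM0to1 scanl replicateM seed m) [0..n - 1]
>
> -- Brownian Motion on [0,1] with constant drift mu added at each grid point.
> bM0to1WithDrift :: Int -> Double -> Int -> [Double]
> bM0to1WithDrift seed mu n =
>   zipWith (\m x -> x + mu * m * deltaT) [0..] $
>   bM0to1 scanl replicateM seed n
>     where
>       deltaT = recip $ fromIntegral n

ghci> supAbove 1.0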

ghci> supAbove 2.0

ghci> supAbove 3.0

As expected for a rare event we get an estimate of 0.

Fortunately we can use importance sampling for paths. If we take \mu(\omega, t) = a where a is a constant in Girsanov’s Theorem below then we know that \tilde W_t = W_t + \int_0^t a \,\mathrm{d}s = W_t + at is \mathbb{Q}-Brownian Motion.

We observe that

\displaystyle   \begin{aligned}  \mathbb{Q}N &= \mathbb{P}\bigg(N\frac{\mathrm{d} \mathbb{Q}}{\mathrm{d} \mathbb{P}}\bigg) \\  &=  \mathbb{P}\Bigg[N  \exp \Bigg(-\int_0^1  \mu(\omega,t) \,\mathrm{d}W_t - \frac{1}{2} \int_0^1 \mu^2(\omega, t) \,\mathrm{d} t\Bigg)  \Bigg] \\  &=  \mathbb{P}\Bigg[N  \exp \Bigg(-aW_1 - \frac{1}{2} a^2\Bigg)  \Bigg]  \end{aligned}

So we can also estimate the expectation of N by sampling under \mathbb{P}:

\displaystyle   \hat p_{\mathbb{P}} = \frac{1}{M}\sum_{i=1}^M n_i\exp{\bigg(-aw^{(1)}_i - \frac{a^2}{2}\bigg)}

where n_i is now 1 if Brownian Motion with the specified drift hits the barrier and 0 otherwise, and w^{(1)}_i is Brownian Motion sampled at t=1.

We can see from the chart below that this is going to be better at hitting the required barrier.

Let’s do some calculations.

> -- Importance-sampling estimate: simulate driftless paths, add drift a,
> -- and weight each path that reaches a by exp(-a * w1 - a^2 / 2).
> supAbove' :: Double -> Double
> supAbove' a = (sum $ zipWith (*) ns ws) / fromIntegral n
>   where
>     deltaT = recip $ fromIntegral m
>     uss = map (\seed -> bM0to1 scanl replicateM seed m) [0..n - 1]
>     ys = map last uss
>     ws = map (\x -> exp (-a * x - 0.5 * a^2)) ys
>     vss = map (zipWith (\m x -> x + a * m * deltaT) [0..]) uss
>     sups = map maximum vss
>     ns = map fromIntegral $ map fromEnum $ map (>=a) sups

ghci> supAbove' 1.0

ghci> supAbove' 2.0

ghci> supAbove' 3.0

The reader is invited to try the above estimates with 1,000 samples per path to see that even with this respectable number, the calculation goes awry.
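
The path estimators can also be sketched as a standalone file using only base, for experimenting with different numbers of paths and steps. This is an illustration under assumptions, not the code used for the results above: the names nextWord, toUnit, stdNormals, paths, hitNaive and hitIS are hypothetical, and a SplitMix64-style generator with Box-Muller replaces pureMT, so the numbers will not match the ghci sessions in this post.

```haskell
import Data.Bits (shiftR, xor)
import Data.List (unfoldr)
import Data.Word (Word64)

-- One step of a SplitMix64-style generator: (output word, next state).
nextWord :: Word64 -> (Word64, Word64)
nextWord s = (z2 `xor` (z2 `shiftR` 31), s')
  where
    s' = s + 0x9e3779b97f4a7c15
    z1 = (s' `xor` (s' `shiftR` 30)) * 0xbf58476d1ce4e5b9
    z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb

-- Map the top 53 bits of a word to a uniform variate in (0, 1].
toUnit :: Word64 -> Double
toUnit w = (fromIntegral (w `shiftR` 11) + 1) / 9007199254740992  -- 2^53

-- Infinite stream of standard normal variates via Box-Muller.
stdNormals :: Word64 -> [Double]
stdNormals = unfoldr step
  where
    step s0 = Just (r * cos (2 * pi * toUnit w2), s2)
      where
        (w1, s1) = nextWord s0
        (w2, s2) = nextWord s1
        r = sqrt (-2 * log (toUnit w1))

-- Driftless paths on [0,1] with m steps each, cut from one normal stream.
paths :: Int -> Word64 -> [[Double]]
paths m seed = go (stdNormals seed)
  where
    dt = 1 / fromIntegral m
    go zs = scanl (+) 0 (map (* sqrt dt) incs) : go rest
      where
        (incs, rest) = splitAt m zs

-- Naive estimate: the fraction of driftless paths whose maximum reaches a.
hitNaive :: Int -> Int -> Double -> Word64 -> Double
hitNaive nPaths m a seed = fromIntegral hits / fromIntegral nPaths
  where
    hits = length (filter ((>= a) . maximum) (take nPaths (paths m seed)))

-- Weighted estimate: add drift a to each path and weight the paths that
-- reach a by exp(-a * w1 - a^2 / 2), where w1 is the driftless endpoint.
hitIS :: Int -> Int -> Double -> Word64 -> Double
hitIS nPaths m a seed =
  sum (map weight (take nPaths (paths m seed))) / fromIntegral nPaths
  where
    dt = 1 / fromIntegral m
    weight ws
      | maximum drifted >= a = exp (negate (a * last ws) - 0.5 * a * a)
      | otherwise            = 0
      where
        drifted = zipWith (\k w -> w + a * k * dt) [0 ..] ws
```

For example, hitIS 1000 400 3 7 uses 1,000 paths of 400 steps with barrier 3 and seed 7. Because the maximum is only observed on a grid, both estimators should be expected to sit somewhat below the continuous-time value given by p, and the gap widens as the number of steps per path shrinks.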

In General

If we have a probability space (\Omega, {\mathcal{F}}, \mathbb{P}) and a non-negative random variable Z with \mathbb{E}Z = 1 then we can define a new probability measure \mathbb{Q} on the same \sigma-algebra by

\displaystyle   \mathbb{Q} A \triangleq \int_A Z \,\mathrm{d} \mathbb{P}

For any two probability measures, when such a Z exists it is called the Radon-Nikodym derivative of \mathbb{Q} with respect to \mathbb{P} and is denoted \frac{\mathrm{d} \mathbb{Q}}{\mathrm{d} \mathbb{P}}.
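
On a finite sample space the construction can be checked exactly. The following standalone sketch is a made-up toy example: a fair die under \mathbb{P} and a hypothetical Z proportional to the face value. The names omega, probP, z, expectP, probQ and expectQ are all illustrative. It uses Rational arithmetic so that \mathbb{E}Z = 1 and \mathbb{Q}(\Omega) = 1 hold without rounding.

```haskell
import Data.Ratio ((%))

-- A finite sample space: the faces of a die, uniform under P.
omega :: [Integer]
omega = [1 .. 6]

probP :: Integer -> Rational
probP _ = 1 % 6

-- A hypothetical non-negative random variable Z with E_P[Z] = 1:
-- Z(w) = 2w/7, so that the sum over w of (1/6) * (2w/7) is 42/42.
z :: Integer -> Rational
z w = fromInteger w * (2 % 7)

-- Expectation of f under P.
expectP :: (Integer -> Rational) -> Rational
expectP f = sum [ f w * probP w | w <- omega ]

-- The new measure Q(A) = E_P[1_A Z], for an event given as a predicate.
probQ :: (Integer -> Bool) -> Rational
probQ a = expectP (\w -> if a w then z w else 0)

-- Expectation under Q, computed from the point masses Q({w}).
expectQ :: (Integer -> Rational) -> Rational
expectQ f = sum [ f w * probQ (== w) | w <- omega ]
```

Here probQ (const True) is 1, so \mathbb{Q} is a probability measure, and probQ (>= 5) is 11/21 compared with 1/3 under \mathbb{P}: the change of measure has made high rolls more likely, which is the same effect the drift has on Brownian paths above.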

Given that we managed to shift a Normal Distribution with one mean under one measure to a Normal Distribution with another mean under another measure by producing the Radon-Nikodym derivative, might it be possible to shift Brownian Motion with a drift under one probability measure to be pure Brownian Motion under another probability measure by producing the Radon-Nikodym derivative? The answer is yes, as Girsanov’s theorem below shows.

Girsanov’s Theorem

Let W_t be Brownian Motion on a probability space (\Omega, {\mathcal{F}}, \mathbb{P}), let \{{\mathcal{F}}_t\}_{t \in [0,T]} be a filtration for this Brownian Motion and let \mu(\omega, t) be an adapted process such that the Novikov Sufficiency Condition holds

\displaystyle   \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^T \mu^2(\omega, s) \,\mathrm{d}s\bigg)}\bigg] = K < \infty

then there exists a probability measure \mathbb{Q} such that

  • \mathbb{Q} is equivalent to \mathbb{P}, that is, \mathbb{Q}(A) = 0 \iff \mathbb{P}(A) = 0.

  • \displaystyle {\frac{\mathrm{d}\mathbb{Q}}{\mathrm{d}\mathbb{P}} = \exp \Bigg(-\int_0^T \mu(\omega,t) \,\mathrm{d}W_t - \frac{1}{2} \int_0^T \mu^2(\omega, t) \,\mathrm{d} t\Bigg)}.

  • \tilde W_t = W_t + \int_0^t \mu(\omega, s) \,\mathrm{d}s is Brownian Motion on the probability space (\Omega, {\mathcal{F}}, \mathbb{Q}) also with the filtration \{\mathcal{F}_t\}_{t \in [0,T]}.

In order to prove Girsanov’s Theorem, we need a condition which allows us to infer that M_t(\mu) is a strict martingale. One such useful condition, to which we have already alluded, is the Novikov Sufficiency Condition.


Define \mathbb{Q} by

\displaystyle   \mathbb{Q}(A) = \mathbb{P}(1_A M_T) \quad \mathrm{where} \quad  M_t(\mu) = \exp{\bigg(-\int_0^t \mu(\omega, s) \,\mathrm{d}W_s  - \frac{1}{2}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}

Write X_t for \tilde W_t and let f be a deterministic function. Then, temporarily overloading the notation and writing \mathbb{P} for expectation under \mathbb{P}, and applying the Novikov Sufficiency Condition to f(s) - \mu(\omega ,s), we have

\displaystyle   \begin{aligned}  \mathbb{Q}\bigg[\exp{\int_0^T f(s) \,\mathrm{d}X_s}\bigg] &=  \mathbb{Q}\bigg[\exp{\bigg(\int_0^T f(s) \,\mathrm{d}W_s + \int_0^T f(s)\mu(\omega, s) \,\mathrm{d}s\bigg)}\bigg] \\  &=  \mathbb{P}\bigg[\exp{\bigg(  \int_0^T \big(f(s) - \mu(\omega, s)\big)\,\mathrm{d}W_s +  \int_0^T f(s)\mu(\omega, s)\,\mathrm{d}s -  \frac{1}{2}\int_0^T \mu^2(\omega ,s) \,\mathrm{d}s  \bigg)}\bigg] \\  &=  \mathbb{P}\bigg[\exp{\bigg(  \int_0^T \big(f(s) - \mu(\omega, s)\big)\,\mathrm{d}W_s -  \frac{1}{2}\int_0^T \big(f(s) - \mu(\omega ,s)\big)^2 \,\mathrm{d}s +  \frac{1}{2}\int_0^T f^2(s) \,\mathrm{d}s  \bigg)}\bigg] \\  &=  \exp{\bigg(\frac{1}{2}\int_0^T f^2(s) \,\mathrm{d}s\bigg)}  \,  \mathbb{P}\bigg[\exp{\bigg(  \int_0^T \big(f(s) - \mu(\omega, s)\big)\,\mathrm{d}W_s -  \frac{1}{2}\int_0^T \big(f(s) - \mu(\omega ,s)\big)^2 \,\mathrm{d}s  \bigg)}\bigg] \\  &=  \exp{\bigg(\frac{1}{2}\int_0^T f^2(s) \,\mathrm{d}s\bigg)}  \end{aligned}

Taking f(u) = i \zeta 1_{(s,t]}(u), we see that

\displaystyle   \mathbb{Q}\big(e^{i \zeta (X_t - X_s)}\big) = e^{-\frac{1}{2} \zeta^2 (t - s)}

And since this characterizes Brownian Motion, we are done.


The Novikov Sufficiency Condition

The Novikov Sufficiency Condition Statement

Let \mu \in {\cal{L}}^2_{\mathrm{LOC}}[0,T] and further let it satisfy the Novikov condition

\displaystyle   \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^T \mu^2(\omega, s) \,\mathrm{d}s\bigg)}\bigg] = K < \infty

then the process defined by

\displaystyle   M_t(\mu) = \exp{\bigg(\int_0^t \mu(\omega, s) \,\mathrm{d}W_s  - \frac{1}{2}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}

is a strict martingale.
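
As a quick illustration (a worked special case added for orientation, not part of the general argument), take \mu(\omega, s) = a for a constant a. The Novikov condition holds trivially and the process is the familiar exponential martingale:

\displaystyle   K = e^{a^2 T / 2} < \infty  \quad \mathrm{and} \quad  M_t(\mu) = \exp{\bigg(aW_t - \frac{1}{2}a^2 t\bigg)}

Since W_t - W_s \sim {\mathcal{N}}(0, t - s) is independent of {\mathcal{F}}_s, the martingale property can be checked directly in this case:

\displaystyle   \mathbb{E}(M_t \,|\, {\mathcal{F}}_s) = M_s\,\mathbb{E}\bigg[\exp{\bigg(a(W_t - W_s) - \frac{1}{2}a^2(t - s)\bigg)}\bigg] = M_s\,e^{a^2(t-s)/2}e^{-a^2(t-s)/2} = M_s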

Before we prove this, we need two lemmas.

Lemma 1

Let M_t for t \in [0,T] be a non-negative local martingale. Then M_t is a super-martingale, and if further \mathbb{E}M_T = \mathbb{E}M_0 then M_t is a strict martingale.


Let \{\tau_n\}_{n \in \mathbb{N}} be a localizing sequence for M_t. Then for 0 < s < t < T, using Fatou’s lemma and the fact that the stopped process is a strict martingale, we have

\displaystyle   \mathbb{E}(M_t \,|\, {\mathcal{F}_s}) =  \mathbb{E}(\liminf_{n \rightarrow \infty} M_{t \land \tau_n} \,|\, {\mathcal{F}_s}) \leq  \liminf_{n \rightarrow \infty} \mathbb{E}(M_{t \land \tau_n} \,|\, {\mathcal{F}_s}) =  \liminf_{n \rightarrow \infty} M_{s \land \tau_n} = M_s

Thus M_t is a super-martingale and therefore

\displaystyle   \mathbb{E}M_T \leq \mathbb{E}M_t \leq \mathbb{E}M_s \leq \mathbb{E}M_0

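By assumption we have \mathbb{E}M_T = \mathbb{E}M_0, so every inequality above is an equality and in particular \mathbb{E}M_t = \mathbb{E}M_s. Since M_s - \mathbb{E}(M_t \,|\, {\mathcal{F}_s}) is non-negative and has zero expectation, it vanishes almost surely; thus M_t is a strict martingale.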


Lemma 2

Let M_t be a non-negative local martingale. If \{\tau_n\}_{n \in \mathbb{N}} is a localizing sequence such that \sup_n \|M_{T \land \tau_n}\|_p < \infty for some p > 1 then M_t is a strict martingale.


For any r > 0 the triangle inequality gives

\displaystyle   \mathbb{E}(|M_T - M_{T \land \tau_n}|) \leq  \mathbb{E}(M_T - r \land M_T) +  \mathbb{E}(|r \land M_T - r \land M_{T \land \tau_n}|) +  \mathbb{E}(M_{T \land \tau_n} - r \land M_{T \land \tau_n})

By the super-martingale property \mathbb{E}(M_T) \leq \mathbb{E}(M_0) < \infty and thus by dominated convergence we have that

\displaystyle   \lim_{r \rightarrow \infty} \mathbb{E}(r \land M_T) = \mathbb{E}(M_T) \quad \mathrm{and} \quad  \lim_{r \rightarrow \infty}\lim_{n \rightarrow \infty}\mathbb{E}(|r \land M_T - r \land M_{T \land \tau_n}|) = 0

We also have that

\displaystyle   \begin{aligned}  \mathbb{E}(M_{T \land \tau_n} - r \land M_{T \land \tau_n}) &=  \mathbb{E}((M_{T \land \tau_n} - r \land M_{T \land \tau_n}){I}_{(M_{T \land \tau_n} > r)}) +  \mathbb{E}((M_{T \land \tau_n} - r \land M_{T \land \tau_n}){I}_{(M_{T \land \tau_n} \leq r)}) \\  &= \mathbb{E}((M_{T \land \tau_n} - r \land M_{T \land \tau_n}){I}_{(M_{T \land \tau_n} > r)}) \\  &= \mathbb{E}(M_{T \land \tau_n}{I}_{(M_{T \land \tau_n} > r)}) - r\mathbb{P}({M_{T \land \tau_n} > r})  \end{aligned}

By Chebyshev’s inequality (see note (2) below), we have

\displaystyle   r\mathbb{P}({M_{T \land \tau_n} > r}) \leq \frac{r\,\mathbb{E}(M_{T \land \tau_n}^p)}{r^p} \leq  \frac{\sup_n{\mathbb{E}(M_{T \land \tau_n}^p)}}{r^{p-1}}

Taking limits first over n \rightarrow \infty and then over r \rightarrow \infty we see that

\displaystyle   \lim_{r \rightarrow \infty}\lim_{n \rightarrow \infty} r\mathbb{P}({M_{T \land \tau_n} > r}) = 0

For 0 < r \leq x and p > 1 we have x \leq r^{1-p}x^p. Thus

\displaystyle   \mathbb{E}(M_{T \land \tau_n}{I}_{(M_{T \land \tau_n} > r)}) \leq  r^{1-p}\mathbb{E}(M_{T \land \tau_n}^p{I}_{(M_{T \land \tau_n} > r)}) \leq  r^{1-p}\sup_n\mathbb{E}(M_{T \land \tau_n}^p)

Again taking limits over n \rightarrow \infty and then over r \rightarrow \infty we have

\displaystyle   \lim_{r \rightarrow \infty}\lim_{n \rightarrow \infty} \mathbb{E}(M_{T \land \tau_n}{I}_{(M_{T \land \tau_n} > r)}) = 0

These two conclusions imply

\displaystyle   \lim_{r \rightarrow \infty}\lim_{n \rightarrow \infty} \mathbb{E}(M_{T \land \tau_n} - r \land M_{T \land \tau_n}) = 0

We can therefore conclude (since M_{T \land \tau_n} is a martingale)

\displaystyle   \mathbb{E}(M_T) = \lim_{n \rightarrow \infty}\mathbb{E}(M_{T \land \tau_n}) =  \mathbb{E}(M_0)

Thus by the preceding lemma M_t is a strict as well as a local martingale.


The Novikov Sufficiency Condition Proof

Step 1

First we note that M_t(\lambda\mu) is a local martingale for 0 < \lambda < 1. Let us show that it is a strict martingale. We can do this if for any localizing sequence \{\tau_n\}_{n \in \mathbb{N}} we can show

\displaystyle   \sup_n\mathbb{E}(M_{T \land \tau_n}(\lambda\mu))^p < \infty

using the preceding lemma where p > 1.

We note that

\displaystyle   \begin{aligned}  M_t(\lambda\mu) &=  \exp{\bigg(\int^t_0 \lambda\mu(\omega, s)\,\mathrm{d}W_s -  \frac{1}{2}\int^t_0 \lambda^2\mu^2(\omega, s)\,\mathrm{d}s\bigg)} \\  &= {(M_t(\mu))}^{\lambda^2}\exp{\bigg((\lambda - \lambda^2)\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}  \end{aligned}

Now apply Hölder’s inequality with conjugates ({p\lambda^2})^{-1} and ({1 - p\lambda^2})^{-1} where p is chosen to ensure that the conjugates are both strictly greater than 1 (otherwise we cannot apply the inequality).

\displaystyle   \begin{aligned}  \mathbb{E}((M_t(\lambda\mu))^p)  &=  \mathbb{E}\bigg[{(M_t(\mu))}^{p\lambda^2}\exp{\bigg(p(\lambda - \lambda^2)\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg] \\  &\le  \bigg|\bigg|{M_t(\mu)}^{p\lambda^2}\bigg|\bigg|_{1/(p\lambda^2)}  \bigg|\bigg|\exp{\bigg(p(\lambda - \lambda^2)\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg|\bigg|_{1/(1 - p\lambda^2)} \\  &=  \mathbb{E}{\bigg[M_t(\mu)}\bigg]^{p\lambda^2}  \mathbb{E}\bigg[\exp{\bigg(p\frac{\lambda - \lambda^2}{1 - p\lambda^2}\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg]^{1 - p\lambda^2}  \end{aligned}

Now let us choose

\displaystyle   p\frac{\lambda - \lambda^2}{1 - p\lambda^2} = \frac{1}{2}


\displaystyle   \begin{aligned}  2p(\lambda - \lambda^2) &= 1 - p\lambda^2 \\  p & = \frac{1}{2(\lambda - \lambda^2) + \lambda^2} \\  p &= \frac{1}{(2 - \lambda)\lambda}  \end{aligned}

In order to apply Hölder’s inequality we need to check that (p\lambda^2)^{-1} > 1 and that (1 - p\lambda^2)^{-1} > 1, but this amounts to checking that p\lambda^2 > 0 and that 1 > \lambda. We also need to check that p > 1, but this amounts to checking that (2 - \lambda)\lambda < 1 for 0 < \lambda < 1, and this is easily checked to be true since 1 - (2 - \lambda)\lambda = (1 - \lambda)^2 > 0.
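
As a concrete check of these conditions (a worked instance with the arbitrarily chosen value \lambda = 1/2, added for illustration), we get

\displaystyle   \begin{aligned}  p &= \frac{1}{(2 - \frac{1}{2})\frac{1}{2}} = \frac{4}{3} > 1 \\  p\lambda^2 &= \frac{4}{3}\cdot\frac{1}{4} = \frac{1}{3}, \quad (p\lambda^2)^{-1} = 3, \quad (1 - p\lambda^2)^{-1} = \frac{3}{2} \\  p\frac{\lambda - \lambda^2}{1 - p\lambda^2} &= \frac{\frac{4}{3}\cdot\frac{1}{4}}{\frac{2}{3}} = \frac{1}{2}  \end{aligned}

so both conjugates exceed 1, their reciprocals sum to 1, and the exponent of the stochastic integral is 1/2 as intended.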

Re-writing the above inequality with this value of p we have

\displaystyle   \begin{aligned}  \mathbb{E}((M_t(\lambda\mu))^p)  &\le  \mathbb{E}{\bigg[M_t(\mu)}\bigg]^{p\lambda^2}  \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg]^{1 - p\lambda^2}  \end{aligned}

By the first lemma, since M_t(\mu) is a non-negative local martingale, it is also a supermartingale. Furthermore \mathbb{E}(M_0(\mu)) = 1. Thus

\displaystyle   \mathbb{E}{\bigg[M_t(\mu)}\bigg]^{p\lambda^2} \leq 1

and therefore

\displaystyle   \begin{aligned}  \mathbb{E}((M_t(\lambda\mu))^p)  &\le  \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg]^{1 - p\lambda^2}  \end{aligned}

Step 2

Recall we have

\displaystyle   {M_t} =  \exp\bigg(  \int_0^t \mu(\omega ,s)\,\mathrm{d}W_s - \frac{1}{2}\int_0^t \mu(\omega ,s)^2\,\mathrm{d}s  \bigg)

Taking logs gives

\displaystyle   \log{M_t} =  \int_0^t \mu(\omega ,s)\,\mathrm{d}W_s - \frac{1}{2}\int_0^t \mu(\omega ,s)^2\,\mathrm{d}s

or in differential form

\displaystyle   \mathrm{d}(\log{M_t}) =  \mu(\omega ,t)\,\mathrm{d}W_t - \frac{1}{2}\mu(\omega ,t)^2\,\mathrm{d}t

We can also apply Itô’s rule to \log{M_t}

\displaystyle   \begin{aligned}  \mathrm{d}(\log{M_t})  &= \frac{1}{M_t}\,\mathrm{d}M_t   - \frac{1}{2}\frac{1}{M_t^2}\,\mathrm{d}[M]_t \\  \end{aligned}

where [\ldots] denotes the quadratic variation of a stochastic process.

Comparing terms gives the stochastic differential equation

\displaystyle   \mathrm{d}M_t = M_t\mu(\omega,t)\,\mathrm{d}W_t

In integral form this can also be written as

\displaystyle   M_t = 1 + \int_0^t M_s\mu(\omega, s)\,\mathrm{d}W_s

Thus M_t is a local martingale (it is a stochastic integral with respect to Brownian Motion) and by the first lemma it is a supermartingale. Hence \mathbb{E}M_t \leq 1.

Next we note that

\displaystyle   \exp{\bigg(\frac{1}{2}\int_0^t \mu(\omega, s)\,\mathrm{d}W_s\bigg)} =  \exp{\bigg(\frac{1}{2}\int_0^t \mu(\omega, s)\,\mathrm{d}W_s - \frac{1}{4}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}  \exp{\bigg(\frac{1}{4}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}

to which we can apply Hölder’s inequality with conjugates p = q = 2 to obtain

\displaystyle   \begin{aligned}  \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^t \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg] &=  \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^t \mu(\omega, s)\,\mathrm{d}W_s - \frac{1}{4}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}  \exp{\bigg(\frac{1}{4}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}\bigg] \\  & \leq  \sqrt{\mathbb{E}\bigg[\exp{\bigg(\int_0^t \mu(\omega, s)\,\mathrm{d}W_s - \frac{1}{2}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}\bigg]}  \sqrt{\mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}\bigg]}  \end{aligned}

Applying the supermartingale inequality \mathbb{E}M_t \leq 1 then gives

\displaystyle   \begin{aligned}  \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^t \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg]  & \leq  \sqrt{\mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}\bigg]}  \end{aligned}

Step 3

Now we can apply the result in Step 2 to the result in Step 1.

\displaystyle   \begin{aligned}  \mathbb{E}((M_t(\lambda\mu))^p)  &\le  \mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg]^{1 - p\lambda^2} \\  &\le  {\mathbb{E}\bigg[\exp{\bigg(\frac{1}{2}\int_0^t \mu^2(\omega, s) \,\mathrm{d}s\bigg)}\bigg]}^{(1 - p\lambda^2)/2} \\  &\le  K^{(1 - p\lambda^2)/2}  \end{aligned}

We can replace \mu(\omega, s) by \mu(\omega, s){\mathcal{I}}_{s < \tau} for any stopping time \tau, which replaces M_t by M_{t \land \tau} and leaves the Novikov bound K intact. Thus for a localizing sequence we have

\displaystyle   \begin{aligned}  \mathbb{E}((M_{t \land \tau_n}(\lambda\mu))^p)  &\le  K^{(1 - p\lambda^2)/2}  \end{aligned}

From which we can conclude

\displaystyle   \sup_n \|M_{T \land \tau_n}(\lambda\mu)\|_p < \infty

Now we can apply the second lemma to conclude that M_t(\lambda\mu) is a strict martingale.

Final Step

We have already calculated that

\displaystyle   \begin{aligned}  M_t(\lambda\mu) &=  \exp{\bigg(\int^t_0 \lambda\mu(\omega, s)\,\mathrm{d}W_s -  \frac{1}{2}\int^t_0 \lambda^2\mu^2(\omega, s)\,\mathrm{d}s\bigg)} \\  &= {(M_t(\mu))}^{\lambda^2}\exp{\bigg((\lambda - \lambda^2)\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}  \end{aligned}

Now apply Hölder’s inequality with conjugates p = \lambda^{-2} and q = (1 - \lambda^2)^{-1}.

\displaystyle   1 = \mathbb{E}(M_t(\lambda\mu)) \le  \mathbb{E}(M_t(\mu))^{\lambda^2}\mathbb{E}{\bigg(}\exp{\bigg(\frac{\lambda}{1 + \lambda}\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg)^{1 - \lambda^2}

And then we can apply Jensen’s inequality to the last term on the right hand side with the convex function x^{(1 + \lambda)/2\lambda}.

\displaystyle   1 \le  \mathbb{E}(M_t(\mu))^{\lambda^2}  \mathbb{E}{\bigg(}\exp{\bigg(\frac{1}{2}\int^t_0 \mu(\omega, s)\,\mathrm{d}W_s\bigg)}\bigg)^{2\lambda(1- \lambda)}

Using the inequality we established in Step 2 and the Novikov condition then gives

\displaystyle   1 \le  \mathbb{E}(M_t(\mu))^{\lambda^2}  K^{\lambda(1 - \lambda)}

If we now let \lambda \nearrow 1 we see that we must have 1 \le \mathbb{E}(M_t(\mu)). We already know that 1 \ge \mathbb{E}(M_t(\mu)) by the first lemma and so we have finally proved that M_t(\mu) is a martingale.


  1. We have already used importance sampling and also touched on changes of measure.

  2. Chebyshev’s inequality is usually stated for the second moment but the proof is easily adapted:

\displaystyle   \mathbb P( |X| > u ) = \int 1_{|X| > u} ~d\mathbb P = \frac 1 {u^p} \int u^p 1_{|X| > u} ~d\mathbb P \le \frac 1 {u^p} \int |X|^p 1_{|X| > u} ~ d\mathbb P \le \frac 1 {u^p} \mathbb E|X|^p.

  3. We follow van Handel (2007); a similar approach is given in Steele (2001).


van Handel, Ramon. 2007. “Stochastic Calculus, Filtering, and Stochastic Control (Lecture Notes).”

Steele, J.M. 2001. Stochastic Calculus and Financial Applications. Applications of Mathematics. Springer New York.
