If you look at the Wikipedia article on Hidden Markov Models (HMMs), you might be forgiven for concluding that they deal only with discrete time and finite state spaces. In fact, HMMs are much more general, and they are easier to understand when placed in context. Before specifying what an HMM is, let us review some of the theory of Markov processes. A subsequent blog post will cover HMMs themselves.
Markov Processes and Chains
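Recall that a transition kernel is a mapping $\mu : X \times \mathcal{Y} \rightarrow \overline{\mathbb{R}}_{+}$, where $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ are two measurable spaces, such that $\mu(x, \cdot)$ is a probability measure on $\mathcal{Y}$ for all $x \in X$ and such that $\mu(\cdot, A)$ is a measurable function on $X$ for all $A \in \mathcal{Y}$.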
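For example, we could have $X = Y = \{1, \ldots, n\}$ and $\mathcal{X} = \mathcal{Y}$ the set of all subsets and $\mu(i, \{j\}) = P_{ij}$, where $P$ is a matrix with non-negative entries whose rows each sum to 1. Hopefully this should remind you of the transition matrix of a Markov chain.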
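Recall further that a family of such transition kernels $\{\mu_t\}_{t \in T}$, where $T$ is some index set, satisfying the Chapman-Kolmogorov equation

$$
\mu_{t+s}(x, A) = \int_Y \mu_s(x, \mathrm{d}y)\, \mu_t(y, A)
$$
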
gives rise to a Markov process (under some mild conditions — see Rogers and Williams (2000) and Kallenberg (2002) for much more detail), that is, a process in which what happens next only depends on where the process is now and not how it got there.
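Let us carry on with our example and take $T = \mathbb{N}$. With a slight abuse of notation, writing $\mu_n(x, y)$ for $\mu_n(x, \{y\})$, and since $Y$ is finite, we can rewrite the integral as a sum

$$
\mu_{n+m}(x, z) = \sum_{y \in Y} \mu_m(x, y)\, \mu_n(y, z)
$$
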
which we recognise as a restatement of how Markov transition matrices combine.
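The way transition matrices combine can be checked numerically. The following is a minimal sketch, assuming a hypothetical two-state chain whose matrix `P` is made up purely for illustration; `kernel` is a helper name introduced here, not something from the text.

```python
import numpy as np

# Hypothetical two-state transition matrix (illustrative values only).
# Each row is a probability distribution over the next state.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])


def kernel(n: int) -> np.ndarray:
    """n-step transition matrix: entry [x, y] is the probability of
    moving from state x to state y in n steps."""
    return np.linalg.matrix_power(P, n)


m, n = 3, 4
lhs = kernel(n + m)          # (n + m)-step transitions
rhs = kernel(m) @ kernel(n)  # sum over the intermediate state y

print(np.allclose(lhs, rhs))
```

The matrix product on the right-hand side is exactly the sum over the intermediate state, so the two sides agree up to floating-point error.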
A Fully Deterministic System
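A deterministic system can be formulated as a Markov process with a particularly simple transition kernel given by

$$
\mu_t(x_s, A) = \delta(f_t(x_s), A) \triangleq
\begin{cases}
1 & \text{if } f_t(x_s) \in A \\
0 & \text{otherwise}
\end{cases}
$$

where $f_t$ is the deterministic state update function (the flow) and $\delta$ is the Dirac delta, regarded here as a measure that puts all its mass on a single point.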
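To make the deterministic kernel concrete, here is a small sketch. The flow used (exponential decay) and the names `flow` and `deterministic_kernel` are assumptions chosen for illustration, and sets are restricted to half-open intervals to keep the code short.

```python
import math


def flow(t: float, x: float) -> float:
    """Hypothetical flow: the solution of dx/dt = -x after time t,
    starting from x."""
    return x * math.exp(-t)


def deterministic_kernel(t: float, x: float, interval: tuple) -> float:
    """Mass that the kernel assigns to the half-open interval [lo, hi):
    1 if the flow lands in the interval and 0 otherwise."""
    lo, hi = interval
    return 1.0 if lo <= flow(t, x) < hi else 0.0


# Starting at x = 1, the state after t = 1 is exp(-1), roughly 0.368.
print(deterministic_kernel(1.0, 1.0, (0.3, 0.4)))
print(deterministic_kernel(1.0, 1.0, (0.5, 1.0)))

# The flow composes: evolving for 0.3 and then for 0.7 equals evolving for 1.0.
print(math.isclose(flow(0.7, flow(0.3, 2.0)), flow(1.0, 2.0)))
```

The last line is the deterministic counterpart of the composition rule for kernels: all the probability mass simply follows the flow.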
Let us suppose that the deterministic system depends on some time-varying values for which we are unable or unwilling to specify a deterministic model. For example, we may be considering a predator-prey model in which fixed parameters cannot explain every aspect of the dynamics. We could augment the deterministic kernel in the previous example with

$$
\mu_t(\theta_s, \mathrm{d}\phi) = \mathcal{N}\left(\mathrm{d}\phi \mid \theta_s, \sigma^2 t\right)
$$

where we use Greek letters for the parameters (and Roman letters for the state), we use e.g. $\mathrm{d}\phi$ to indicate probability densities, and $\sigma$ sets how quickly the parameters spread out over the elapsed time $t$. In other words, the parameters tend to wiggle around like Brown's pollen particles rather than remaining absolutely fixed.
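A quick simulation illustrates the wiggling. This is a sketch under stated assumptions: the kernel is taken to be Gaussian, centred on the current parameter value, with variance `sigma**2 * dt` over a step of length `dt`; the numerical values and the name `brownian_step` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def brownian_step(theta, dt, sigma, rng):
    """Draw new parameter values from N(theta, sigma^2 * dt), elementwise."""
    return theta + sigma * np.sqrt(dt) * rng.standard_normal(np.shape(theta))


sigma, dt, n_steps = 0.3, 0.1, 10   # illustrative values
theta = np.full(50_000, 2.0)        # many independent copies, all starting at 2.0

for _ in range(n_steps):
    theta = brownian_step(theta, dt, sigma, rng)

total_time = n_steps * dt
# The sample mean should stay near the start value, and the sample variance
# should be close to sigma^2 * total_time.
print(theta.mean(), theta.var(), sigma**2 * total_time)
```

Applying the kernel ten times over steps of 0.1 gives the same spread as applying it once over a step of 1.0, because the variances of independent Gaussian increments add.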
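Of course Brownian motion or diffusion may not be a good model for our parameters; with Brownian motion, the parameters could drift off to $\pm\infty$. We might instead believe that our parameters tend to stay close to some given value (mean-reverting) and use the Ornstein-Uhlenbeck kernel

$$
\mu_t(\theta_s, \mathrm{d}\phi) = \mathcal{N}\left(\mathrm{d}\phi \,\Big\vert\, \alpha + (\theta_s - \alpha)e^{-\beta t},\; \frac{\sigma^2}{2\beta}\left(1 - e^{-2\beta t}\right)\right)
$$

where $\beta$ expresses how strongly we expect the parameter to respond to perturbations, $\alpha$ is the mean to which the process wants to revert (aka the asymptotic mean) and $\sigma$ expresses how noisy the process is.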
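As a sanity check, a Gaussian mean-reverting kernel of this kind should compose correctly: stepping over time `s` and then over time `t` should match a single step over `s + t`. The sketch below assumes the usual Ornstein-Uhlenbeck mean and variance, with `beta` the reversion rate, `alpha` the asymptotic mean and `sigma` the noise level; the function names and numbers are illustrative.

```python
import math


def ou_kernel(theta, t, alpha, beta, sigma):
    """Mean and variance of the Ornstein-Uhlenbeck kernel after time t,
    starting from theta."""
    mean = alpha + (theta - alpha) * math.exp(-beta * t)
    var = sigma**2 / (2 * beta) * (1 - math.exp(-2 * beta * t))
    return mean, var


def compose(theta, s, t, alpha, beta, sigma):
    """Mean and variance after applying the kernel over s and then over t."""
    mean_s, var_s = ou_kernel(theta, s, alpha, beta, sigma)
    # The second step has a mean that is affine in the intermediate value,
    # so the means chain directly (law of total expectation) ...
    mean_st, var_t = ou_kernel(mean_s, t, alpha, beta, sigma)
    # ... and the first-step variance is shrunk by exp(-beta * t) squared
    # before being added (law of total variance).
    return mean_st, var_t + math.exp(-2 * beta * t) * var_s


params = dict(alpha=1.0, beta=2.0, sigma=0.5)   # illustrative values
two_steps = compose(3.0, 0.4, 0.6, **params)
one_step = ou_kernel(3.0, 1.0, **params)
print(two_steps, one_step)
```

For large `t` the mean tends to `alpha` and the variance to `sigma**2 / (2 * beta)`, which is the sense in which the parameter stays close to a given value instead of drifting off.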
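It is sometimes easier to view these transition kernels in terms of stochastic differential equations. Brownian motion can be expressed as

$$
\mathrm{d}X_t = \sigma\, \mathrm{d}W_t
$$

and Ornstein-Uhlenbeck can be expressed as

$$
\mathrm{d}X_t = -\beta(X_t - \alpha)\, \mathrm{d}t + \sigma\, \mathrm{d}W_t
$$

where $W_t$ is the Wiener process.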
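Let us check that the latter stochastic differential equation gives the stated kernel. Rewriting it in integral form (multiply by the integrating factor $e^{\beta t}$ and integrate) and, without loss of generality, taking the starting time to be $0$ with $X_0 = x_0$,

$$
X_t = \alpha + (x_0 - \alpha)e^{-\beta t} + \sigma \int_0^t e^{-\beta(t - s)}\, \mathrm{d}W_s
$$

Since the integral is of a deterministic function, the distribution of $X_t$ is normal. Thus we need only calculate the mean and variance.

The mean is straightforward, because the stochastic integral has zero expectation.

$$
\mathbb{E}\left[X_t \mid X_0 = x_0\right] = \alpha + (x_0 - \alpha)e^{-\beta t}
$$

Without loss of generality assume $u \leq t$ and, writing $\mathbb{C}$ for covariance,

$$
\mathbb{C}\left(X_u, X_t \mid X_0 = x_0\right)
= \sigma^2 e^{-\beta(u + t)}\, \mathbb{E}\left[\int_0^u e^{\beta s}\, \mathrm{d}W_s \int_0^t e^{\beta s}\, \mathrm{d}W_s\right]
$$

And now we can use the Itô isometry and the independence of the Wiener increments on $[0, u]$ and $(u, t]$

$$
\mathbb{C}\left(X_u, X_t \mid X_0 = x_0\right)
= \sigma^2 e^{-\beta(u + t)} \int_0^u e^{2\beta s}\, \mathrm{d}s
= \frac{\sigma^2}{2\beta} e^{-\beta(u + t)}\left(e^{2\beta u} - 1\right)
$$

Substituting in $u = t$ gives the variance $\frac{\sigma^2}{2\beta}\left(1 - e^{-2\beta t}\right)$, which is the desired result.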
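The calculation can also be checked by simulation. The sketch below uses the Euler-Maruyama scheme, which is a choice made here for illustration and not something the derivation depends on, to simulate the Ornstein-Uhlenbeck stochastic differential equation and compare the sample mean and variance with the kernel's. It assumes the equation has drift `-beta * (x - alpha)` and noise level `sigma`; all numerical values are made up.

```python
import numpy as np

alpha, beta, sigma = 1.0, 1.0, 0.5            # illustrative parameters
x0, t_end, n_steps, n_paths = 3.0, 1.0, 200, 20_000
dt = t_end / n_steps

rng = np.random.default_rng(42)
x = np.full(n_paths, x0)                      # all paths start at x0

for _ in range(n_steps):
    # Wiener increments over a step of length dt are N(0, dt).
    dw = np.sqrt(dt) * rng.standard_normal(n_paths)
    # Euler-Maruyama update: drift term plus noise term.
    x = x - beta * (x - alpha) * dt + sigma * dw

# Mean and variance predicted by the Ornstein-Uhlenbeck kernel.
kernel_mean = alpha + (x0 - alpha) * np.exp(-beta * t_end)
kernel_var = sigma**2 / (2 * beta) * (1 - np.exp(-2 * beta * t_end))

print(x.mean(), kernel_mean)
print(x.var(), kernel_var)
```

The agreement is only approximate: the scheme has a discretisation error that shrinks with `dt`, and the sample moments carry Monte Carlo error that shrinks with `n_paths`.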
Kallenberg, O. 2002. Foundations of Modern Probability. Probability and Its Applications. Springer New York. http://books.google.co.uk/books?id=TBgFslMy8V4C.
Rogers, L. C. G., and David Williams. 2000. Diffusions, Markov Processes, and Martingales. Vol. 1. Cambridge Mathematical Library. Cambridge: Cambridge University Press.