Introduction
Suppose you want to sample from the truncated normal distribution. One way to do this is to use rejection sampling. But if you do this naïvely then you will run into performance problems. The excellent Devroye (1986) who references Marsaglia (1964) gives an efficient rejection sampling scheme using the Rayleigh distribution. The random-fu package uses the Exponential distribution.
Performance
> {-# OPTIONS_GHC -Wall #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans #-}
> {-# LANGUAGE FlexibleContexts #-}
> import Control.Monad
> import Data.Random
> import qualified Data.Random.Distribution.Normal as N
> import Data.Random.Source.PureMT
> import Control.Monad.State
Here’s the naïve implementation.
> naiveReject :: Double -> RVar Double
> naiveReject x = doit
> where
> doit = do
> y <- N.stdNormal
> if y < x
> then doit
> else return y
And here’s an implementation using random-fu.
> expReject :: Double -> RVar Double
> expReject x = N.normalTail x
Let’s try running both of them
> n :: Int
> n = 10000000
> lower :: Double
> lower = 2.0
> testExp :: [Double]
> testExp = evalState (replicateM n $ sample (expReject lower)) (pureMT 3)
> testNaive :: [Double]
> testNaive = evalState (replicateM n $ sample (naiveReject lower)) (pureMT 3)
> main :: IO ()
> main = do
> print $ sum testExp
> print $ sum testNaive
Let’s try building and running both the naïve and better tuned versions.
ghc -O2 CompareRejects.hs
As we can see below we get 59.98s and 4.28s, a compelling reason to use the tuned version. And the difference in performance will get worse the less of the tail we wish to sample from.
Tuned
2.3731610476911187e7
11,934,195,432 bytes allocated in the heap
5,257,328 bytes copied during GC
44,312 bytes maximum residency (2 sample(s))
21,224 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 23145 colls, 0 par 0.09s 0.11s 0.0000s 0.0001s
Gen 1 2 colls, 0 par 0.00s 0.00s 0.0001s 0.0002s
INIT time 0.00s ( 0.00s elapsed)
MUT time 4.19s ( 4.26s elapsed)
GC time 0.09s ( 0.11s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 4.28s ( 4.37s elapsed)
%GC time 2.2% (2.6% elapsed)
Alloc rate 2,851,397,967 bytes per MUT second
Productivity 97.8% of total user, 95.7% of total elapsed
Naïve
2.3732073159369867e7
260,450,762,656 bytes allocated in the heap
111,891,960 bytes copied during GC
85,536 bytes maximum residency (2 sample(s))
76,112 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 512768 colls, 0 par 1.86s 2.24s 0.0000s 0.0008s
Gen 1 2 colls, 0 par 0.00s 0.00s 0.0001s 0.0002s
INIT time 0.00s ( 0.00s elapsed)
MUT time 58.12s ( 58.99s elapsed)
GC time 1.86s ( 2.24s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 59.98s ( 61.23s elapsed)
%GC time 3.1% (3.7% elapsed)
Alloc rate 4,481,408,869 bytes per MUT second
Productivity 96.9% of total user, 94.9% of total elapsed
Bibliography
Devroye, L. 1986. Non-Uniform Random Variate Generation. Springer-Verlag. http://books.google.co.uk/books?id=mEw\_AQAAIAAJ.
Marsaglia, G. 1964. “Generating a Variable from the Tail of the Normal Distribution.” J-Technometrics 6 (1): 101–2.