Rejection Sampling

Introduction

Suppose you want to sample from the truncated normal distribution. One way to do this is rejection sampling, but if you do it naïvely then you will run into performance problems. The excellent Devroye (1986), referencing Marsaglia (1964), gives an efficient rejection sampling scheme based on the Rayleigh distribution. The random-fu package uses the Exponential distribution instead.
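To make Marsaglia's scheme concrete, here is a sketch written as a pure function over an explicit stream of Uniform(0,1) numbers; the stream-passing style and the name `marsagliaTail` are mine for ease of testing, not part of random-fu's API.

```haskell
-- Marsaglia (1964): sample a standard normal conditioned on exceeding
-- a > 0.  Propose x = sqrt (a^2 - 2 log u1), which has a truncated
-- Rayleigh distribution, and accept when x * u2 <= a.
-- (Illustrative sketch over an explicit stream of uniforms.)
marsagliaTail :: Double -> [Double] -> (Double, [Double])
marsagliaTail a (u1 : u2 : us)
  | x * u2 <= a = (x, us)            -- accept: x is a draw from the tail
  | otherwise   = marsagliaTail a us -- reject and retry
  where
    x = sqrt (a * a - 2 * log u1)
marsagliaTail _ _ = error "marsagliaTail: stream of uniforms exhausted"
```

Every value returned is at least `a` by construction, and the proposal is accepted often even far into the tail, which is the point of the scheme.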

Performance

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing  #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans         #-}
> {-# LANGUAGE FlexibleContexts             #-}
> import Control.Monad
> import Data.Random
> import qualified Data.Random.Distribution.Normal as N
> import Data.Random.Source.PureMT
> import Control.Monad.State

Here’s the naïve implementation.

> naiveReject :: Double -> RVar Double
> naiveReject x = doit
>   where
>     doit = do
>       y <- N.stdNormal
>       if y < x
>         then doit
>         else return y

And here’s an implementation using random-fu.

> expReject :: Double -> RVar Double
> expReject x = N.normalTail x
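For comparison, here is a sketch of the exponential-proposal scheme in the same stream-passing style as above. This follows Devroye's description; random-fu's actual `normalTail` implementation may differ in detail, and the name `expTail` is mine.

```haskell
-- Exponential-proposal rejection for the normal tail beyond a > 0
-- (after Devroye 1986; not necessarily random-fu's exact code).
-- Propose y = a + x with x ~ Exp(a); the ratio of target to proposal
-- densities is proportional to exp (-x^2 / 2), which is therefore the
-- acceptance probability.
expTail :: Double -> [Double] -> (Double, [Double])
expTail a (u1 : u2 : us)
  | u2 <= exp (negate (x * x) / 2) = (a + x, us) -- accept
  | otherwise                      = expTail a us
  where
    x = negate (log u1) / a                      -- x ~ Exp(a) by inversion
expTail _ _ = error "expTail: stream of uniforms exhausted"
```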

Let’s try running both of them.

> n :: Int
> n = 10000000
> lower :: Double
> lower = 2.0
> testExp :: [Double]
> testExp = evalState (replicateM n $ sample (expReject lower)) (pureMT 3)
> testNaive :: [Double]
> testNaive = evalState (replicateM n $ sample (naiveReject lower)) (pureMT 3)
> main :: IO ()
> main = do
>   print $ sum testExp
>   print $ sum testNaive

Let’s try building and running both the naïve and better tuned versions, with RTS statistics enabled (the +RTS -s flag produces the summaries below; -rtsopts lets the binary accept it).

ghc -O2 -rtsopts CompareRejects.hs
./CompareRejects +RTS -s

As we can see below, the naïve version takes 59.98s against 4.28s for the tuned version: a compelling reason to use the latter. And the difference in performance only gets worse the further into the tail we wish to sample.
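To quantify that last claim: the naïve sampler accepts with probability P(Z > a), so on average it needs 1 / P(Z > a) standard-normal draws per accepted sample. A quick sketch of that calculation follows; since base provides no erfc, it uses the Abramowitz–Stegun 7.1.26 approximation (an assumption of this sketch, accurate to about 1.5e-7 for x ≥ 0).

```haskell
-- Abramowitz & Stegun 7.1.26 approximation to erfc, valid for x >= 0
-- (absolute error below ~1.5e-7); used here because base has no erfc.
erfcAS :: Double -> Double
erfcAS x = poly * exp (negate (x * x))
  where
    t    = 1 / (1 + 0.3275911 * x)
    poly = t * (0.254829592 + t * ((-0.284496736) + t * (1.421413741
             + t * ((-1.453152027) + t * 1.061405429))))

-- P (Z > a) for a standard normal Z and a >= 0.
tailProb :: Double -> Double
tailProb a = 0.5 * erfcAS (a / sqrt 2)

-- Expected standard-normal draws per accepted sample, naïve method.
expectedDraws :: Double -> Double
expectedDraws a = 1 / tailProb a
```

For a = 2 this is about 44 draws per accepted sample; at a = 4 it is already over 31,000, which is why the naïve approach deteriorates so quickly.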

Tuned

2.3731610476911187e7
  11,934,195,432 bytes allocated in the heap
       5,257,328 bytes copied during GC
          44,312 bytes maximum residency (2 sample(s))
          21,224 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     23145 colls,     0 par    0.09s    0.11s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0002s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    4.19s  (  4.26s elapsed)
  GC      time    0.09s  (  0.11s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    4.28s  (  4.37s elapsed)

  %GC     time       2.2%  (2.6% elapsed)

  Alloc rate    2,851,397,967 bytes per MUT second

  Productivity  97.8% of total user, 95.7% of total elapsed

Naïve

2.3732073159369867e7
 260,450,762,656 bytes allocated in the heap
     111,891,960 bytes copied during GC
          85,536 bytes maximum residency (2 sample(s))
          76,112 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     512768 colls,     0 par    1.86s    2.24s     0.0000s    0.0008s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0002s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   58.12s  ( 58.99s elapsed)
  GC      time    1.86s  (  2.24s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    59.98s  ( 61.23s elapsed)

  %GC     time       3.1%  (3.7% elapsed)

  Alloc rate    4,481,408,869 bytes per MUT second

  Productivity  96.9% of total user, 94.9% of total elapsed

Bibliography

Devroye, L. 1986. Non-Uniform Random Variate Generation. Springer-Verlag. http://books.google.co.uk/books?id=mEw_AQAAIAAJ.

Marsaglia, G. 1964. “Generating a Variable from the Tail of the Normal Distribution.” Technometrics 6 (1): 101–2.
