# Introduction

Suppose we have a vector of weights which sum to 1.0 and we wish to sample n samples randomly according to these weights. There is a well known trick in Matlab / Octave using sampling from a uniform distribution.

```
num_particles = 2*10^7
likelihood = zeros(num_particles,1);
likelihood(:,1) = 1/num_particles;
[_,index] = histc(rand(num_particles,1),[0;cumsum(likelihood/sum(likelihood))]);
s = sum(index);
```

Using *tic* and *toc* this produces an answer with

`Elapsed time is 10.7763 seconds.`

# Haskell

I could find no equivalent function in Haskell nor could I easily find a binary search function.

```
> {-# OPTIONS_GHC -Wall #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans #-}
```

```
> {-# LANGUAGE BangPatterns #-}
```

```
> import System.Random.MWC
> import qualified Data.Vector.Unboxed as V
> import Control.Monad.ST
```

```
> import qualified Data.Vector.Algorithms.Search as S
```

```
> import Data.Bits
```

```
> n :: Int
> n = 2*10^7
```

Let’s create some random data. For a change let’s use mwc-random rather than random-fu.

```
> vs :: V.Vector Double
> vs = runST (create >>= (asGenST $ \gen -> uniformVector gen n))
```

Again, I could find no equivalent of *cumsum* but we can write our own.

```
> weightsV, cumSumWeightsV :: V.Vector Double
> weightsV = V.replicate n (recip $ fromIntegral n)
> cumSumWeightsV = V.scanl (+) 0 weightsV
```

Binary search on a sorted vector is straightforward and a cumulative sum ensures that the vector is sorted.

```
> binarySearch :: (V.Unbox a, Ord a) =>
> V.Vector a -> a -> Int
> binarySearch vec x = loop 0 (V.length vec - 1)
> where
> loop !l !u
> | u <= l = l
> | otherwise = let e = vec V.! k in if x <= e then loop l k else loop (k+1) u
> where k = l + (u - l) `shiftR` 1
```

```
> indices :: V.Vector Double -> V.Vector Double -> V.Vector Int
> indices bs xs = V.map (binarySearch bs) xs
```

To see how well this performs, let’s sum the indices (of course, we wouldn’t do this in practice) as we did for the Matlab implementation.

```
> js :: V.Vector Int
> js = indices (V.tail cumSumWeightsV) vs
```

```
> main :: IO ()
> main = do
> print $ V.foldl' (+) 0 js
```

Using *+RTS -s* we get

`Total time 10.80s ( 11.06s elapsed)`

which is almost the same as the Matlab version.

I did eventually find a binary search function in vector-algorithms and since one should not re-invent the wheel, let us try using it.

```
> indices' :: (V.Unbox a, Ord a) => V.Vector a -> V.Vector a -> V.Vector Int
> indices' sv x = runST $ do
> st <- V.unsafeThaw (V.tail sv)
> V.mapM (S.binarySearch st) x
```

```
> main' :: IO ()
> main' = do
> print $ V.foldl' (+) 0 $ indices' cumSumWeightsV vs
```

Again using *+RTS -s* we get

`Total time 11.34s ( 11.73s elapsed)`

So the library version seems very slightly slower.