News aggregator

2048 in haskell

Haskell on Reddit - Wed, 04/02/2014 - 9:43am
Categories: Incoming News

Benchmarking with Criterion, 'scanl1 (+)' appears 7 times faster than 'sum' ?

Haskell on Reddit - Wed, 04/02/2014 - 6:40am


I am trying out Criterion to make some quick measurements, here I compare scanl1 (+) rand with sum rand for a random list, and the former takes much less time than the latter, even though it should be more work.

If I try last . scanl1 (+) to obtain a value equal to sum rand, I get a similar time.

One point which may be unclear to me is the description of nf in Criterion.Main:

Apply an argument to a function, and evaluate the result to head normal form (NF).

I'm not sure what "head normal form" means in the context of Haskell, particularly when there are constructors, but the comments at the top of the page seem to indicate that nf would fully evaluate the list.

What am I missing?

submitted by Syrak
[link] [7 comments]
Categories: Incoming News

Libc considered harmful

Haskell on Reddit - Wed, 04/02/2014 - 6:24am
Categories: Incoming News

Dominic Steinitz: Student’s T and Space Leaks

Planet Haskell - Wed, 04/02/2014 - 4:17am

The other speaker at the Machine Learning Meetup at which I gave my talk on automatic differentiation gave a very interesting talk on A/B testing. Apparently this is big business these days as attested by the fact I got 3 ads above the wikipedia entry when I googled for it.

It seems that people tend to test with small sample sizes and to do so very often, resulting in spurious results. Of course readers of XKCD will be well aware of some of the pitfalls.

I thought a Bayesian approach might circumvent some of the problems and set out to write a blog article only to discover that there was no Haskell library for sampling from Student’s t. Actually there was one but is currently an unreleased part of random-fu. So I set about fixing this shortfall.

I thought I had better run a few tests so I calculated the sampled mean, variance, skewness and kurtosis.

I wasn’t really giving this my full attention and as a result ran into a few problems with space. I thought these were worth sharing and that is what this blog post is about. Hopefully, I will have time soon to actually blog about the Bayesian equivalent of A/B testing.

Preamble > {-# OPTIONS_GHC -Wall #-} > {-# OPTIONS_GHC -fno-warn-name-shadowing #-} > {-# OPTIONS_GHC -fno-warn-type-defaults #-} > {-# OPTIONS_GHC -fno-warn-unused-do-bind #-} > {-# OPTIONS_GHC -fno-warn-missing-methods #-} > {-# OPTIONS_GHC -fno-warn-orphans #-} > > {-# LANGUAGE NoMonomorphismRestriction #-} > > module StudentTest ( > main > ) where > > import qualified Data.Vector.Unboxed as V > import Data.Random.Source.PureMT > import Data.Random > import Data.Random.Distribution.T > import Control.Monad.State > import Data.Histogram.Fill > import Data.Histogram.Generic ( Histogram ) > import Data.List Space Analysis

Let’s create a reasonable number of samples as the higher moments converge quite slowly.

> nSamples :: Int > nSamples = 1000000

An arbitrary seed for creating the samples.

> arbSeed :: Int > arbSeed = 8

Student’s t only has one parameter, the number of degrees of freedom.

> nu :: Integer > nu = 6

Now we can do our tests by calculating the sampled values.

> ts :: [Double] > ts = > evalState (replicateM nSamples (sample (T nu))) > (pureMT $ fromIntegral arbSeed) > mean, variance, skewness, kurtosis :: Double > mean = (sum ts) / fromIntegral nSamples > variance = (sum (map (**2) ts)) / fromIntegral nSamples > skewness = (sum (map (**3) ts) / fromIntegral nSamples) / variance**1.5 > kurtosis = (sum (map (**4) ts) / fromIntegral nSamples) / variance**2

This works fine for small sample sizes but not for the number we have chosen.

./StudentTest +RTS -hc Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize -RTS' to increase it.

It seems a shame that the function in the Prelude has this behaviour but never mind let us ensure that we consume values strictly (they are being produced lazily).

> mean' = (foldl' (+) 0 ts) / fromIntegral nSamples > variance' = (foldl' (+) 0 (map (**2) ts)) / fromIntegral nSamples > skewness' = (foldl' (+) 0 (map (**3) ts) / fromIntegral nSamples) / variance'**1.5 > kurtosis' = (foldl' (+) 0 (map (**4) ts) / fromIntegral nSamples) / variance'**2

We now have a space leak on the heap as using the ghc profiler below shows. What went wrong?

If we only calculate the mean using foldl then all is well. Instead of 35M we only use 45K.

Well that gives us a clue. The garbage collector cannot reclaim the samples as they are needed for other calculations. What we need to do is calculate the moments strictly altogether.

Let’s create a strict record to do this.

> data Moments = Moments { m1 :: !Double > , m2 :: !Double > , m3 :: !Double > , m4 :: !Double > } > deriving Show

And calculate the results strictly.

> > m = foldl' (\m x -> Moments { m1 = m1 m + x > , m2 = m2 m + x**2 > , m3 = m3 m + x**3 > , m4 = m4 m + x**4 > }) (Moments 0.0 0.0 0.0 0.0) ts > > mean'' = m1 m / fromIntegral nSamples > variance'' = m2 m / fromIntegral nSamples > skewness'' = (m3 m / fromIntegral nSamples) / variance''**1.5 > kurtosis'' = (m4 m / fromIntegral nSamples) / variance''**2

Now we have what we want; the program runs in small constant space.

> main :: IO () > main = do > putStrLn $ show mean'' > putStrLn $ show variance'' > putStrLn $ show skewness'' > putStrLn $ show kurtosis''

Oh and the moments give the expected answers.

ghci> mean'' 3.9298418844289093e-4 ghci> variance'' 1.4962681916693004 ghci> skewness'' 1.0113188204317015e-2 ghci> kurtosis'' 5.661776268997382 Running the Code

To run this you will need my version of random-fu. The code for this article is here. You will need to compile everything with profiling, something like

ghc -O2 -main-is StudentTest StudentTest.lhs -prof -package-db=.cabal-sandbox/x86_64-osx-ghc-7.6.2-packages.conf.d

Since you need all the packages to be built with profiling, you will probably want to build using a sandbox as above. The only slightly tricky aspect is building random-fu so it is in your sandbox.

runghc Setup.lhs configure --enable-library-profiling --package-db=/HasBayes/.cabal-sandbox/x86_64-osx-ghc-7.6.2-packages.conf.d --libdir=/HasBayes/.cabal-sandbox/lib
Categories: Offsite Blogs

Adventures in Hoogling - Wed, 04/02/2014 - 12:47am
Categories: Offsite Blogs

Adventures in Hoogling - Wed, 04/02/2014 - 12:47am
Categories: Offsite Blogs

valderman/haste-compiler - Tue, 04/01/2014 - 4:41pm
Categories: Offsite Blogs

valderman/haste-compiler - Tue, 04/01/2014 - 4:41pm
Categories: Offsite Blogs

New gtk2hs 0.12.4 release

gtk2hs - Wed, 11/21/2012 - 12:56pm

Thanks to John Lato and Duncan Coutts for the latest bugfix release! The latest packages should be buildable on GHC 7.6, and the cairo package should behave a bit nicer in ghci on Windows. Thanks to all!


Categories: Incoming News