News aggregator

Zalora - Wed, 03/26/2014 - 12:10am
Categories: Offsite Blogs

Ketil Malde: Parallel SNP identification

Planet Haskell - Wed, 03/26/2014 - 12:00am

Lately, I’ve been trying to do SNP identification. Basically, we take samples from different populations (geographically separated, or based on some phenotype or property), sequence them in pools, and then we want to identify variants that are more specific to certain populations. The basic method is to map sequences to a reference genome, and examine the sites where the mappings disagree - either with the reference, or with each other.

There are many different implementations of this. Most require individual genotypes (that is, each position is called as homo- or heterozygous for each individual), and most are geared towards finding variant positions rather than diagnostic variants. Many methods - at least those I’ve tested - are also quite slow, and they produce output that is harder to use (e.g. pairwise scores, rather than an overall measure).

In my case, we have pooled sequences (i.e. not individual data), we want diagnostic variants (i.e. those that differ most between populations), and we only need candidates that will be tested on individuals later, so some errors are okay. So I implemented my own tool for this, and ..uh.. invented some new measures for estimating diversity. (Yes, I know, I know. But on the other hand, it does look pretty good: in my brief testing I get a false positive rate below 0.06% - and plenty of true positives. If things work out, you’ll hear more about that later.)

Parallel implementation

Anyway - my program is reasonably fast, but ‘samtools mpileup’ is a bit faster. So I thought I’d parallelize the execution. After all, we live in the age of multicore and functional programming, meaning plenty of CPUs, and an abundance of parallel programming paradigms at our disposal. Since the program basically is a simple function applied line by line, it should be easy to parallelize with ‘parMap’ or something similarly straightforward. The problem with ‘parMap’ is that - although it does get a speedup for short inputs - it strictly builds a list with one MVar per input (or IVars, if you use the [Par monad]). So for my input - some three billion records - this is quite impractical.

After toying a bit with different models, I ended up using a rather quick and dirty hack. Avert your eyes, or suffer the consequences:

    parMap :: (b -> a) -> [b] -> IO [a]
    parMap f xs = do
      -- t is the number of worker threads
      outs <- replicateM t newEmptyMVar
      ins  <- replicateM t newEmptyMVar
      sequence_ [forkIO (worker i o) | (i,o) <- zip ins outs]
      forkIO $ sequence_ [putMVar i (f x) | (i,x) <- zip (cycle ins) xs]
      sequence' [takeMVar o | (o,_) <- zip (cycle outs) xs]

    worker :: MVar a -> MVar a -> IO b
    worker imv omv = forever $ do
      ex <- takeMVar imv
      ex `seq` putMVar omv ex

Basically, I use worker threads with two MVars each for input and output, arranged in circular queues. The input MVars are populated by a separate thread, and the outputs are then collected in sequence. I don’t bother terminating threads or cleaning up, so this would likely leak space or have other problems. But it seems to work okay for my purposes. And sequence' is the normal sequence, but lazy, i.e. it uses unsafeInterleaveIO to delay execution.
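For reference, a lazy sequence' can be written like the sketch below. This is my own reconstruction of the idea described above (deferring execution with unsafeInterleaveIO), not necessarily the author's exact definition:

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- A lazy variant of 'sequence': the tail of the list of actions is
-- deferred with unsafeInterleaveIO, so each action runs only when its
-- result is demanded, and the output can be consumed as a stream.
sequence' :: [IO a] -> IO [a]
sequence' []     = return []
sequence' (m:ms) = do
  x  <- m
  xs <- unsafeInterleaveIO (sequence' ms)
  return (x:xs)
```

Because of the laziness, the three-billion-element result list never has to exist in memory all at once.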


To test the parallel performance, I first ran samtools mpileup on six BAM files, and then extracted the first ten million records (lines) from the output file. I then processed the result, calculating confidence intervals, a chi-squared test, and Fst, with the results in the table below.
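As an illustration of the kind of per-site statistic involved, a chi-squared statistic over a contingency table of allele counts per population might be computed as below. This is a hedged sketch of the standard formula, not the actual implementation in the tool:

```haskell
-- Chi-squared statistic for a contingency table (rows = populations,
-- columns = alleles), given as lists of observed counts.
-- For each cell, sum (observed - expected)^2 / expected, where the
-- expected count is rowSum * colSum / total.
chiSquared :: [[Double]] -> Double
chiSquared obs = sum [ (o - e) * (o - e) / e
                     | (o, e) <- zip (concat obs) expected ]
  where
    rowSums  = map sum obs
    colSums  = foldr1 (zipWith (+)) obs
    total    = sum rowSums
    expected = [ r * c / total | r <- rowSums, c <- colSums ]
```

For a balanced 2x2 table like [[10,20],[20,10]], every expected count is 15, giving a statistic of 4 * 25/15 = 20/3.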

Table 1: Benchmark results for different numbers of threads.

    Threads  wall time  time (s)  size (MB)  cpu (s)  sys (s)
    1        8:00       480       32.0       472      7
    2        4:40       280       39.7       525      19
    4        2:33       153       40.1       550      32
    8        1:43       103       40.7       689      57
    12       1:50       110       36.9       978      106
    16       1:33       93        40.3       1047     125
    24       1:17       77        40.7       1231     152


I would guess that system time is Linux scheduling overhead.

I thought the time with 12 threads was a fluke, but then I reran it thrice: 1:52, 1:50, 1:52. Strange. The only explanation I can see is that the machine has 12 real cores (hyperthreading giving the appearance of 24), and that running exactly that number of threads has some undesirable effect. Running with t = {10,11,13,14} gives:

Table 2: A more detailed look at performance scaling.

    Threads  time
    10       1:45
    11       1:44
    12       1:51
    13       1:45
    14       1:45

So: a genuine anomaly in the middle there, and 11 looks like a local minimum. Let’s also try 23 threads:

    % /usr/bin/time varan 10m.mpile.out --conf --chi --f-st > /dev/null +RTS -N23
    1165.14user 137.80system 1:15.10elapsed 1734%CPU (0avgtext+0avgdata 41364maxresident)k
    0inputs+0outputs (0major+10518minor)pagefaults 0swaps

Sure enough, slightly faster than with 24 threads, and about 5% less total work. Of course, the GHC scheduler is constantly being improved, so this will likely change. These numbers were with GHC 7.6.3; I also have a 32-core machine running 7.0.4, but it currently refuses to compile some of the dependencies.


Memory stays roughly constant (and quite low); scalability is okay, but starts to suffer with increasing parallelism, and gains are small beyond about eight threads.

Genomes vary enormously in size, from a handful of megabases to multiple gigabases (each position corresponding to a record here). Scaling this up to a human-sized genome of about three gigabases indicates a processing time of roughly two days for a single-threaded run, reduced to about twelve hours if you can throw four CPUs at it. This particular difference is important, as you can start the job as you leave work and have the results ready the next morning (as opposed to waiting over the weekend). Anyway, the bottleneck is now samtools. I can live with that.
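The extrapolation above can be checked with a quick back-of-the-envelope calculation (my own arithmetic, based on the single-thread and four-thread timings in Table 1):

```haskell
-- Scale the 10-million-record benchmark up to a 3 Gbp genome
-- (one record per position).
scaleFactor :: Double
scaleFactor = 3.0e9 / 1.0e7   -- 300 times more records

-- Wall-clock hours for the full genome, given seconds per 10M records.
hoursFor :: Double -> Double
hoursFor secs = secs * scaleFactor / 3600

-- hoursFor 480 = 40.0   (one thread: ~40 h, i.e. roughly two days)
-- hoursFor 153 = 12.75  (four threads: an overnight run)
```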

Categories: Offsite Blogs

Why does this Haskell program run considerably slower than its JavaScript equivalent? (Frustrated)

Haskell on Reddit - Tue, 03/25/2014 - 8:08pm

OK, I've been considering moving to Haskell from JavaScript as my main language, due to it being a much nicer language overall. But I must admit I've been frustrated with its performance. I was not expecting Haste/Fay to be as fast as JavaScript, but to my surprise the native Haskell binary itself was considerably slower than it!

Haskell program (meshbuilder.hs)

JavaScript program (javascript_.js)

This program generates a mesh from a list of pivots (rotations, lengths, vertex radiuses and colors). As you can see, the algorithms are exactly the same.

Commands used:

    fay -O meshbuilder.hs -o fay.js
    hastec -O2 meshbuilder.hs; mv meshbuilder.js haste.js
    ghc -O2 meshbuilder.hs -o haskell
    closure-compiler --compilation_level=ADVANCED_OPTIMIZATIONS fay.js > fay_.js
    closure-compiler --compilation_level=ADVANCED_OPTIMIZATIONS haste.js > haste_.js


    $ time node fay_.js
    real 0m4.693s  user 0m4.543s  sys 0m0.174s

    $ time node haste_.js
    real 0m6.310s  user 0m6.010s  sys 0m0.331s

    $ time node javascript_.js
    real 0m0.053s  user 0m0.042s  sys 0m0.008s

    $ time ./haskell
    real 0m0.370s  user 0m0.328s  sys 0m0.040s

(Uncut version.) Haste/Fay are roughly 90-120x slower than plain JS - fair enough. But how can Haskell's compiled binary be 7x slower than node.js, which still has to parse and JIT-compile the file!? Am I doing something really wrong here?

Edit: after switching to foldl' I got a 3x performance increase in the Haskell binary, but it is still almost 3x slower than the Node.js version, and Haste/Fay remained unchanged. Trying Vector instead of lists resulted in a performance loss. I'm still highly disappointed; I was really expecting Haskell to be much faster than Node.
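The foldl' fix mentioned here is the classic strictness issue: lazy foldl builds a chain of unevaluated thunks the length of the list, while Data.List.foldl' forces the accumulator at every step. A minimal illustration (my own example, not the poster's mesh code):

```haskell
import Data.List (foldl')

-- With plain foldl, summing a long list first builds n nested thunks
-- (0 + 1) + 2) + ... and only then evaluates them, risking a stack
-- overflow. foldl' evaluates the accumulator eagerly, so the fold runs
-- in constant space.
sumStrict :: [Double] -> Double
sumStrict = foldl' (+) 0
```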

Edit 2: OK, some big performance gains using sequences for mapAccumL. That is neat, and a little bit faster than Node.js. You guys rock. I'm wondering if it would be possible to go further and make it much faster than Node.js - for example, using mutable arrays, or perhaps even invoking more cores, or the GPU?
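For context, Data.Sequence is an instance of Traversable, so mapAccumL works on it directly. A small sketch of the pattern (my own example, not the poster's mesh-building code):

```haskell
import qualified Data.Sequence as Seq
import Data.Sequence (Seq)
import Data.Traversable (mapAccumL)

-- Thread an accumulator through a Seq: return the final sum together
-- with the sequence of running partial sums.
runningSums :: Seq Int -> (Int, Seq Int)
runningSums = mapAccumL (\acc x -> let s = acc + x in (s, s)) 0
```

Unlike a list, a Seq offers O(1) access to both ends and cheap concatenation, which can matter when the accumulated structure is repeatedly extended.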

submitted by SrPeixinho
Categories: Incoming News

AI4FM 2014: Call for Participation

General haskell list - Tue, 03/25/2014 - 5:17pm
-------------------------------------------------
AI4FM 2014 - the 5th International Workshop on the use of AI in Formal Methods
Singapore, 13th May, 2014
In association with FM 2014
-------------------------------------------------

--- Call For Participation ---

Workshop information
---------------
Workshop: May 13th, 2014
Registration:

Confirmed Speakers
---------------
Gerwin Klein, NICTA
Rustan Leino, Microsoft Research
Chin Wei Ngan, National University of Singapore
Dominique Méry, LORIA and Université de Lorraine
Andrius Velykis, Newcastle University
Ligia Nistor and Jonathan Aldrich, CMU
Cliff Jones, Newcastle University
Gudmund Grov, Heriot-Watt University

About the workshop
---------------
This workshop will bring together researchers from formal methods, automated reasoning and AI; it will address the issue of how AI can be used to support the formal software development process, inc
Categories: Incoming News

Index of /~tkobus/laboratoria - Tue, 03/25/2014 - 4:59pm
Categories: Offsite Blogs

[ANN] haskintex - Haskell code inside a LaTeX document, now with IO!

haskell-cafe - Tue, 03/25/2014 - 2:46pm
Hello haskellers. Good news for the users of haskintex. This release opens new possibilities and solves some frustrations we had in the past. For those who don't know what haskintex is, here is an overview. Haskintex is a program that reads a LaTeX file and evaluates Haskell expressions contained in some specific commands and environments. For example, if you save the file "foo.htex" with:
Categories: Offsite Discussion

Functional Programming Group - Tue, 03/25/2014 - 2:40pm
Categories: Offsite Blogs

Haskell at IMVU

Haskell on Reddit - Tue, 03/25/2014 - 1:22pm
Categories: Incoming News