News aggregator

Speeding up conduit

Haskell on Reddit - Thu, 08/21/2014 - 3:05am
Categories: Incoming News

Are there any libraries/examples of using reversible IO monads to implement an undo/redo system?

Haskell on Reddit - Thu, 08/21/2014 - 2:25am

Sorry in advance if I have the terms wrong. I want to use this idea for a 2D drawing program where, in Haskell, for each drawing action (as a monad) that is returned, the database can store information about all actions in the sequence (since each /api/request can return a single action or a sequence of them). Then, a user can execute an undo action provided by the database to reverse the entire previous sequence. Of course, the previous data would have to be stored in the database as well.

Is there an existing library aimed at doing this? Or possibly a database that I can interface with from Haskell to provide at least the versioning capabilities?

I saw this thread but I couldn't make sense of the responses.
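One minimal way to phrase the idea, as a hedged sketch with made-up names (Undoable and runUndoable are not any existing library's API): pair every drawing action with its inverse, and replay the inverses in reverse order to undo a whole sequence.

-- Hypothetical sketch, not a library API: each action carries its own inverse.
data Undoable m = Undoable
    { apply  :: m ()  -- perform the drawing action
    , invert :: m ()  -- reverse it, using whatever data was saved beforehand
    }

-- Run a sequence of actions, returning an action that undoes all of them.
runUndoable :: Monad m => [Undoable m] -> m (m ())
runUndoable steps = do
    mapM_ apply steps
    return (mapM_ invert (reverse steps))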

submitted by snoozer_cruiser
[link] [3 comments]
Categories: Incoming News

Vector sort poor performance

haskell-cafe - Thu, 08/21/2014 - 1:24am
Hi! I've got dramatically low sort performance: about 2 us of time and 5 kB of allocation for an unpacked vector of 4 (four) Doubles. Code:

import Data.List
import Control.Monad
import Control.Monad.ST
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as MV
import qualified Data.Vector.Algorithms.Intro as Intro
import Data.IORef

arr = V.fromList ([1,2] :: [Double])

foo x = V.head q
  where
    q = runST $ do
        res <- V.unsafeThaw $ V.concat
            [ V.map (\e -> e + x) arr
            , V.map (\e -> e - x) arr ]
        Intro.sort res
        V.unsafeFreeze res

main = do
    ref <- newIORef 0
    forM_ [0..100000] $! \i -> do
        modifyIORef' ref (+(foo $ fromInteger i)) -- for foo not to optimize out
    readIORef ref >>= print

ghc -O2 sort.hs && time ./sort
[1 of 1] Compiling Main ( sort.hs, sort.o )
Linking sort ...
-4.999949999e9

real 0m0.189s
user 0m0.184s
sys 0m0.000s

Does anybody know what's going
Categories: Offsite Discussion

Lee Pike: SmartChecking Matt Might’s Red-Black Trees

Planet Haskell - Wed, 08/20/2014 - 10:53pm

Matt Might gave a nice intro to QuickCheck via testing red-black trees recently. Of course, QuickCheck has been around for over a decade now, but it’s still useful (if underused–why aren’t you QuickChecking your programs!?).

In a couple of weeks, I’m presenting a paper on an alternative to QuickCheck called SmartCheck at the Haskell Symposium.

SmartCheck focuses on efficiently shrinking and generalizing large counterexamples. I thought it’d be fun to try some of Matt’s examples with SmartCheck.

The kinds of properties Matt checked really aren't in the sweet spot of SmartCheck, since the counterexamples are so small (Matt didn't even have to define shrink instances!). SmartCheck's focus, again, is shrinking and generalizing large counterexamples.

Still, let’s see what it looks like. (The code can be found here.)

SmartCheck is only interesting for failed properties, so let’s look at an early example in Matt’s blog post where something goes wrong. A lot of the blog post focuses on generating sufficiently constrained arbitrary red-black trees. In the section entitled, “A property for balanced black depth”, a property is given to check that the path from the root of a tree to every leaf passes through the same number of black nodes. An early generator for trees fails to satisfy the property.

To get the code to work with SmartCheck, we derive Typeable and Generic instances for the datatypes, and use GHC Generics to automatically derive instances for SmartCheck’s typeclass. The only other main issue is that SmartCheck doesn’t support a `forall` function like in QuickCheck. So instead of a call to QuickCheck such as

> quickCheck (forAll nrrTree prop_BlackBalanced)

we change the Arbitrary instance to be the nrrTree generator.
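Concretely, the setup might look something like the following sketch (SubTypes is SmartCheck's typeclass, though the exact superclass constraints may differ; nrrTree is Matt's generator from the blog post, not defined here):

{-# LANGUAGE DeriveGeneric, DeriveDataTypeable #-}
import GHC.Generics (Generic)
import Data.Typeable (Typeable)
import Test.QuickCheck (Arbitrary (..))
import Test.SmartCheck (SubTypes)

data Color = R | B
    deriving (Show, Eq, Generic, Typeable)
data RBSet a = E | T Color (RBSet a) a (RBSet a)
    deriving (Show, Eq, Generic, Typeable)

-- SmartCheck's methods have Generic-based defaults, so the instances are empty.
instance SubTypes Color
instance SubTypes (RBSet Int)

-- Without forAll, the custom generator moves into the Arbitrary instance:
instance Arbitrary (RBSet Int) where
    arbitrary = nrrTree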

Because it is so easy to find a small counterexample, SmartCheck’s reduction algorithm does a little bit of automatic shrinking, but not too much. For example, a typical minimal counterexample returned by SmartCheck looks like

T R E 2 (T B E 5 E)

which is about as small as possible. Now onto generalization!

There are three generalization phases in SmartCheck, but we’ll look at just one, in which a formula is returned that is universally quantified if every test case fails. For the test case above, SmartCheck returns the following formula:

forall values x0 x1:
T R E 2 (T B x1 5 x0)

Intuitively, this means that for any well-typed trees chosen that could replace the variables x0 and x1, the resulting formula is still a counterexample.

The benefit to developers is seeing instantly that those subterms in the counterexample probably don’t matter. The real issue is that E on the left is unbalanced with (T B E 5 E) on the right.

One of the early design decisions in SmartCheck was to focus on structurally shrinking data types and essentially ignore "base types" like Int, Char, etc. The motivation was to improve efficiency when shrinking large counterexamples.

But for a case like this, generalizing base types would be interesting. We’d hypothetically get something like

forall values (x0, x1 :: RBSet Int) (x2, x3 :: Int):
T R E x2 (T B x1 x3 x0)

further generalizing the counterexample. It may be worth adding this behavior to SmartCheck.

SmartCheck’s generalization begins to bridge the gap from specific counterexamples to formulas characterizing counterexamples. The idea is related to QuickSpec, another cool tool developed by Claessen and Hughes (and Smallbone). Moreover, it’s a bridge between testing and verification, or as Matt puts it, from the 80% to the 20%.


Categories: Offsite Blogs

FP Complete: IAP: Speeding up conduit

Planet Haskell - Wed, 08/20/2014 - 6:00pm

This post contains fragments of active Haskell code, best viewed and executed at https://www.fpcomplete.com/blog/2014/08/iap-speeding-up-conduit

As most of us know, performance isn't a one-dimensional spectrum. There are in fact multiple different ways to judge performance of a program. A commonly recognized tradeoff is that between CPU and memory usage. Often, a program can be sped up by caching more data, for example.

conduit is a streaming data library. In that sense, it has two very specific performance criteria it aims for:

  • Constant memory usage.
  • Efficient usage of scarce resources, such as closing file descriptors as early as possible.

While CPU performance is always a nice goal, it has never been my top priority in the library's design, especially given that in the main use case for conduit (streaming data in an I/O context), the I/O cost almost always far outweighs any CPU overhead from conduit.

However, for our upcoming Integrated Analysis Platform (IAP) release, this is no longer the case. conduit will be used in tight loops, where we do need to optimize for the lowest CPU overhead possible.

This blog post covers the first set of optimizations I've applied to conduit. There is still more work to be done, and throughout this blog post I'll be describing some of the upcoming changes I am attempting.

I'll give a brief summary up front:

  • Applying the codensity transform results in much better complexity of monadic bind.
  • We're also less reliant on rewrite rules firing, which has always been unreliable (and now I know why).
  • This change does represent a breaking API change. However, it only affects users of the Data.Conduit.Internal module. If you've just been using the public API, your code will be unaffected, besides getting an automatic speedup.
  • These changes will soon be released as conduit 1.2.0, after a period for community feedback.

Note that this blog post follows the actual steps I went through (more or less) in identifying the performance issues I wanted to solve. If you want to skip ahead to the solution itself, you may want to skip to the discussion on difference lists, or even straight to continuation passing style, church-encoding, codensity.

By the way, after I originally wrote this blog post, I continued working on the optimizations I describe as possible future enhancements. Those are actually working out far better than I expected, and it looks like conduit 1.2.0 will be able to ship with them. I'll be writing a separate blog post detailing those changes. A bit of a teaser is: for vector-equivalent code, conduit now generates identical core as vector itself.

The benchmarks

Before embarking on any kind of serious optimizations, it's important to have some benchmarks. I defined three benchmarks for the work I was going to be doing:

  • A simple sum: adding up the numbers from 1 to 10000. This is to get a baseline of the overhead coming from conduit.

  • A Monte Carlo analysis: this was based on a previous IAP blog post. I noticed when working on that benchmark that, while the conduit solution was highly memory efficient, there was still room to speed it up.

  • Sliding vectors: Naren Sundar recently sent a sliding windows pull request, which allows us to get a fixed-size view of a stream of values. This feature is very useful for a number of financial analyses, especially regarding time series.

    Naren's pull request was based on immutable data structures, and for those cases it is highly efficient. However, it's possible to be far more memory efficient by writing to a mutable vector instead, and then taking immutable slices of that vector. Mihaly Barasz sent a pull request for this feature, and much to our disappointment, for small window sizes, it performed worse than sliding windows. We want to understand why.

You can see the benchmark code, which stays mostly unchanged for the rest of this blog post (a few new cases are added to demonstrate extra points). The benchmarks always contain a low-level base case representing the optimal performance we can expect from hand-written Haskell (without resorting to any kind of FFI tricks or the like).

You can see the first run results which reflect conduit 1.1.7, plus inlining of a few functions. Some initial analysis:

  • Control.Monad.foldM is surprisingly slow.
  • Data.Conduit.List.foldM has a rather steep performance hit versus Data.Conduit.List.fold.
  • There's a very high overhead in the Monte Carlo analysis.
  • For sliding vector, the conduit overhead is more pronounced at smaller window sizes.
  • But even with large window sizes, mutable vector conduits still have a large overhead. The sliding window/immutable approach, however, shows almost no overhead.

That hopefully sets the scene enough for us to begin to dive in.

Rewrite rules: lift

GHC offers a very powerful optimization technique: rewrite rules. This allows you to tell the compiler that a certain expression can be rewritten to a more efficient one. A common example of a rewrite rule would be to state that map f . map g is the same as map (f . g). This can be expressed as:

{-# RULES "map f . map g" forall f g. map f . map g = map (f . g) #-}

Note that GHC's list rewrite rules are actually more complicated than this, and revolve around a concept called build/foldr fusion.

Let's look at the implementation of the yield function in conduit (with some newtypes stripped away):

yield :: Monad m => o -> ConduitM i o m ()
yield o = HaveOutput (Done ()) (return ()) o
{-# INLINE [1] yield #-}
{-# RULES "yield o >> p" forall o (p :: ConduitM i o m r).
        yield o >> p = HaveOutput p (return ()) o
  #-}

The core datatype of conduit is recursive. The HaveOutput constructor contains a field for "what to do next." In the case of yield, there isn't anything to do next, so we fill that with Done (). However, creating that Done () value just to throw it away after a monadic bind is wasteful. So we have a rewrite rule to fuse those two steps together.

But no such rewrite rule exists for lift! My first step was to add such a rule, and check the results. Unfortunately, the rule didn't have any real impact, because it wasn't firing. Let's put that issue to the side; we'll come back to it later.
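For reference, the rule I added looks roughly like this, sketched against conduit 1.1's internals in Data.Conduit.Internal and mirroring the yield rule above (the shipped version may differ in its details):

{-# RULES "lift m >>= f" forall m (f :: a -> ConduitM i o m r).
        lift m >>= f = ConduitM (PipeM (liftM (unConduitM . f) m))
  #-}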

Cleanup, inlining

One of the nice features introduced in (I believe) GHC 7.8 is that the compiler will now warn you when a rewrite rule may not fire. When compiling conduit, I saw messages like:

Data/Conduit/List.hs:274:11: Warning:
    Rule "source/map fusion $=" may never fire
      because ‘$=’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$=’

Data/Conduit/List.hs:275:11: Warning:
    Rule "source/map fusion =$=" may never fire
      because ‘=$=’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘=$=’

Data/Conduit/List.hs:542:11: Warning:
    Rule "source/filter fusion $=" may never fire
      because ‘$=’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$=’

Data/Conduit/List.hs:543:11: Warning:
    Rule "source/filter fusion =$=" may never fire
      because ‘=$=’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘=$=’

Data/Conduit/List.hs:552:11: Warning:
    Rule "connect to sinkNull" may never fire
      because ‘$$’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$$’

This demonstrates an important interaction between inlining and rewrite rules. We need to make sure that expressions that need to be rewritten are not inlined first. If they are first inlined, then GHC won't be able to rewrite them to our more optimized version.

A common approach to this is to delay inlining of functions until a later simplification phase. The GHC simplification process runs in multiple steps, and we can state that rules and inlining should only happen before or after a certain phase. The phases count down from 2 to 0, so we commonly want to delay inlining of functions until phase 0, if they may be subject to rewriting.
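As an illustration with hypothetical names, the mechanics look like this: the INLINE [0] pragma holds step back until phase 0, and the [~0] activation runs the rule only before phase 0, so the rule always gets the first shot:

fused :: Int -> Int
fused x = x + 2

step :: Int -> Int
step x = x + 1
{-# INLINE [0] step #-}  -- may inline only in phase 0; rules fire in phases 2 and 1

-- Active only before phase 0 ([~0]); by then step has not been inlined away:
{-# RULES "step/step" [~0] forall x. step (step x) = fused x #-}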

Conversely, some functions need to be inlined before a rewrite rule can fire. In stream fusion, for example, the fusion framework depends on the following sequencing to get good performance:

map f . map g
-- inline map
unstream . mapS f . stream . unstream . mapS g . stream
-- rewrite stream . unstream
unstream . mapS f . mapS g . stream
-- rewrite mapS . mapS
unstream . mapS (f . g) . stream

In conduit, we need to make sure that all of this is happening in the correct order. There was one particular complexity that made it difficult to ensure this happened. conduit in fact has two core datatypes: Pipe and ConduitM, with the latter being a more friendly newtype wrapper around the first. Up until this point, the code for the two was jumbled into a single internal module, making it difficult to track which things were being written in which version of the API.

My next step was to split things into .Pipe and .Conduit internal modules, and then clean up GHC's warnings to get rules to fire more reliably. This gave a modest performance boost to the sliding vector benchmarks, but not much else. But it does pave the way for future improvements.

Getting serious about sum, by cheating

The results so far have been uninspiring. We've identified a core problem (too many of those Done data constructors being used), and noticed that the rewrite rules that should fix that don't seem to be doing their job. Now let's take our first stab at really improving performance: with aggressive rewrite rules.

Our sum benchmark is really simple: use enumFromTo to create a stream of values, and fold (or foldM) to consume that. The thing that slows us down is that, in between these two simple functions, we end up allocating a bunch of temporary data structures. Let's get rid of them with rewrite rules!
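The shape of such a rule is sketched below: connectFold (a name used here for illustration) consumes any Source with a strict accumulator, and the rule redirects the src $$ CL.fold pipeline to it. The finalizer and leftover handling of the real implementation are elided:

-- Connect any Source directly to a strict left fold, skipping the
-- general-purpose $$ machinery (simplified sketch):
connectFold :: Monad m => Source m a -> (b -> a -> b) -> b -> m b
connectFold (ConduitM p0) f = go p0
  where
    go (Done ())          b = return b
    go (HaveOutput p _ a) b = go p $! f b a
    go (NeedInput _ c)    b = go (c ()) b  -- a Source never awaits input
    go (PipeM mp)         b = mp >>= \p -> go p b
    go (Leftover p ())    b = go p b

{-# RULES "$$ fold" forall src f b.
        src $$ CL.fold f b = connectFold src f b
  #-}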

This certainly did the trick. The conduit implementation jumped from 185us to just 8.63us. For comparison, the low level approach (or vector's stream fusion) clocks in at 5.77us, whereas foldl' on a list is 80.6us. This is a huge win!

But it's also misleading. All we've done here is sneakily rewritten our conduit algorithm into a low-level format. This solves the specific problem on the table (connecting enumFromTo with fold), but won't fully generalize to other cases. A more representative demonstration of this improvement is the speedup for foldM, which went from 1180us to 81us. The reason this is more realistic is that the rewrite rule is not specialized to enumFromTo, but rather works on any Source.

I took a big detour at this point, and ended up writing an initial implementation of stream fusion in conduit. Unfortunately, I ran into a dead end on that branch, and had to put that work to the side temporarily. However, the improvements discussed in the rest of this blog post will hopefully reopen the door to stream fusion, which I hope to investigate next.

Monte Carlo, and associativity

Now that I'd made the results of the sum benchmark thoroughly useless, I decided to focus on the results of Monte Carlo, where the low-level implementation still won by a considerable margin (3.42ms vs 10.6ms). The question was: why was this happening? To understand, let's start by looking at the code:

analysis = do
    successes <- sourceRandomN count $$ CL.fold
        (\t (x, y) ->
            if (x*x + y*(y :: Double) < 1)
                then t + 1
                else t)
        (0 :: Int)
    return $ fromIntegral successes / fromIntegral count * 4

sourceRandomN :: (MWC.Variate a, MonadIO m) => Int -> Source m a
sourceRandomN cnt0 = do
    gen <- liftIO MWC.createSystemRandom
    let loop 0 = return ()
        loop cnt = do
            liftIO (MWC.uniform gen) >>= yield >> loop (cnt - 1)
    loop cnt0

The analysis function is not very interesting: it simply connects sourceRandomN with a fold. Given that we now have a well behaved and consistently-firing rewrite rule for connecting to folds, it's safe to say that was not the source of our slowdown. So our slowdown must be coming from:

liftIO (MWC.uniform gen) >>= yield >> loop (cnt - 1)

This should in theory generate really efficient code. yield >> loop (cnt - 1) should be rewritten to \x -> HaveOutput (loop (cnt - 1)) (return ()) x, and then liftIO should get rewritten to generate:

PipeM $ do
    x <- MWC.uniform gen
    return $ HaveOutput (loop $ cnt - 1) (return ()) x

I added another commit to include a few more versions of the monte carlo benchmark (results here). The two most interesting are:

  • Explicit usage of the Pipe constructors:

    sourceRandomNConstr :: (MWC.Variate a, MonadIO m) => Int -> Source m a
    sourceRandomNConstr cnt0 = ConduitM $ PipeM $ do
        gen <- liftIO MWC.createSystemRandom
        let loop 0 = return $ Done ()
            loop cnt = do
                x <- liftIO (MWC.uniform gen)
                return $ HaveOutput (PipeM $ loop (cnt - 1)) (return ()) x
        loop cnt0

    This version ran in 4.84ms, vs the original conduit version which ran in 15.8ms. So this is definitely the problem!

  • Explicitly force right-associated binding order:

    sourceRandomNBind :: (MWC.Variate a, MonadIO m) => Int -> Source m a
    sourceRandomNBind cnt0 =
        lift (liftIO MWC.createSystemRandom) >>= \gen ->
            let loop 0 = return ()
                loop cnt =
                    lift (liftIO $ MWC.uniform gen) >>=
                        (\o -> yield o >> loop (cnt - 1))
             in loop cnt0

    Or to zoom in on the important bit:

    lift (liftIO $ MWC.uniform gen) >>= (\o -> yield o >> loop (cnt - 1))

    By the monad laws, this code is identical to the original. However, instead of standard left-associativity, we have right associativity of monadic bind. This code ran in 5.19ms, an approximate threefold speedup vs the left-associative code!

This issue of associativity was something Roman Cheplyaka told me about back in April, so I wasn't surprised to see it here. Back then, I'd looked into using Codensity together with ConduitM, but didn't get immediate results, and therefore postponed further research until I had more time.

OK, so why exactly does left-associativity hurt us so much? There are two reasons actually:

  • Generally speaking, many monads perform better when they are right associated. This is especially true for free monads, of which conduit is just a special case. Janis Voigtländer's paper Asymptotic Improvement of Computations over Free Monads and Edward Kmett's blog post series free monads for less do a far better job of explaining the issue than I could. (A sketch of such a bind follows this list.)
  • In the case of conduit, left associativity prevented the lift and yield rewrite rules from firing, which introduced extra, unnecessary monadic bind operations. Forcing right associativity allows these rules to fire, avoiding a lot of unnecessary data constructor allocation and analysis.
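To make the first point concrete, here is bind for a pared-down Pipe (no input, leftovers, or finalizers; a sketch, not conduit's actual code). Every >>= rebuilds the entire left-hand structure to reach its Done leaves, so ((p1 >> p2) >> p3) >> p4 re-walks p1 three times: the same quadratic story as the left-nested list appends we'll see below.

import Control.Monad (ap, liftM)

data Pipe o m r
    = HaveOutput (Pipe o m r) o
    | PipeM (m (Pipe o m r))
    | Done r

instance Monad m => Functor (Pipe o m) where
    fmap = liftM
instance Monad m => Applicative (Pipe o m) where
    pure  = Done
    (<*>) = ap
instance Monad m => Monad (Pipe o m) where
    return = Done
    -- Bind must walk everything on the left to find its Done leaves:
    HaveOutput p o >>= f = HaveOutput (p >>= f) o
    PipeM mp       >>= f = PipeM (liftM (>>= f) mp)
    Done r         >>= f = f r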

At this point, it became obvious that the main slowdown I was seeing was driven by this problem. The question is: how should we solve it?

Difference lists

To pave the way for the next step, I want to take a quick detour and talk about something simpler: difference lists. Consider the following code:

(((w ++ x) ++ y) ++ z)

Most experienced Haskellers will cringe upon reading that. The append operation for a list needs to traverse every cons cell in its left value. When we left-associate append operations like this, we will need to traverse every cell in w, then every cell in w ++ x, then every cell in w ++ x ++ y. This is highly inefficient, and would clearly be better done in a right-associated style (sound familiar?).

But forcing programmers to ensure that their code is always right-associated isn't always practical. So instead, we have two common alternatives. The first is: use a better data structure. In particular, Data.Sequence has far cheaper append operations than lists.

The other approach is to use difference lists. Difference lists are functions instead of actual list values. They are instructions for adding values to the beginning of the list. In order to append, you use normal function composition. And to convert them to a list, you apply the resulting function to an empty list. As an example:

type DList a = [a] -> [a]

dlist1 :: DList Int
dlist1 rest = 1 : 2 : rest

dlist2 :: DList Int
dlist2 rest = 3 : 4 : rest

final :: [Int]
final = dlist1 . dlist2 $ []

main :: IO ()
main = print final

Both difference lists and sequences have advantages. Probably the simplest summary is:

  • Difference lists have smaller constant factors for appending.
  • Sequences allow you to analyze them directly, without having to convert them to a different data type first.

That second point is important. If you need to regularly analyze your list and then continue to append, the performance of a difference list will be abysmal. You will constantly be swapping representations, and converting from a list to a difference list is an O(n) operation. But if you will simply be constructing a list once without any analysis, odds are difference lists will be faster.

This situation is almost identical to our problems with conduit. Our monadic composition operator, like list's append operator, needs to traverse the entire left-hand side. This connection is more clearly spelled out in Reflection without Remorse by Atze van der Ploeg and Oleg Kiselyov (and for me, care of Roman).

Alright, with that out of the way, let's finally fix conduit!

Continuation passing style, church-encoding, codensity

There are essentially two things we need to do with conduits:

  • Monadically compose them to sequence two streams into a larger stream.
  • Categorically compose them to connect one stream to the next in a pipeline.

The latter requires that we be able to case analyze our datatypes, while theoretically the former does not: something like difference lists for simple appending would be ideal. In the past, I've tried out a number of different alternative implementations of conduit, none of which worked well enough. The problem I always ran into was that either monadic bind became too expensive, or categorical composition became too expensive.

Roman, Mihaly, Edward and I discussed these issues a bit on GitHub, and based on Roman's advice, I went ahead with writing a benchmark of different conduit implementations. I currently have four implementations in this benchmark (and hope to add more):

  • Standard, which looks very much like conduit 1.1, just a bit simplified (no rewrite rules, no finalizers, no leftovers).
  • Free, which is conduit rewritten to explicitly use the free monad transformer.
  • Church, which modifies Free to instead use the Church-encoded free monad transformer.
  • Codensity, which is a Codensity-transform-inspired version of conduit.

You can see the benchmark results, which clearly show the codensity version to be the winner. Though it would be interesting, I think I'll avoid going into depth on the other three implementations for now (this blog post is long enough already).

What is Codensity?

Implementing Codensity in conduit just means changing the ConduitM newtype wrapper to look like this:

newtype ConduitM i o m r = ConduitM
    { unConduitM :: forall b. (r -> Pipe i i o () m b) -> Pipe i i o () m b
    }

What this says is "I'm going to provide an r value. If you give me a function that needs an r value, I'll give it that r value and then continue with the resulting Pipe." Notice how similar this looks to the type signature of monadic bind itself:

(>>=) :: Pipe i i o () m r -> (r -> Pipe i i o () m b) -> Pipe i i o () m b

This isn't by chance, it's by construction. More information is available in the Haddocks of kan-extensions, or in the above-linked paper and blog posts by Janis and Edward. To see why this change is important, let's look at the new implementations of some of the core conduit functions and type classes:

yield o = ConduitM $ \rest -> HaveOutput (rest ()) (return ()) o

await = ConduitM $ \f -> NeedInput (f . Just) (const $ f Nothing)

instance Monad (ConduitM i o m) where
    return x = ConduitM ($ x)
    ConduitM f >>= g = ConduitM $ \h -> f $ \a -> unConduitM (g a) h

instance MonadTrans (ConduitM i o) where
    lift mr = ConduitM $ \rest -> PipeM (liftM rest mr)

Instead of having explicit Done constructors in yield, await, and lift, we use the continuation rest. This is the exact same transformation we were previously relying on rewrite rules to provide. However, our rewrite rules couldn't fire properly in a left-associated monadic binding. Now we've avoided the whole problem!

Our Monad instance also became much smaller. Notice that in order to monadically compose, there is no longer any need to case-analyze the left hand side, which avoids the high penalty of left association.

Another interesting quirk is that our Monad instance on ConduitM no longer requires that the base m type constructor itself be a Monad. This is a nice feature of Codensity.

So that's half the story. What about categorical composition? That certainly does require analyzing both the left and right hand structures. So don't we lose all of our speed gains of Codensity with this? Actually, I think not. Let's look at the code for categorical composition:

ConduitM left0 =$= ConduitM right0 = ConduitM $ \rest ->
    let goRight final left right =
            case right of
                HaveOutput p c o  -> HaveOutput (recurse p) (c >> final) o
                NeedInput rp rc   -> goLeft rp rc final left
                Done r2           -> PipeM (final >> return (rest r2))
                PipeM mp          -> PipeM (liftM recurse mp)
                Leftover right' i -> goRight final (HaveOutput left final i) right'
          where
            recurse = goRight final left

        goLeft rp rc final left =
            case left of
                HaveOutput left' final' o -> goRight final' left' (rp o)
                NeedInput left' lc        -> NeedInput (recurse . left') (recurse . lc)
                Done r1                   -> goRight (return ()) (Done r1) (rc r1)
                PipeM mp                  -> PipeM (liftM recurse mp)
                Leftover left' i          -> Leftover (recurse left') i
          where
            recurse = goLeft rp rc final
     in goRight (return ()) (left0 Done) (right0 Done)

In the last line, we apply left0 and right0 to Done, which is how we convert our Codensity version into something we can actually analyze. (This is equivalent to applying a difference list to an empty list.) We then traverse these values in the same way that we did in conduit 1.1 and earlier.

The important difference is how we ultimately finish. The code in question is the Done clause of goRight's case analysis, namely:

Done r2 -> PipeM (final >> return (rest r2))

Notice the usage of rest, instead of what we would have previously done: used the Done constructor. By doing this, we're immediately recreating a Codensity version of our resulting Pipe, which allows us to only traverse our incoming Pipe values once each, and not need to retraverse the outgoing Pipe for future monadic binding.

This trick doesn't just work for composition. There are a large number of functions in conduit that need to analyze a Pipe, such as addCleanup and catchC. All of them are now implemented in this same style.
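For example, addCleanup in the new style looks roughly like this (a sketch matching the pattern described, modulo the details of conduit's finalizer plumbing): apply the wrapped function to Done to get an analyzable Pipe, traverse it once, and re-enter the continuation rest at the Done leaf.

addCleanup :: Monad m
           => (Bool -> m ())  -- passed True if the conduit ran to completion
           -> ConduitM i o m r
           -> ConduitM i o m r
addCleanup cleanup (ConduitM c0) = ConduitM $ \rest ->
    let go (Done r)              = PipeM (cleanup True >> return (rest r))
        go (HaveOutput src cl x) = HaveOutput (go src) (cleanup False >> cl) x
        go (NeedInput p c)       = NeedInput (go . p) (go . c)
        go (PipeM mp)            = PipeM (liftM go mp)
        go (Leftover p i)        = Leftover (go p) i
     in go (c0 Done)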

After implementing this change, the resulting benchmarks look much better. The naive implementation of Monte Carlo is now quite close to the low-level version (5.28ms vs 3.44ms, as opposed to the original 15ms). Sliding vector is also much better: the unboxed, 1000-size window benchmark went from 7.96ms to 4.05ms, vs a low-level implementation at 1.87ms.

Type-indexed sequences

One approach that I haven't tried yet is the type-indexed sequence approach from Reflection without Remorse. I still intend to add it to my conduit benchmark, but I'm not optimistic about it beating out Codensity. My guess is that a sequence data type will have a higher constant factor overhead, and based on the way composition is implemented in conduit, we won't get any benefit from avoiding the need to transition between two representations.

Edward said he's hoping to get an implementation of such a data structure into the free package, at which point I'll update my benchmark to see how it performs.

To pursue next: streamProducer, streamConsumer, and more

While this round of benchmarking produced some very nice results, we're clearly not yet at the same level as low-level code. My goal is to focus on that next. I have some experiments going already relating to getting conduit to expose stream fusion rules. In simple cases, I've generated a conduit-compatible API with the same performance as vector.

The sticking point is getting something which is efficient not just for functions explicitly written in stream style, but also provides decent performance when composed with the await/yield approach. While the latter approach will almost certainly be slower than stream fusion, I'm hoping we can get it to degrade to current-conduit performance levels, and allow stream fusion to provide a significant speedup when categorically composing two Conduits written in that style.

The code discussed in this post is now available on the next-cps branch of conduit. conduit-extra, conduit-combinators, and a number of other packages either compile out-of-the-box with these changes, or require minor tweaks (already implemented), so I'm hoping that this API change does not affect too many people.

As I mentioned initially, I'd like to have some time for community discussion on this before I make this next release.

Categories: Offsite Blogs

Hackage Ship

Haskell on Reddit - Wed, 08/20/2014 - 3:55pm
Categories: Incoming News

Which docker image for haskell development

Haskell on Reddit - Wed, 08/20/2014 - 12:06pm

Hi, I would like to get started with Haskell development and would like to know what kind of Docker image would be suitable for a well-working environment. What kind of images are you using?

submitted by wirrbel
[link] [16 comments]
Categories: Incoming News

Neil Mitchell: Continuations and Exceptions

Planet Haskell - Wed, 08/20/2014 - 11:39am

Summary: In moving Shake to continuations, exceptions were the biggest headache. I figured out how to somewhat integrate continuations and exception handling.

The git repo for Shake now suspends inactive computations by capturing their continuation instead of blocking their thread, based on the continuations I described in a previous blog post. The most difficult part was managing exceptions. I needed to define a monad where I could capture continuations and work with exceptions, requiring the definitions:

data M a = ... deriving (Functor, Applicative, Monad, MonadIO)
throwM :: SomeException -> M a
catchM :: M a -> (SomeException -> M a) -> M a
captureM :: ((a -> IO ()) -> IO ()) -> M a

I'm using M as the name of the monad. I want equivalents of throwIO and catch for M, along with a function to capture continuations.

The first observation is that since catchM must catch all exceptions, including those thrown by users calling error, throwM can be defined as:

throwM = liftIO . throwIO

Using throwIO gives better guarantees about when the exception is raised, compared to just throw.

The second observation is that sometimes I want to raise an exception on the continuation, rather than passing back a value. I can build that on top of captureM with:

captureM' :: ((Either SomeException a -> IO ()) -> IO ()) -> M a
captureM' k = either throwM return =<< captureM k

The third observation (which I made after a few weeks of trying not to follow it) is that the continuation may never be called, and that means you cannot implement a robust finallyM function. In particular, if the person who was intending to run the continuation themselves raises an exception, the continuation is likely to be lost. I originally tried to come up with schemes for defining the function that is passed the continuation so as to guarantee the continuation was called, but it became messy very quickly.

The properties we expect of the implementation, to a rough approximation, include:

  • catchM (x >> throwM e) (\_ -> y) >> z === x >> y >> z -- if you throw an exception inside a catchM, you must run the handler.
  • captureM (\k -> x) >>= y === x -- if you execute something not using the continuation inside captureM it must behave like it does outside captureM. In particular, if the captureM is inside a catchM, that catchM must not catch the exception.
  • captureM (\k -> k x) >>= y === x >>= y -- if you capture the continuation then continue that must be equivalent to not capturing the continuation.
  • captureM (\k -> k x >> k x) >>= y === (x >>= y) >> (x >>= y) -- if you run the continuation twice it must do the same IO actions each time. In particular, if the first gets its exceptions caught, the second must do also.

These properties are incomplete (there are other things you expect), and fuzzy (for example, the second property isn't type correct) - but hopefully they give an intuition.

The implementation was non-trivial and (sadly) non-elegant. I suspect a better implementation is known in the literature, and I'd welcome a pointer. My implementation defines M as:

type M a = ContT () (ReaderT (IORef (SomeException -> IO ())) IO) a

Here we have a continuation monad wrapping a reader monad. The reader contains an IORef which stores the exception handler. The basic idea is that whenever we start running anything in M we call the Haskell catch function, and the exception handler forwards to the IORef. We can define catchM as:

catchM :: M a -> (SomeException -> M a) -> M a
catchM m hdl = ContT $ \k -> ReaderT $ \s -> do
    old <- liftIO $ readIORef s
    writeIORef s $ \e -> do
        s <- newIORef old
        hdl e `runContT` k `runReaderT` s `catch`
            \e -> ($ e) =<< readIORef s
    flip runReaderT s $ m `runContT` \v -> do
        s <- ask
        liftIO $ writeIORef s old
        k v
  • We store the previous exception handler as old, and insert a new one. After the code has finished (we have left the catchM block) we restore the old exception handler. In effect, we have a stack of exception handlers.
  • When running the handler we pop off the current exception handler by restoring old, then since we have already used up our catch, we add a new catch to catch exceptions in the handler.

We then define captureM as:

captureM :: ((a -> IO ()) -> IO ()) -> M a
captureM f = ContT $ \k -> ReaderT $ \s -> do
    old <- readIORef s
    writeIORef s throwIO
    f $ \x -> do
        s <- newIORef old
        flip runReaderT s (k x) `E.catch`
            \e -> ($ e) =<< readIORef s
        writeIORef s throwIO
  • We make sure to switch the IORef back to throwIO before we start running the users code, and after we have finished running our code and switch back to user code. As a result, if the function that captures the continuation throws an exception, it will be raised as normal.
  • When running the continuation we create a new IORef for the handler, since the continuation might be called twice in parallel, and the separate IORef ensures they don't conflict with each other.

Finally, we need a way to run the computation. I've called that runM:

runM :: M a -> (Either SomeException a -> IO ()) -> IO ()
runM m k = do
    let mm = do
            captureM $ \k -> k ()
            catchM (Right <$> m) (return . Left)
    s <- newIORef throwIO
    mm `runContT` (liftIO . k) `runReaderT` s

The signature of runM ends up being the only signature that makes sense given the underlying mechanisms. We define mm using the facilities of captureM to insert a catch, and catchM to ensure we never end up in an exception state from runM. The rest is just matching up the types.

Stack depth could potentially become a problem with this solution. If you regularly do:

captureM (\k -> k ())

Then each time, another catch will be wrapped around the function. You can avoid that by changing captureM to throw an exception:

captureM :: ((a -> IO ()) -> IO ()) -> M a
captureM f = ContT $ \k -> ReaderT $ \s -> do
    old <- readIORef s
    writeIORef s $ \_ ->
        f $ \x -> do
            s <- newIORef old
            flip runReaderT s (k x) `E.catch`
                \e -> ($ e) =<< readIORef s
    throwIO anyException

Here we unwind the catch by doing a throwIO, after installing our exception handler which actually passes the continuation. It is a bit ugly, and I haven't checked whether the catch is a problem in the first place, or whether this solution solves it.

The implementation in Shake is a bit different to that described above. In Shake I know that captured continuations are never called more than once, so I can avoid creating a new IORef in captureM, and I can reuse the existing one. Since I never change the handler, I can use a slightly less powerful definition of M:

type M a = ReaderT (IORef (SomeException -> IO ())) (ContT () IO) a

The resulting code is Development.Shake.Monad, which implements the RAW monad, and also does a few extra things which are irrelevant to this post.

The cool thing about Haskell is that I've been able to completely replace the underlying Shake Action monad from StateT/IO, to ReaderT/IO, to ReaderT/ContT/IO, without ever breaking any users of Shake. Haskell allows me to produce effective and flexible abstractions.

Categories: Offsite Blogs

ghc-mod v5.0.0

haskell-cafe - Wed, 08/20/2014 - 8:29am
Hi cafe,

We are happy to announce that ghc-mod v5.0.0 has been released. Highlights of this version are as follows:

- ghc-mod v4.x.y suffers from the space leak of GHC 7.8.x (*1), so ghc-modi consumes huge amounts of memory. ghc-modi v5.0.0 consumes much less memory thanks to a workaround.
- Alejandro Serrano merged the results of his Google Summer of Code project. He implemented case splitting and sophisticated type hole handling. I believe that he will write an article on his results, but I have also updated the manual of the Emacs frontend. (*2)
- Daniel Gröber brushed up the internal code by introducing the GhcMod monad, so the API of ghc-mod has changed drastically. If you are a user of the ghc-mod library, you should check its documentation.

The main purpose of this release is to show the results of GSoC to everyone. This is a snapshot release and its API may change in the future. I thank the two guys above since they worked hard and did good jobs. We hope that everyone enjoys v5.0.0.

(*1) https://ghc.haskell.org/trac/ghc/tick
Categories: Offsite Discussion

Does GHC compare pointers when eval'ing (==)

haskell-cafe - Wed, 08/20/2014 - 7:52am
Comparing two structures for equality (structurally) can be expensive. But if their references are the same they would for sure be equal (unless (==) was defined in some funny way). Does GHC perform any such optimization? (Likely a question for another list, but I'll make a stab here first.)
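For what it's worth: (==) is ordinary user-defined code, so GHC inserts no such check automatically. Libraries that want the shortcut opt in explicitly via a primop, along these lines (unsafe in general, since two unevaluated thunks for equal values are distinct heap objects):

{-# LANGUAGE MagicHash #-}
import GHC.Exts (isTrue#, reallyUnsafePtrEquality#)

-- True only if both arguments are currently the same heap object.
ptrEq :: a -> a -> Bool
ptrEq x y = isTrue# (reallyUnsafePtrEquality# x y)

-- Short-circuit an expensive structural comparison:
fastEq :: Eq a => a -> a -> Bool
fastEq x y = ptrEq x y || x == y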
Categories: Offsite Discussion

Safer Haskell Install

Haskell on Reddit - Wed, 08/20/2014 - 12:40am
Categories: Incoming News

Locking of threads in one OS thread

haskell-cafe - Tue, 08/19/2014 - 10:36pm
Hello Cafe, I'm using FFI to interact with a library whose calls, when they fail, leave the reason in some kind of "errno"-like variable which is retrieved via another call. AFAIU, this is not thread-safe in Haskell even if thread-local storage is used inside the library, because Haskell uses its own thread management and the Haskell thread in the same OS thread might be switched between the actual call and the retrieval of the errno value. This should be somehow handled already in Haskell (errno is widely used with syscalls in Linux, for example), but the source of Foreign.C.Error suggests that this is not handled in any way at all. For example, throwErrnoIf is implemented as such:

throwErrno loc = do
    errno <- getErrno
    ioError (errnoToIOError loc errno Nothing Nothing)

throwErrnoIf pred loc f = do
    res <- f
    if pred res
        then throwErrno loc
        else return res

So, the question is: how is it ensured that this block is "atomic" in the sense that at most one Haskell thread computes this whole function at every
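One common workaround, sketched here with hypothetical foreign imports for the library in question: run the call and the error fetch inside a bound thread, which is tied to a single OS thread that no other Haskell thread shares.

{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent (runInBoundThread)
import Foreign.C.Types (CInt (..))

-- Hypothetical bindings; substitute the real library's calls.
foreign import ccall unsafe "mylib_call"    c_call    :: IO CInt
foreign import ccall unsafe "mylib_lasterr" c_lastErr :: IO CInt

callChecked :: IO CInt
callChecked = runInBoundThread $ do
    r <- c_call
    if r == -1
        then do
            e <- c_lastErr  -- still on the same OS thread as c_call
            ioError (userError ("mylib failed with code " ++ show e))
        else return r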
Categories: Offsite Discussion

Reducing CPU load when using Helm

haskell-cafe - Tue, 08/19/2014 - 7:08pm
Forwarding this to haskell-cafe as it failed before. On 18 August 2014 20:24, Kaspar Emanuel <kaspar.bumke< at >gmail.com> wrote:
Categories: Offsite Discussion

Deprecating yesod-platform

Haskell on Reddit - Tue, 08/19/2014 - 6:55pm
Categories: Incoming News

Code Review For My Library 'spice'?

Haskell on Reddit - Tue, 08/19/2014 - 4:49pm

I've just "finished" (in quotes because I'm still very heavily developing it, but it technically functional enough to use) my first published library in Haskell. Its name is spice. It's meant to provide the ability to quickly set up GLFW with Elerea.

It's hosted on hackage and GitHub (with GitHub tending to have updated code). Despite it working, I'm feeling rather insecure about the degree to which the code is optimal and idiomatic. Since this subreddit is seemingly filled with Haskell geniuses, I was wondering if any of you would be willing to take a look at my code and help me try to better both it and myself as a Haskell programmer.

Thank you so much!

submitted by crockeo
[link] [12 comments]
Categories: Incoming News

is there any way to generalize this function to all Functors

Haskell on Reddit - Tue, 08/19/2014 - 4:34pm

Ideally the type signature would be (Functor f, Monad m) => f a -> (a -> m b) -> m (f b), but I have no idea how to do this or even if it's possible.

shit :: (Monad m) => Maybe a -> (a -> m b) -> m (Maybe b)
shit (Just a) f = f a >>= return . Just
shit Nothing _ = return Nothing

shit' :: Monad m => [a] -> (a -> m b) -> m [b]
shit' xs f = mapM f xs

shit'' :: Monad m => (x, a) -> (a -> m b) -> m (x, b)
shit'' (x, a) f = do
    b <- f a
    return (x, b)

submitted by Intolerable
[link] [12 comments]
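For reference: with a plain Functor this is impossible (there is no way to pull the m outward), but Traversable provides exactly the requested shape, and each of the three special cases above is flip Data.Traversable.mapM at a particular Traversable instance:

import qualified Data.Traversable as T

generalized :: (T.Traversable f, Monad m) => f a -> (a -> m b) -> m (f b)
generalized = flip T.mapM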
Categories: Incoming News