News aggregator

Haskell Platform 2014 build problems on OS X

haskell-cafe - Wed, 08/27/2014 - 9:39pm
Hello! I just want to bring to your attention this Homebrew issue: https://github.com/Homebrew/homebrew/issues/31609 It seems they are being forced to remove the formula for the Haskell Platform because the build process has become extremely complicated and they can't figure out how to automate it. Is there someone who knows enough of the platform's internals to help them or to give some advice? Thanks, Nicola
Categories: Offsite Discussion

conduit stream fusion

Haskell on Reddit - Wed, 08/27/2014 - 8:04pm
Categories: Incoming News

ANN: cgi 3001.2.0.0

General haskell list - Wed, 08/27/2014 - 7:32pm
Hello, I've just uploaded a new version of cgi to hackage [1]. This release supports GHC 7.8.3, 7.6.3, and 7.4.2. I'm also looking for co-maintainers or backup maintainers of cgi; please email me directly if you're interested. This release was made possible with the assistance and guidance of Carter Schonwald and was based on a patch by Alexander Vershilov. Thanks. [1] http://hackage.haskell.org/package/cgi
Categories: Incoming News

I'm a .NET developer. What can Haskell/Functional Programming teach me?

Haskell on Reddit - Wed, 08/27/2014 - 7:20pm

Hi all, I am a (primarily) .NET developer, though experienced in Smalltalk, Ruby, Java, and a whole list of other things. I like to keep a wide horizon and I've gathered that Functional Programming can teach me good things. A lot of what I do day to day involves the web, typically APIs on HTTP endpoints. I'm not about to completely jump ship to Haskell and FP, but I am certain they can teach me a thing or two and I am eager to embrace it.

I have been using F# a lot lately, but as it is "yet another" .NET language it doesn't feel like true FP. Well, it can be, but the availability of libraries from other languages that clearly aren't FP makes it just too easy to break away from FP I guess? Anyway, I digress.

I understand the syntax of Haskell just fine; I've done plenty of stuff in GHCi and read many a tutorial. However, it just isn't sinking in, and I still feel like I am missing something. Like there is some beautiful unicorn of simplicity that is just waiting for me to find it/unlearn the way OO does things/etc.

So, my understanding is that Haskell is a "true" Functional Programming language/platform. I've installed the Haskell Platform and EclipseFP (I'm on Windows without access to *nix for now) and I'm just staring at my project workspace, still wondering what I'm not seeing. I've thus far compiled a small web service using Scotty; however, Scotty too didn't feel like FP should feel to me. Shoe-horning the way Sinatra does it into Haskell feels odd.

Am I right to feel odd about this? What am I missing that I should be learning? What would you suggest as a project/task/whatever for me to "get" Functional Programming (and/or Haskell)?

The "No Side Effects" functions "rule": What rule of thumb percentage of the application should this be? Is 100% no side effects even possible? Even with persistence/sessions/cookies/etc?

Many thanks in advance, and I'm sure this question(s) is asked many times so I apologise for that.

submitted by Jestar342
[link] [31 comments]
Categories: Incoming News

Conduit+GHC high memory use for simple Sink

haskell-cafe - Wed, 08/27/2014 - 7:19pm
Hello Cafe, First I'd like to thank Michael Snoyman and Gabriel Gonzalez for the work they've done on the conduit and pipes stream processing libraries, and all the accompanying tutorial content. I've been having fun converting a text processing app from C to Haskell. I'm seeing unexpectedly high memory usage in a stream-processing program that uses the conduit library. I've created an example that demonstrates the problem. The program accepts gzip files as arguments, and for each one, classifies each line as either Even or Odd depending on the length, then outputs some result depending on the Sink used. For each gzip file:

action :: GzipFilePath -> IO ()
action (GzipFilePath filePath) = do
    result <- runResourceT $ CB.sourceFile filePath
        $$ Zlib.ungzip =$ CB.lines =$ token =$ sink2
    putStrLn $ show result

The problem is the following Sink, which counts how many even/odd Tokens are se
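
For context, a hedged sketch of how a constant-space counting Sink is typically written in conduit (an illustration only, not the poster's truncated code; the Token type is assumed):

{-# LANGUAGE BangPatterns #-}
import Data.Conduit
import qualified Data.Conduit.List as CL

data Token = Even | Odd

-- A strict left fold keeps the counts evaluated as they accumulate,
-- which is the usual way to avoid thunk build-up in a counting Sink.
countTokens :: Monad m => Sink Token m (Int, Int)
countTokens = CL.fold step (0, 0)
  where
    step (!e, !o) Even = (e + 1, o)
    step (!e, !o) Odd  = (e, o + 1)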
Categories: Offsite Discussion

Can somebody explain me the instance Generic Int

Haskell on Reddit - Wed, 08/27/2014 - 3:26pm

Quoting from http://www.haskell.org/ghc/docs/7.6.3/html/libraries/base/src/GHC-Generics.html

-- Int

data D_Int
data C_Int

instance Datatype D_Int where
  datatypeName _ = "Int"
  moduleName   _ = "GHC.Int"

instance Constructor C_Int where
  conName _ = "" -- JPM: I'm not sure this is the right implementation...

instance Generic Int where
  type Rep Int = D1 D_Int (C1 C_Int (S1 NoSelector (Rec0 Int)))
  from x = M1 (M1 (M1 (K1 x)))
  to (M1 (M1 (M1 (K1 x)))) = x

Can somebody explain to me what the idea behind the three nested M1s is?

And why is the instance Generic Integer missing?
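
For reference, the three nested M1s arise because D1, C1 and S1 are all synonyms for the same M1 metadata wrapper, tagged at the type level for datatype, constructor and selector metadata respectively; paraphrasing the definitions in GHC.Generics:

-- Paraphrased from GHC.Generics (base): one newtype, three roles.
newtype M1 i c f p = M1 { unM1 :: f p }  -- meta-information wrapper
newtype K1 i c p   = K1 { unK1 :: c }    -- a plain field

data D  -- tag: datatype-level metadata
data C  -- tag: constructor-level metadata
data S  -- tag: selector-level metadata
data R

type D1 = M1 D   -- so D1 D_Int ...       is an M1
type C1 = M1 C   -- so C1 C_Int ...       is an M1
type S1 = M1 S   -- so S1 NoSelector ...  is an M1
type Rec0 = K1 R -- and the Int field itself sits in the K1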

submitted by nh2_
[link] [8 comments]
Categories: Incoming News

[ANN] Wheb 0.3 release

haskell-cafe - Wed, 08/27/2014 - 2:26pm
Hello all, I have been working on a web framework that I use for my personal projects and I thought I would share my small progress. I started this project several months ago when I hit a big bump in difficulty going from a simple skeleton project to a more intricate project using alternative frameworks. Lately I've added three main features: WebSockets, Cache and Redis. WebSockets use the same WhebT interface as the standard HTTP handlers, allowing you to share functions and middleware across your application. Cache is a new plugin for simple caching. It provides an interface to cache ByteStrings under Text keys. Like the other plugins, it is backend agnostic so you can write your own backend or use the prebuilt Redis cache (this requires you to configure your redis instance for caching). In addition to Redis acting as a Cache, you can also use it as a database and as a backend for the plugins Auth and Sessions. You can mix Redis with other plugins. For one of my projects I use the MongoDB plugin f
Categories: Offsite Discussion

Philip Wadler: Informatics Independence Referendum Debate - the result

Planet Haskell - Wed, 08/27/2014 - 7:43am

Here is the outcome of the debate announced earlier:
Before
Yes 19
No 25
Undecided 28

After
Yes 28
No 31
Undecided 26

(Either some people entered after the debate began, or some people began the debate unsure even whether they were undecided.)

Thank you to Alan, Mike, and the audience for a fantastic debate. The audience asked amazing questions on both sides, clearly much involved with the process.

Video of debate here.
Alan's slides here.
My slides here.

Categories: Offsite Blogs

Bill Atkins: NSNotificationCenter, Swift and blocks

Planet Haskell - Wed, 08/27/2014 - 5:21am
The conventional way to register observers with NSNotificationCenter is to use the target-action pattern. While this gets the job done, it's inherently not type-safe.

For example, the following Swift snippet will compile perfectly:

    NSNotificationCenter.defaultCenter().addObserver(self, selector: Selector("itemAdded:"),
        name: MyNotificationItemAdded, object: nil)
even though at runtime it will fail unless self has a method named itemAdded that takes exactly one parameter (leaving off that last colon in the selector will turn this line into a no-op). Plus, this method gives you no way to take advantage of Swift's closures, which would allow the observer to access local variables in the method that adds the observer and would eliminate the need to create a dedicated method to handle the event.
A better way to do this is to use blocks. And NSNotificationCenter does include a block-based API:
    NSNotificationCenter.defaultCenter().addObserverForName(MyNotificationItemAdded, object: nil, queue: nil) { note in
      // ...
    }
This is much nicer, especially with Swift's trailing closure syntax. There are no method names to be looked up at runtime; we can refer to local variables in the method that registered the observer, and we can perform small bits of logic in reaction to events without having to create and name dedicated methods.
The catch comes in resource management. It's very important that an object remove its event observers when it's deallocated, or else NSNotificationCenter will try to invoke methods on invalid pointers.
The traditional target-action method has the one advantage that we can easily handle this requirement with a single call in deinit:
  deinit {
    NSNotificationCenter.defaultCenter().removeObserver(self)
  }
With the block API, however, since there is no explicit target object, each call to addObserverForName returns "an opaque object to act as observer." So your observer class would need to track all of these objects and then remove them all from the notification center in deinit, which is a pain.
In fact, the hassle of having to do bookkeeping on the observer objects almost cancels out the convenience of using the block API. Frustrated by this situation, I sat down and created a simple helper class, NotificationManager:
class NotificationManager {
  private var observerTokens: [AnyObject] = []

  deinit {
    deregisterAll()
  }

  func deregisterAll() {
    for token in observerTokens {
      NSNotificationCenter.defaultCenter().removeObserver(token)
    }

    observerTokens = []
  }

  func registerObserver(name: String!, block: (NSNotification! -> ()?)) {
    let newToken = NSNotificationCenter.defaultCenter().addObserverForName(name, object: nil, queue: nil) { note in
      block(note)
      ()
    }

    observerTokens.append(newToken)
  }

  func registerObserver(name: String!, forObject object: AnyObject!, block: (NSNotification! -> ()?)) {
    let newToken = NSNotificationCenter.defaultCenter().addObserverForName(name, object: object, queue: nil) { note in
      block(note)
      ()
    }

    observerTokens.append(newToken)
  }
}
First, this simple class provides a Swift-specialized API around NSNotificationCenter.  It provides an additional convenience method without an object parameter (rarely used, in my experience) to make it easier to use trailing-closure syntax. But most importantly, it keeps track of the observer objects generated when observers are registered, and removes them when the object is deinit'd.

A client of this class can simply keep a member variable of type NotificationManager and use it to register its observers. When the parent class is deallocated, the deinit method will automatically be called on its NotificationManager member variable, and its observers will be properly disposed of:

class MyController: UIViewController {
  private let notificationManager = NotificationManager()

  override init() {
    notificationManager.registerObserver(MyNotificationItemAdded) { note in
      println("item added!")
    }

    super.init()
  }

  required init(coder: NSCoder) {
    fatalError("decoding not implemented")
  }
}
When the MyController instance is deallocated, its NotificationManager member variable will be automatically deallocated, triggering the call to deregisterAll that will remove the dead objects from NSNotificationCenter.
In my apps, I add a notificationManager instance to my common UIViewController base class so I don't have to explicitly declare the member variable in all of my controller subclasses.

Another benefit of using my own wrapper around NSNotificationCenter is that I can add useful functionality, like group observers: an observer that's triggered when any one of a group of notifications is posted:

struct NotificationGroup {
  let entries: [String]

  init(_ newEntries: String...) {
    entries = newEntries
  }
}

extension NotificationManager {
  func registerGroupObserver(group: NotificationGroup, block: (NSNotification! -> ()?)) {
    for name in group.entries {
      registerObserver(name, block: block)
    }
  }
}
This can be a great way to easily set up an event handler to run when, for example, an item is changed in any way at all:
    let MyNotificationItemsChanged = NotificationGroup(
      MyNotificationItemAdded,
      MyNotificationItemDeleted,
      MyNotificationItemMoved,
      MyNotificationItemEdited
    )

    notificationManager.registerGroupObserver(MyNotificationItemsChanged) { note in
      // ...
    }
Categories: Offsite Blogs

Ivory Language

Haskell on Reddit - Wed, 08/27/2014 - 12:54am
Categories: Incoming News

A Face Only A Mother Could Love.

Haskell on Reddit - Tue, 08/26/2014 - 8:44pm
Categories: Incoming News

Simon Michael: Creating well-behaved Hakyll blog posts

Planet Haskell - Tue, 08/26/2014 - 8:15pm

Posts in a Hakyll-powered blog need to be created with care, if you want your feed to work well with clients and aggregators. There are many things to remember:

  • If you have clones of your site, decide which one to work in and make sure it’s up to date
  • Create the file in the right place
  • Name it consistently (I use YYYY-MM-DD-url-safe-title.md)
  • In my setup, prefix it with _ if it’s a draft (I render but don’t publish those)
  • Set title, tags, and author with a metadata block
  • Set published time with metadata to get a more precise timestamp than Hakyll can guess from the filename. Include a time zone. Use the right format.
  • When moving a post from draft to published:
    • Update the published time
    • Update the file name if the title or publish date has changed
  • If changing a post after it has been published: set updated time in the metadata
  • At some point, commit it to version control and sync it to other clones

I find this makes blogging feel tedious, especially after an absence when I’ve forgotten the details. Case in point, I managed to share an ugly template post with Planet Haskell readers while working on this one.

So I’m trying out this bash shell script, maybe it will help. Adjust to suit your setup.
(updated 2014/8/27)

# add to ~/.bashrc
BLOGDIR=~/src/MYSITE.com/blog

# List recent blog posts.
alias blog-ls="ls $BLOGDIR | tail -10"

# Create a new hakyll-compatible draft blog post.
# blog-new ["The Title" ["tag1, tag2" ["Author Name"]]]
function blog-new() {
  (
    TITLE=${1:-Default Title}
    TAGS=${2:-defaulttag1, defaulttag2}
    AUTHOR=${3:-Default Author Name}
    SLUG=${TITLE// /-}
    DATE=`date +"%Y-%m-%d"`
    TIME=`date +"%Y-%m-%d %H:%M:%S%Z"`
    FILE=_$DATE-$SLUG.md
    echo creating $BLOGDIR/$FILE
    cat <<EOF >>$BLOGDIR/$FILE && emacsclient -n $BLOGDIR/$FILE
---
title: $TITLE
tags: $TAGS
author: $AUTHOR
published: $TIME
---
EOF
  )
}

An example:

$ blog-new 'Scripted Hakyll blog post creation' 'hakyll, haskell'
creating _2014-05-03-Scripted-Hakyll-blog-post-creation.md
(file opens in emacs, edit & save)

$ make
./site build
Initialising...
Creating store...
Creating provider...
Running rules...
Checking for out-of-date items
Compiling updated blog/_2014-05-03-Scripted-Hakyll-blog-post-creation.md
Success

See also part 2.

Categories: Offsite Blogs

Simon Michael: Well-behaved Hakyll blog posts, continued

Planet Haskell - Tue, 08/26/2014 - 8:00pm

More hakyll blog fixes:

Ugly things showing on planets

My posts were showing unwanted things on planet haskell - double heading, redundant date, tag links, and ugly disqus html. By comparing with Jasper Van der Jeugt’s blog, I found the problem: I was snapshotting content for the feed at the wrong time, after applying the post template:

>>= return . fmap demoteHeaders
>>= loadAndApplyTemplate "templates/post.html" (postCtx tags)
>>= saveSnapshot "content"
>>= loadAndApplyTemplate "templates/default.html" defaultContext

Better:

>>= saveSnapshot "content" -- >>= return . fmap demoteHeaders >>= loadAndApplyTemplate "templates/post.html" (postCtx tags) >>= loadAndApplyTemplate "templates/default.html" defaultContext

Manual feed publishing

The main blog feed is now generated with a _ prefix, and I must manually rename it (with make feed) to make it live on Planet Haskell. This will hopefully reduce snafus (and not create new ones).

./site.hs 95

- create ["blog.xml"] $ do
+ create ["_blog.xml"] $ do

./Makefile 14

+feed: _site/blog.xml
+
+_site/blog.xml: _site/_blog.xml
+	cp _site/_blog.xml _site/blog.xml
+

Better HTML titles

Changed the “Joyful Systems” prefix to a suffix in the HTML page titles, making search results and browser tab names more useful.

Categories: Offsite Blogs

What is the current streaming (pipes, conduit, iteratee etc.) IO solution?

Haskell on Reddit - Tue, 08/26/2014 - 7:46pm

I'm hoping for something "highlevel" like pipes or conduit, but I have no idea how they compare. The use-case is realtime streaming data from (for example) a sensor or network packets, doing something like filling a mutable vector, or running some image processing before sending a response. Does anyone know which implementations are the fastest? Are they all sufficient, or is there some better "hand rolled" way?
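
For what it's worth, a hedged sketch (not from the original post) of the kind of pipeline being asked about, using conduit's Data.Conduit.List combinators to fill a pre-allocated mutable vector; the fillVector name is mine:

import Data.Conduit
import qualified Data.Conduit.List as CL
import qualified Data.Vector.Unboxed.Mutable as VM

-- Stream at most n Ints from a source into a freshly allocated
-- mutable vector, writing each element at the next free index.
fillVector :: Int -> Source IO Int -> IO (VM.IOVector Int)
fillVector n src = do
    v <- VM.new n
    _ <- src $$ CL.isolate n
             =$ CL.foldM (\i x -> VM.write v i x >> return (i + 1)) 0
    return v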

submitted by dogirardo
[link] [17 comments]
Categories: Incoming News

FP Complete: IAP: conduit stream fusion

Planet Haskell - Tue, 08/26/2014 - 6:00pm

Both the changes described in this blog post, and in the previous blog post, are now merged to the master branch of conduit, and have been released to Hackage as conduit 1.2.0. That doesn't indicate stream fusion is complete (far from it!). Rather, the optimizations we have so far are valuable enough that I want them to be available immediately, and future stream fusion work is highly unlikely to introduce further breaking changes. Having the code on Hackage will hopefully also make it easier for others to participate in the discussion around this code.

Stream fusion

Last time, I talked about applying the codensity transform to speed up conduit. This greatly increases performance when performing many monadic binds. However, this does nothing to help us with speeding up the "categorical composition" of conduit, where we connect two components of a pipeline together so the output from the first flows into the second. conduit usually refers to this as fusion, but given the topic at hand (stream fusion), I think that nomenclature will become confusing. So let's stick to categorical composition, even though conduit isn't actually a category.

Duncan Coutts, Roman Leshchinskiy and Don Stewart wrote the stream fusion paper, and that technique has become integral to getting high performance in the vector and text packages. The paper is well worth the read, but for those unfamiliar with the technique, let me give a very brief summary:

  • GHC is very good at optimising non-recursive functions.
  • We express all of our streaming functions as a combination of some internal state and a function to step over that state.
  • Stepping either indicates that the stream is complete, there's a new value and a new state, or there's a new state without a new value (this last case helps avoid recursion for a number of functions like filter).
  • A stream transformer (like map) takes a Stream as input and produces a new Stream as output.
  • The final consuming functions, like fold, are the only place where recursion happens. This allows all other components of the pipeline to be inlined, rewritten to more efficient formats, and optimized by GHC.

Let's see how this looks compared to conduit.

Data types

I'm going to slightly rename data types from stream fusion to avoid conflicts with existing conduit names. I'm also going to add an extra type parameter to represent the final return value of a stream; this is a concept that exists in conduit, but not in common stream fusion.

data Step s o r = Emit s o | Skip s | Stop r

data Stream m o r = forall s. Stream (s -> m (Step s o r)) (m s)

The Step datatype takes three parameters. s is the internal state used by the stream, o is the type of the stream of values it generates, and r is the final result value. The Stream datatype uses an existential to hide away that internal state. It then consists of a step function that takes a state and gives us a new Step, as well as an initial state value (which is a monadic action, for cases where we want to do some initialization when starting a stream).

Let's look at some functions to get a feel for what this programming style looks like:

enumFromToS_int :: (Integral a, Monad m) => a -> a -> Stream m a ()
enumFromToS_int !x0 !y =
    Stream step (return x0)
  where
    step x
        | x <= y = return $ Emit (x + 1) x
        | otherwise = return $ Stop ()

This function generates a stream of integral values from x0 to y. The internal state is the current value to be emitted. If the current value is less than or equal to y, we emit our current value, and update our state to be the next value. Otherwise, we stop.

We can also write a function that transforms an existing stream. mapS is likely the simplest example of this:

mapS :: Monad m => (a -> b) -> Stream m a r -> Stream m b r
mapS f (Stream step ms0) =
    Stream step' ms0
  where
    step' s = do
        res <- step s
        return $ case res of
            Stop r -> Stop r
            Emit s' a -> Emit s' (f a)
            Skip s' -> Skip s'

The trick here is to make a function from one Stream to another. We unpack the input Stream constructor to get the input step function and initial state. Since mapS has no state of its own, we simply keep the input state unmodified. We then provide our modified step' function. This calls the input step function, and any time it sees an Emit, applies the user-provided f function to the emitted value.

Finally, let's consider the consumption of a stream with a strict left fold:

foldS :: Monad m => (b -> a -> b) -> b -> Stream m a () -> m b
foldS f b0 (Stream step ms0) =
    ms0 >>= loop b0
  where
    loop !b s = do
        res <- step s
        case res of
            Stop () -> return b
            Skip s' -> loop b s'
            Emit s' a -> loop (f b a) s'

We unpack the input Stream constructor again, get the initial state, and then loop. Each loop, we run the input step function.

Match and mismatch with conduit

There's a simple, straightforward conversion from a Stream to a Source:

toSource :: Monad m => Stream m a () -> Producer m a
toSource (Stream step ms0) =
    lift ms0 >>= loop
  where
    loop s = do
        res <- lift $ step s
        case res of
            Stop () -> return ()
            Skip s' -> loop s'
            Emit s' a -> yield a >> loop s'

We extract the state, and then loop over it, calling yield for each emitted value. And ignoring finalizers for the moment, there's even a way to convert a Source into a Stream:

fromSource :: Monad m => Source m a -> Stream m a ()
fromSource (ConduitM src0) =
    Stream step (return $ src0 Done)
  where
    step (Done ()) = return $ Stop ()
    step (Leftover p ()) = return $ Skip p
    step (NeedInput _ p) = return $ Skip $ p ()
    step (PipeM mp) = liftM Skip mp
    step (HaveOutput p _finalizer o) = return $ Emit p o

Unfortunately, there's no straightforward conversion for Conduits (transformers) and Sinks (consumers). There's simply a mismatch between the conduit world, which is fully continuation based, and the stream world, where the upstream is provided as an encapsulated value. I did find a few representations that mostly work, but the performance characteristics are terrible.

If anyone has insights into this that I missed, please contact me, as this could have an important impact on the future of stream fusion in conduit. But for the remainder of this blog post, I will continue under the assumption that only Source and Stream can be efficiently converted.

StreamConduit

Once I accepted that I wouldn't be able to convert a stream transformation into a conduit transformation, I was left with a simple approach to start working on fusion: have two representations of each function we want to be able to fuse. The first representation would use normal conduit code, and the second would be streaming. This looks like:

data StreamConduit i o m r = StreamConduit
    (ConduitM i o m r)
    (Stream m i () -> Stream m o r)

Notice that the second field uses the stream fusion concept of a Stream-transforming function. At first, this may seem like it doesn't properly address Sources and Sinks, since the former doesn't have an input Stream, and the latter results in a single output value, not a Stream. However, those are really just special cases of the more general form used here. For Sources, we provide an empty input stream, and for Sinks, we continue executing the Stream until we get a Stop constructor with the final result. You can see both of these in the implementation of the connectStream function (whose purpose I'll explain in a moment):

connectStream :: Monad m
              => StreamConduit () i m ()
              -> StreamConduit i Void m r
              -> m r
connectStream (StreamConduit _ stream) (StreamConduit _ f) =
    run $ f $ stream $ Stream emptyStep (return ())
  where
    emptyStep _ = return $ Stop ()
    run (Stream step ms0) =
        ms0 >>= loop
      where
        loop s = do
            res <- step s
            case res of
                Stop r -> return r
                Skip s' -> loop s'
                Emit _ o -> absurd o

Notice how we've created an empty Stream using emptyStep and a dummy () state. And on the run side, we loop through the results. The type system (via the Void datatype) prevents the possibility of a meaningful Emit constructor, and we witness this with the absurd function. For Stop we return the final value, and Skip implies another loop.

Fusing StreamConduit

Assuming we have some functions that use StreamConduit, how do we get things to fuse? We still need all of our functions to have a ConduitM type signature, so we start off with a function to convert a StreamConduit into a ConduitM:

unstream :: StreamConduit i o m r -> ConduitM i o m r
unstream (StreamConduit c _) = c
{-# INLINE [0] unstream #-}

Note that we hold off on any inlining until simplification phase 0. This is vital to our next few rewrite rules, which is where all the magic happens.

The next thing we want to be able to do is categorically compose two StreamConduits together. This is easy to do, since a StreamConduit is made up of ConduitMs which compose via the =$= operator, and Stream transformers, which compose via normal function composition. This results in a function:

fuseStream :: Monad m
           => StreamConduit a b m ()
           -> StreamConduit b c m r
           -> StreamConduit a c m r
fuseStream (StreamConduit a x) (StreamConduit b y) =
    StreamConduit (a =$= b) (y . x)
{-# INLINE fuseStream #-}

That's very logical, but still not magical. The final trick is a rewrite rule:

{-# RULES "fuseStream" forall left right. unstream left =$= unstream right = unstream (fuseStream left right) #-}

We're telling GHC that, if we see a composition of two streamable conduits, then we can compose the stream versions of them and get the same result. But this isn't enough yet; unstream will still end up throwing away the stream version. We now need to deal with running these things. The first case we'll handle is connecting two streamable conduits, which is where the connectStream function from above comes into play. If you go back and look at that code, you'll see that the ConduitM fields are never used. All that's left is telling GHC to use connectStream when appropriate:

{-# RULES "connectStream" forall left right. unstream left $$ unstream right = connectStream left right #-}

The next case we'll handle is when we connect a streamable source to a non-streamable sink. This is less efficient than the previous case, since it still requires allocating ConduitM constructors, and doesn't expose as many opportunities for GHC to inline and optimize our code. However, it's still better than nothing:

connectStream1 :: Monad m
               => StreamConduit () i m ()
               -> ConduitM i Void m r
               -> m r
connectStream1 (StreamConduit _ fstream) (ConduitM sink0) =
    case fstream $ Stream (const $ return $ Stop ()) (return ()) of
        Stream step ms0 ->
            let loop _ (Done r) _ = return r
                loop ls (PipeM mp) s = mp >>= flip (loop ls) s
                loop ls (Leftover p l) s = loop (l:ls) p s
                loop _ (HaveOutput _ _ o) _ = absurd o
                loop (l:ls) (NeedInput p _) s = loop ls (p l) s
                loop [] (NeedInput p c) s = do
                    res <- step s
                    case res of
                        Stop () -> loop [] (c ()) s
                        Skip s' -> loop [] (NeedInput p c) s'
                        Emit s' i -> loop [] (p i) s'
             in ms0 >>= loop [] (sink0 Done)
{-# INLINE connectStream1 #-}

{-# RULES "connectStream1" forall left right.
    unstream left $$ right = connectStream1 left right
  #-}

There's a third case that's worth considering: a streamable sink and non-streamable source. However, I ran into two problems when implementing such a rewrite rule:

  • GHC did not end up firing the rule.

  • There are some corner cases regarding finalizers that need to be dealt with. In our previous examples, the upstream was always a stream, which has no concept of finalizers. But when the upstream is a conduit, we need to make sure to call them appropriately.

So for now, fusion only works for cases where all of the functions can be fused, or all of the functions before the $$ operator can be fused. Otherwise, we'll revert to the normal performance of conduit code.

Benchmarks

I took the benchmarks from our previous blog post and modified them slightly. The biggest addition was an example of enumFromTo =$= map =$= map =$= fold, which really stresses the fusion capabilities and demonstrates the performance gains stream fusion offers.

The other thing to note is that, in the "before fusion" benchmarks, the sum results are skewed by the fact that we have the overly eager rewrite rules for enumFromTo $$ fold (for more information, see the previous blog post). For the "after fusion" benchmarks, there are no special-case rewrite rules in place. Instead, the results you're seeing are actual artifacts of having a proper fusion framework in place. In other words, you can expect this to translate into real-world speedups.

You can compare before fusion and after fusion. Let me provide a few select comparisons:

Benchmark                        Low level or vector  Before fusion  After fusion  Speedup
map + sum                        5.95us               636us          5.96us        99%
monte carlo                      3.45ms               5.34ms         3.70ms        71%
sliding window size 10, Seq      1.53ms               1.89ms         1.53ms        21%
sliding vector size 10, unboxed  2.25ms               4.05ms         2.33ms        42%

Note that the map + sum benchmark is very extreme, since the inner loop is doing very cheap work, so the conduit overhead dominated the analysis.

Streamifying a conduit

Here's an example of making a conduit function stream fusion-compliant, using the map function:

mapC :: Monad m => (a -> b) -> Conduit a m b
mapC f = awaitForever $ yield . f
{-# INLINE mapC #-}

mapS :: Monad m => (a -> b) -> Stream m a r -> Stream m b r
mapS f (Stream step ms0) =
    Stream step' ms0
  where
    step' s = do
        res <- step s
        return $ case res of
            Stop r -> Stop r
            Emit s' a -> Emit s' (f a)
            Skip s' -> Skip s'
{-# INLINE mapS #-}

map :: Monad m => (a -> b) -> Conduit a m b
map = mapC
{-# INLINE [0] map #-}

{-# RULES "unstream map" forall f.
    map f = unstream (StreamConduit (mapC f) (mapS f))
  #-}

Notice the three steps here:

  • Define a pure-conduit implementation (mapC), which looks just like conduit 1.1's map function.
  • Define a pure-stream implementation (mapS), which looks very similar to vector's mapS.
  • Define map, which by default simply reexposes mapC. But then, use an INLINE statement to delay inlining until simplification phase 0, and use a rewrite rule to rewrite map in terms of unstream and our two helper functions mapC and mapS.

While tedious, this is all we need to do for each function to expose it to the fusion framework.

Vector vs conduit, mapM style

Overall, vector has been both the inspiration for the work I've done here, and the bar I've used to compare against, since it is generally the fastest implementation you can get in Haskell (and tends to be high-level code to boot). However, there seems to be one workflow where conduit drastically outperforms vector: chaining together monadic transformations.

I put together a benchmark which does the same enumFromTo+map+sum benchmark I demonstrated previously. But this time, I have four versions: vector with pure functions, vector with IO functions, conduit with pure functions, and conduit with IO functions. You can see the results here; the important takeaways are:

  • Pure is always faster, since it exposes more optimizations to GHC.
  • vector and conduit pure are almost identical, at 57.7us and 58.1us.
  • Monadic conduit code does have a slowdown (86.3us). However, monadic vector code has a drastic slowdown (305us), presumably because monadic binds defeat its fusion framework.

So there seems to be at least one workflow for which conduit's fusion framework can outperform even vector!
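
For concreteness, a hedged sketch of the shape of the monadic conduit pipeline being benchmarked (Data.Conduit.List combinators assumed; the name sumMapMIO is mine, and this is not the actual benchmark code):

import Data.Conduit
import qualified Data.Conduit.List as CL

-- enumFromTo + a monadic map + sum, all in IO.
sumMapMIO :: IO Int
sumMapMIO =
    CL.enumFromTo 1 (1000000 :: Int)
        $$ CL.mapM (\x -> return (x * 2))
        =$ CL.fold (+) 0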

Downsides

The biggest downside to this implementation of stream fusion is that we need to write all of our algorithms twice. This can possibly be mitigated by having a few helper functions in place, and implementing others in terms of those. For example, mapM_ can be implemented in terms of foldM.
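
A hedged sketch of that idea, reusing the Stream and Step types from above (foldMS and mapM_S are hypothetical names, not functions the library necessarily exports):

{-# LANGUAGE BangPatterns #-}

-- A monadic strict left fold over a Stream...
foldMS :: Monad m => (b -> a -> m b) -> b -> Stream m a () -> m b
foldMS f b0 (Stream step ms0) =
    ms0 >>= loop b0
  where
    loop !b s = do
        res <- step s
        case res of
            Stop ()   -> return b
            Skip s'   -> loop b s'
            Emit s' a -> f b a >>= \b' -> loop b' s'

-- ...means the streaming mapM_ needs no second implementation of its own.
mapM_S :: Monad m => (a -> m ()) -> Stream m a () -> m ()
mapM_S f = foldMS (\() a -> f a) ()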

There's one exception to this: using the streamSource function, we can convert a Stream into a Source without having to write our algorithm twice. However, due to differences in how monadic actions are performed between Stream and Conduit, this could introduce a performance degradation for pure Sources. We can work around that with a special case function streamSourcePure for the Identity monad as a base.

Getting good performance

In order to take advantage of the new stream fusion framework, try to follow these guidelines:

  • Use fusion functions whenever possible. Explicit usage of await and yield will immediately kick you back to non-fusion (the same as explicit pattern matching defeats list fusion).
  • If you absolutely cannot use an existing fusion function, consider writing your own fusion variant.
  • When mixing fusion and non-fusion, put as many fusion functions as possible together with the $= operator before the connect operator $$, as in the sketch below.
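
A hedged example of that last guideline (Data.Conduit.List combinators assumed; the grouping is the point, not the specific functions):

import Data.Conduit
import qualified Data.Conduit.List as CL

-- The fusable stages are grouped to the left of $$, giving the
-- rewrite rules a chance to fire on the whole upstream.
fusedSum :: Monad m => m Int
fusedSum =
    (CL.enumFromTo 1 (100 :: Int) $= CL.map (+ 1) $= CL.map (* 2))
        $$ CL.fold (+) 0
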
Next steps

Even though this work is now publicly available on Hackage, there's still a lot of work to be done. This falls into three main categories:

  • Continue rewriting core library functions in streaming style. Michael Sloan has been working on a lot of these functions, and we're hoping to have almost all the combinators from Data.Conduit.List and Data.Conduit.Combinators done soon.
  • Research why rewrite rules and inlining don't play nicely together. In a number of places, we've had to explicitly use rewrite rules to force fusion to happen, when theoretically inlining should have taken care of it for us.
  • Look into any possible alternative formulations of stream fusion that provide better code reuse or more reliable rewrite rule firing.

Community assistance on all three points, but especially 2 and 3, is much appreciated!

Categories: Offsite Blogs

What do you think about hash-consing every data-structure?

Haskell on Reddit - Tue, 08/26/2014 - 5:52pm

That is, making the whole memory shared so no 2 expressions are ever duplicated in memory. Also, cross-program, pervasive, automatic memoization is interesting, and O(1) equality testing sounds useful. The huge memory savings could make it much more efficient to apply automatic synchronisation between server/client. I see no blatant cons at this point... what do you think?
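
For readers unfamiliar with the term, a hedged sketch of what hash-consing a single datatype looks like (hypothetical names, and a monadic interning pass rather than the pervasive, whole-runtime version the post asks about):

import Control.Monad.State
import qualified Data.Map.Strict as M

-- A hash-consed expression type: every node carries a unique id
-- handed out by an interning table, so structurally equal terms share.
data HExpr = HExpr { hId :: Int, hNode :: Node }
data Node  = HLit Int | HAdd HExpr HExpr

-- Structural keys built from child ids, so lookups never walk deep terms.
data Key = KLit Int | KAdd Int Int deriving (Eq, Ord)

type Intern = State (M.Map Key HExpr, Int)

-- O(1) equality: two interned nodes are equal iff their ids are.
instance Eq HExpr where
  a == b = hId a == hId b

intern :: Key -> Node -> Intern HExpr
intern k n = do
  (tbl, next) <- get
  case M.lookup k tbl of
    Just h  -> return h                 -- already built: reuse the node
    Nothing -> do
      let h = HExpr next n
      put (M.insert k h tbl, next + 1)
      return h

hlit :: Int -> Intern HExpr
hlit i = intern (KLit i) (HLit i)

hadd :: HExpr -> HExpr -> Intern HExpr
hadd a b = intern (KAdd (hId a) (hId b)) (HAdd a b)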

submitted by SrPeixinho
[link] [9 comments]
Categories: Incoming News

Lazy IO Problem

Haskell on Reddit - Tue, 08/26/2014 - 5:36pm

I know this is going to have a very straightforward explanation, but I can't quite wrap my head around it at the moment.

Suppose I have a file located at "/dummy.txt" with the contents

# text in the file

Now consider this set of functions:

-- io.hs
import System.IO

something :: String -> IO String
something filename = do
    putStrLn "START SOMETHING"
    withFile filename ReadMode (\h -> do
        putStrLn "INSIDE LAMDA"
        s <- hGetContents h
        return s)

something' :: String -> IO String
something' filename = do
    putStrLn "START SOMETHING"
    withFile filename ReadMode (\h -> do
        putStrLn "INSIDE LAMDA"
        s <- hGetContents h
        putStrLn s
        return s)

other :: String -> (String -> IO String) -> IO ()
other s f = do
    putStrLn "START OTHER"
    contents <- f s
    putStrLn "AFTER SOMETHING"
    putStr contents

Now in ghci I run the following:

Prelude> :l io
*Main> other "/dummy.txt" something
START OTHER
START SOMETHING
INSIDE LAMDA
AFTER SOMETHING
*Main> other "/dummy.txt" something'
START OTHER
START SOMETHING
INSIDE LAMDA
# text in the file
AFTER SOMETHING
# text in the file

The IO action returned by something is empty if I do not evaluate anything that prints the contents to the screen inside the function passed to withFile. If I do evaluate something such as putStrLn contents like something' does, then the IO action that is returned contains the contents of the file. Why?

(Also, "you shouldn't be doing this in the first place" is not really helpful in case the urge strikes you to say how this should be written differently)

submitted by wolfgangfabian
[link] [4 comments]
Categories: Incoming News

Category theory for scientists, answers.

Haskell on Reddit - Tue, 08/26/2014 - 4:09pm

I'm reading Spivak's Category theory for scientists. I've been answering all of the questions, but I would like to know if I am doing anything right. I can't find an answer book online. Does anyone know of one, or at least partial answers? I don't want to look ahead; I just want to know if I am doing anything right.

submitted by yyttr3
[link] [5 comments]
Categories: Incoming News