News aggregator

Simon Michael: Well-behaved Hakyll blog posts, continued

Planet Haskell - Tue, 08/26/2014 - 8:00pm

More hakyll blog fixes:

Ugly things showing on planets

My posts were showing unwanted things on planet haskell - double heading, redundant date, tag links, and ugly disqus html. By comparing with Jasper Van der Jeugt’s blog, I found the problem: I was snapshotting content for the feed at the wrong time, after applying the post template:

>>= return . fmap demoteHeaders
>>= loadAndApplyTemplate "templates/post.html" (postCtx tags)
>>= saveSnapshot "content"
>>= loadAndApplyTemplate "templates/default.html" defaultContext

Better:

>>= saveSnapshot "content"
>>= return . fmap demoteHeaders
>>= loadAndApplyTemplate "templates/post.html" (postCtx tags)
>>= loadAndApplyTemplate "templates/default.html" defaultContext

Manual feed publishing

The main blog feed is now generated with a _ prefix, and I must manually rename it (with make feed) to make it live on Planet Haskell. This will hopefully reduce snafus (and not create new ones).

./site.hs 95

- create ["blog.xml"] $ do
+ create ["_blog.xml"] $ do

./Makefile 14

+feed: _site/blog.xml
+
+_site/blog.xml: _site/_blog.xml
+	cp _site/_blog.xml _site/blog.xml

Better HTML titles

Changed the “Joyful Systems” prefix to a suffix in the HTML page titles, making search results and browser tab names more useful.

Categories: Offsite Blogs

What is the current streaming (pipes, conduit, iteratee etc.) IO solution?

Haskell on Reddit - Tue, 08/26/2014 - 7:46pm

I'm hoping for something "highlevel" like pipes or conduit, but I have no idea how they compare. The use-case is realtime streaming data from (for example) a sensor or network packets, doing something like filling a mutable vector, or running some image processing before sending a response. Does anyone know which implementations are the fastest? Are they all sufficient, or is there some better "hand rolled" way?

submitted by dogirardo
Categories: Incoming News

FP Complete: IAP: conduit stream fusion

Planet Haskell - Tue, 08/26/2014 - 6:00pm

Both the changes described in this blog post, and in the previous blog post, are now merged to the master branch of conduit, and have been released to Hackage as conduit 1.2.0. That doesn't indicate stream fusion is complete (far from it!). Rather, the optimizations we have so far are valuable enough that I want them to be available immediately, and future stream fusion work is highly unlikely to introduce further breaking changes. Having the code on Hackage will hopefully also make it easier for others to participate in the discussion around this code.

Stream fusion

Last time, I talked about applying the codensity transform to speed up conduit. This greatly increases performance when performing many monadic binds. However, this does nothing to help us with speeding up the "categorical composition" of conduit, where we connect two components of a pipeline together so the output from the first flows into the second. conduit usually refers to this as fusion, but given the topic at hand (stream fusion), I think that nomenclature will become confusing. So let's stick to categorical composition, even though conduit isn't actually a category.

Duncan Coutts, Roman Leshchinskiy and Don Stewart wrote the stream fusion paper, and that technique has become integral to getting high performance in the vector and text packages. The paper is well worth the read, but for those unfamiliar with the technique, let me give a very brief summary:

  • GHC is very good at optimising non-recursive functions.
  • We express all of our streaming functions as a combination of some internal state, and a function to step over that state.
  • Stepping either indicates that the stream is complete, there's a new value and a new state, or there's a new state without a new value (this last case helps avoid recursion for a number of functions like filter).
  • A stream transformer (like map) takes a Stream as input and produces a new Stream as output.
  • The final consuming functions, like fold, are the only place where recursion happens. This allows all other components of the pipeline to be inlined, rewritten to more efficient formats, and optimized by GHC.

Let's see how this looks compared to conduit.

Data types

I'm going to slightly rename data types from stream fusion to avoid conflicts with existing conduit names. I'm also going to add an extra type parameter to represent the final return value of a stream; this is a concept that exists in conduit, but not in common stream fusion.

data Step s o r = Emit s o | Skip s | Stop r

data Stream m o r = forall s. Stream (s -> m (Step s o r)) (m s)

The Step datatype takes three parameters. s is the internal state used by the stream, o is the type of the stream of values it generates, and r is the final result value. The Stream datatype uses an existential to hide away that internal state. It then consists of a step function that takes a state and gives us a new Step, as well as an initial state value (which is a monadic action, for cases where we want to do some initialization when starting a stream).

Let's look at some functions to get a feel for what this programming style looks like:

enumFromToS_int :: (Integral a, Monad m) => a -> a -> Stream m a ()
enumFromToS_int !x0 !y = Stream step (return x0)
  where
    step x
        | x <= y    = return $ Emit (x + 1) x
        | otherwise = return $ Stop ()

This function generates a stream of integral values from x0 to y. The internal state is the current value to be emitted. If the current value is less than or equal to y, we emit our current value, and update our state to be the next value. Otherwise, we stop.

We can also write a function that transforms an existing stream. mapS is likely the simplest example of this:

mapS :: Monad m => (a -> b) -> Stream m a r -> Stream m b r
mapS f (Stream step ms0) = Stream step' ms0
  where
    step' s = do
        res <- step s
        return $ case res of
            Stop r    -> Stop r
            Emit s' a -> Emit s' (f a)
            Skip s'   -> Skip s'

The trick here is to make a function from one Stream to another. We unpack the input Stream constructor to get the input step function and initial state. Since mapS has no state of its own, we simply keep the input state unmodified. We then provide our modified step' function. This calls the input step function, and any time it sees an Emit, applies the user-provided f function to the emitted value.

Finally, let's consider the consumption of a stream with a strict left fold:

foldS :: Monad m => (b -> a -> b) -> b -> Stream m a () -> m b
foldS f b0 (Stream step ms0) = ms0 >>= loop b0
  where
    loop !b s = do
        res <- step s
        case res of
            Stop ()   -> return b
            Skip s'   -> loop b s'
            Emit s' a -> loop (f b a) s'

We unpack the input Stream constructor again, get the initial state, and then loop. Each loop, we run the input step function.
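Putting the three functions together gives a complete, stream-only pipeline. This is a minimal sketch using only the definitions above (sumDoubled is a name made up for illustration):

-- Sum the doubled values from 1 to 100. Only foldS recurses; the
-- generator and the transformer are non-recursive, so GHC can inline
-- and optimize the whole pipeline.
sumDoubled :: Monad m => m Int
sumDoubled = foldS (+) 0 (mapS (* 2) (enumFromToS_int 1 100))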

Match and mismatch with conduit

There's a simple, straightforward conversion from a Stream to a Source:

toSource :: Monad m => Stream m a () -> Producer m a
toSource (Stream step ms0) = lift ms0 >>= loop
  where
    loop s = do
        res <- lift $ step s
        case res of
            Stop ()   -> return ()
            Skip s'   -> loop s'
            Emit s' a -> yield a >> loop s'

We extract the state, and then loop over it, calling yield for each emitted value. And ignoring finalizers for the moment, there's even a way to convert a Source into a Stream:

fromSource :: Monad m => Source m a -> Stream m a ()
fromSource (ConduitM src0) = Stream step (return $ src0 Done)
  where
    step (Done ())                   = return $ Stop ()
    step (Leftover p ())             = return $ Skip p
    step (NeedInput _ p)             = return $ Skip $ p ()
    step (PipeM mp)                  = liftM Skip mp
    step (HaveOutput p _finalizer o) = return $ Emit p o

Unfortunately, there's no straightforward conversion for Conduits (transformers) and Sinks (consumers). There's simply a mismatch between the conduit world, which is fully continuation based, and the stream world, where the upstream is provided as an encapsulated value. I did find a few representations that mostly work, but the performance characteristics are terrible.

If anyone has insights into this that I missed, please contact me, as this could have an important impact on the future of stream fusion in conduit. But for the remainder of this blog post, I will continue under the assumption that only Source and Stream can be efficiently converted.

StreamConduit

Once I accepted that I wouldn't be able to convert a stream transformation into a conduit transformation, I was left with a simple approach to start working on fusion: have two representations of each function we want to be able to fuse. The first representation would use normal conduit code, and the second would be streaming. This looks like:

data StreamConduit i o m r = StreamConduit
    (ConduitM i o m r)
    (Stream m i () -> Stream m o r)

Notice that the second field uses the stream fusion concept of a Stream-transforming function. At first, this may seem like it doesn't properly address Sources and Sinks, since the former doesn't have an input Stream, and the latter results in a single output value, not a Stream. However, those are really just special cases of the more general form used here. For Sources, we provide an empty input stream, and for Sinks, we continue executing the Stream until we get a Stop constructor with the final result. You can see both of these in the implementation of the connectStream function (whose purpose I'll explain in a moment):

connectStream :: Monad m
              => StreamConduit () i m ()
              -> StreamConduit i Void m r
              -> m r
connectStream (StreamConduit _ stream) (StreamConduit _ f) =
    run $ f $ stream $ Stream emptyStep (return ())
  where
    emptyStep _ = return $ Stop ()
    run (Stream step ms0) = ms0 >>= loop
      where
        loop s = do
            res <- step s
            case res of
                Stop r   -> return r
                Skip s'  -> loop s'
                Emit _ o -> absurd o

Notice how we've created an empty Stream using emptyStep and a dummy () state. And on the run side, we loop through the results. The type system (via the Void datatype) prevents the possibility of a meaningful Emit constructor, and we witness this with the absurd function. For Stop we return the final value, and Skip implies another loop.

Fusing StreamConduit

Assuming we have some functions that use StreamConduit, how do we get things to fuse? We still need all of our functions to have a ConduitM type signature, so we start off with a function to convert a StreamConduit into a ConduitM:

unstream :: StreamConduit i o m r -> ConduitM i o m r
unstream (StreamConduit c _) = c
{-# INLINE [0] unstream #-}

Note that we hold off on any inlining until simplification phase 0. This is vital to our next few rewrite rules, which is where all the magic happens.

The next thing we want to be able to do is categorically compose two StreamConduits together. This is easy to do, since a StreamConduit is made up of ConduitMs which compose via the =$= operator, and Stream transformers, which compose via normal function composition. This results in a function:

fuseStream :: Monad m
           => StreamConduit a b m ()
           -> StreamConduit b c m r
           -> StreamConduit a c m r
fuseStream (StreamConduit a x) (StreamConduit b y) =
    StreamConduit (a =$= b) (y . x)
{-# INLINE fuseStream #-}

That's very logical, but still not magical. The final trick is a rewrite rule:

{-# RULES "fuseStream" forall left right. unstream left =$= unstream right = unstream (fuseStream left right) #-}

We're telling GHC that, if we see a composition of two streamable conduits, then we can compose the stream versions of them and get the same result. But this isn't enough yet; unstream will still end up throwing away the stream version. We now need to deal with running these things. The first case we'll handle is connecting two streamable conduits, which is where the connectStream function from above comes into play. If you go back and look at that code, you'll see that the ConduitM fields are never used. All that's left is telling GHC to use connectStream when appropriate:

{-# RULES "connectStream" forall left right. unstream left $$ unstream right = connectStream left right #-}

The next case we'll handle is when we connect a streamable source to a non-streamable sink. This is less efficient than the previous case, since it still requires allocating ConduitM constructors, and doesn't expose as many opportunities for GHC to inline and optimize our code. However, it's still better than nothing:

connectStream1 :: Monad m
               => StreamConduit () i m ()
               -> ConduitM i Void m r
               -> m r
connectStream1 (StreamConduit _ fstream) (ConduitM sink0) =
    case fstream $ Stream (const $ return $ Stop ()) (return ()) of
        Stream step ms0 ->
            let loop _ (Done r) _ = return r
                loop ls (PipeM mp) s = mp >>= flip (loop ls) s
                loop ls (Leftover p l) s = loop (l:ls) p s
                loop _ (HaveOutput _ _ o) _ = absurd o
                loop (l:ls) (NeedInput p _) s = loop ls (p l) s
                loop [] (NeedInput p c) s = do
                    res <- step s
                    case res of
                        Stop ()   -> loop [] (c ()) s
                        Skip s'   -> loop [] (NeedInput p c) s'
                        Emit s' i -> loop [] (p i) s'
             in ms0 >>= loop [] (sink0 Done)
{-# INLINE connectStream1 #-}

{-# RULES "connectStream1" forall left right.
    unstream left $$ right = connectStream1 left right
  #-}

There's a third case that's worth considering: a streamable sink and non-streamable source. However, I ran into two problems when implementing such a rewrite rule:

  • GHC did not end up firing the rule.

  • There are some corner cases regarding finalizers that need to be dealt with. In our previous examples, the upstream was always a stream, which has no concept of finalizers. But when the upstream is a conduit, we need to make sure to call them appropriately.

So for now, fusion only works for cases where all of the functions can be fused, or all of the functions before the $$ operator can be fused. Otherwise, we'll revert to the normal performance of conduit code.

Benchmarks

I took the benchmarks from our previous blog post and modified them slightly. The biggest addition was including an example of enumFromTo =$= map =$= map =$= fold, which really stresses the fusion capabilities, and demonstrates the performance gains stream fusion offers.

The other thing to note is that, in the "before fusion" benchmarks, the sum results are skewed by the fact that we have the overly eager rewrite rules for enumFromTo $$ fold (for more information, see the previous blog post). For the "after fusion" benchmarks, there are no special-case rewrite rules in place. Instead, the results you're seeing are actual artifacts of having a proper fusion framework in place. In other words, you can expect this to translate into real-world speedups.

You can compare before fusion and after fusion. Let me provide a few select comparisons:

Benchmark                          Low level or vector   Before fusion   After fusion   Speedup
map + sum                          5.95us                636us           5.96us         99%
monte carlo                        3.45ms                5.34ms          3.70ms         71%
sliding window size 10, Seq        1.53ms                1.89ms          1.53ms         21%
sliding vector size 10, unboxed    2.25ms                4.05ms          2.33ms         42%

Note that the map + sum benchmark is an extreme case: the inner loop does very cheap work, so the conduit overhead dominates the analysis.

Streamifying a conduit

Here's an example of making a conduit function stream fusion-compliant, using the map function:

mapC :: Monad m => (a -> b) -> Conduit a m b
mapC f = awaitForever $ yield . f
{-# INLINE mapC #-}

mapS :: Monad m => (a -> b) -> Stream m a r -> Stream m b r
mapS f (Stream step ms0) = Stream step' ms0
  where
    step' s = do
        res <- step s
        return $ case res of
            Stop r    -> Stop r
            Emit s' a -> Emit s' (f a)
            Skip s'   -> Skip s'
{-# INLINE mapS #-}

map :: Monad m => (a -> b) -> Conduit a m b
map = mapC
{-# INLINE [0] map #-}

{-# RULES "unstream map" forall f.
    map f = unstream (StreamConduit (mapC f) (mapS f))
  #-}

Notice the three steps here:

  • Define a pure-conduit implementation (mapC), which looks just like conduit 1.1's map function.
  • Define a pure-stream implementation (mapS), which looks very similar to vector's mapS.
  • Define map, which by default simply reexposes mapC. But then, use an INLINE statement to delay inlining until simplification phase 0, and use a rewrite rule to rewrite map in terms of unstream and our two helper functions mapC and mapS.

While tedious, this is all we need to do for each function to expose it to the fusion framework.
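As a second worked example, here is what the same three steps might look like for filter. This is a hypothetical sketch following the pattern above (filterC and filterS are made-up names, when comes from Control.Monad, and the real conduit implementation may differ):

filterC :: Monad m => (a -> Bool) -> Conduit a m a
filterC g = awaitForever $ \x -> when (g x) (yield x)
{-# INLINE filterC #-}

-- The Skip constructor lets us drop an element without recursing.
filterS :: Monad m => (a -> Bool) -> Stream m a r -> Stream m a r
filterS g (Stream step ms0) = Stream step' ms0
  where
    step' s = do
        res <- step s
        return $ case res of
            Stop r    -> Stop r
            Skip s'   -> Skip s'
            Emit s' a
                | g a       -> Emit s' a
                | otherwise -> Skip s'
{-# INLINE filterS #-}

filter :: Monad m => (a -> Bool) -> Conduit a m a
filter = filterC
{-# INLINE [0] filter #-}

{-# RULES "unstream filter" forall g.
    filter g = unstream (StreamConduit (filterC g) (filterS g))
  #-}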

Vector vs conduit, mapM style

Overall, vector has been both the inspiration for the work I've done here, and the bar I've used to compare against, since it is generally the fastest implementation you can get in Haskell (and tends to be high-level code to boot). However, there seems to be one workflow where conduit drastically outperforms vector: chaining together monadic transformations.

I put together a benchmark which does the same enumFromTo+map+sum benchmark I demonstrated previously. But this time, I have four versions: vector with pure functions, vector with IO functions, conduit with pure functions, and conduit with IO functions. You can see the results here; the important takeaways are:

  • Pure is always faster, since it exposes more optimizations to GHC.
  • vector and conduit pure are almost identical, at 57.7us and 58.1us.
  • Monadic conduit code does have a slowdown (86.3us). However, monadic vector code has a drastic slowdown (305us), presumably because monadic binds defeat its fusion framework.

So there seems to be at least one workflow for which conduit's fusion framework can outperform even vector!

Downsides

The biggest downside to this implementation of stream fusion is that we need to write all of our algorithms twice. This can possibly be mitigated by having a few helper functions in place, and implementing others in terms of those. For example, mapM_ can be implemented in terms of foldM.
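For instance, given a monadic variant of foldS, mapM_ needs no loop of its own. This is a sketch with hypothetical names (foldMS and mapM_S are not functions from the post):

-- A monadic left fold over a Stream, analogous to foldS above.
foldMS :: Monad m => (b -> a -> m b) -> b -> Stream m a () -> m b
foldMS f b0 (Stream step ms0) = ms0 >>= loop b0
  where
    loop !b s = do
        res <- step s
        case res of
            Stop ()   -> return b
            Skip s'   -> loop b s'
            Emit s' a -> f b a >>= \b' -> loop b' s'

-- mapM_ is then just a fold with a unit accumulator.
mapM_S :: Monad m => (a -> m ()) -> Stream m a () -> m ()
mapM_S f = foldMS (\() a -> f a) ()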

There's one exception to this: using the streamSource function, we can convert a Stream into a Source without having to write our algorithm twice. However, due to differences in how monadic actions are performed between Stream and Conduit, this could introduce a performance degradation for pure Sources. We can work around that with a special-case function, streamSourcePure, for Sources whose base monad is Identity.
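The post doesn't show streamSource itself, but given the types above it could plausibly be as small as this (a sketch, not the actual implementation):

-- A Source ignores its upstream, so the stream half can discard the
-- input Stream entirely; the conduit half reuses toSource from above.
streamSource :: Monad m => Stream m o () -> StreamConduit i o m ()
streamSource str = StreamConduit (toSource str) (const str)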

Getting good performance

In order to take advantage of the new stream fusion framework, try to follow these guidelines:

  • Use fusion functions whenever possible. Explicit usage of await and yield will immediately kick you back to non-fusion (just as explicit pattern matching defeats list fusion).
  • If you absolutely cannot use an existing fusion function, consider writing your own fusion variant.
  • When mixing fusion and non-fusion, put as many fusion functions as possible together with the $= operator before the connect operator $$, as in the sketch below.
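A small illustration of that last guideline (mySink stands in for any hand-written, non-fusible consumer):

import Data.Conduit
import qualified Data.Conduit.List as CL

-- enumFromTo, map and filter are all fusible, so they are grouped with
-- $= on the left of $$; only the custom sink falls back to normal
-- conduit execution.
example :: Monad m => Sink Int m r -> m r
example mySink =
    CL.enumFromTo 1 1000000 $= CL.map (* 2) $= CL.filter even $$ mySink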
Next steps

Even though this work is now publicly available on Hackage, there's still a lot of work to be done. This falls into three main categories:

  • Continue rewriting core library functions in streaming style. Michael Sloan has been working on a lot of these functions, and we're hoping to have almost all the combinators from Data.Conduit.List and Data.Conduit.Combinators done soon.
  • Research why rewrite rules and inlining don't play nicely together. In a number of places, we've had to explicitly use rewrite rules to force fusion to happen, when theoretically inlining should have taken care of it for us.
  • Look into any possible alternative formulations of stream fusion that provide better code reuse or more reliable rewrite rule firing.

Community assistance on all three points, but especially 2 and 3, is much appreciated!

Categories: Offsite Blogs

What do you think about hash-consing every data-structure?

Haskell on Reddit - Tue, 08/26/2014 - 5:52pm

That is, making the whole memory shared so no 2 expressions are ever duplicated in memory. Also, cross-program, pervasive, automatic memoization is interesting, and O(1) equality testing sounds useful. The huge memory savings could make it much more efficient to apply automatic synchronisation between server/client. I see no blatant cons at this point... what do you think?
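For reference, here's what hash-consing looks like in miniature for a single expression type: a toy sketch (all names made up), not the pervasive whole-heap sharing the post asks about:

import Control.Monad.State
import qualified Data.Map.Strict as M

-- Children are referenced by interned ID, so sharing is guaranteed
-- by construction.
data Expr = Lit Int | Add Int Int
    deriving (Eq, Ord, Show)

data Table = Table { byExpr :: M.Map Expr Int, nextId :: Int }

-- Intern an expression: structurally equal expressions always receive
-- the same ID, so equality of interned terms is an O(1) Int comparison,
-- and each distinct expression is stored only once.
hashCons :: Expr -> State Table Int
hashCons e = do
    Table m n <- get
    case M.lookup e m of
        Just i  -> return i
        Nothing -> do
            put (Table (M.insert e n m) (n + 1))
            return n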

submitted by SrPeixinho
Categories: Incoming News

Lazy IO Problem

Haskell on Reddit - Tue, 08/26/2014 - 5:36pm

I know this is going to have a very straightforward explanation, but I can't quite wrap my head around it at the moment.

Suppose I have a file located at "/dummy.txt" with the contents

# text in the file

Now consider this set of functions:

-- io.hs
import System.IO

something :: String -> IO String
something filename = do
    putStrLn "START SOMETHING"
    withFile filename ReadMode (\h -> do
        putStrLn "INSIDE LAMDA"
        s <- hGetContents h
        return s)

something' :: String -> IO String
something' filename = do
    putStrLn "START SOMETHING"
    withFile filename ReadMode (\h -> do
        putStrLn "INSIDE LAMDA"
        s <- hGetContents h
        putStrLn s
        return s)

other :: String -> (String -> IO String) -> IO ()
other s f = do
    putStrLn "START OTHER"
    contents <- f s
    putStrLn "AFTER SOMETHING"
    putStr contents

Now in ghci I run the following:

Prelude> :l io
*Main> other "/dummy.txt" something
START OTHER
START SOMETHING
INSIDE LAMDA
AFTER SOMETHING
*Main> other "/dummy.txt" something'
START OTHER
START SOMETHING
INSIDE LAMDA
# text in the file
AFTER SOMETHING
# text in the file

The IO action returned by something is empty if I do not evaluate anything that prints the contents to the screen inside the function passed to withFile. If I do evaluate something such as putStrLn contents like something' does, then the IO action that is returned contains the contents of the file. Why?

(Also, "you shouldn't be doing this in the first place" is not really helpful in case the urge strikes you to say how this should be written differently)

submitted by wolfgangfabian
Categories: Incoming News

Category theory for scientists, answers.

Haskell on Reddit - Tue, 08/26/2014 - 4:09pm

I'm reading Spivak's Category theory for scientists. I've been answering all of the questions but I would like to know if I am doing anything right. I can't find an answer book online. Does anyone know of one, or at least partial answers? I don't want to look ahead, I just want to know if I am doing anything right.

submitted by yyttr3
Categories: Incoming News

Edward Z. Yang: A taste of Cabalized Backpack

Planet Haskell - Tue, 08/26/2014 - 4:01pm

So perhaps you've bought into modules and modularity and want to get to using Backpack straightaway. How can you do it? In this blog post, I want to give a tutorial-style taste of how to program Cabal in the Backpack style. None of these examples are executable, because only some of this system is in GHC HEAD; the rest is on branches awaiting code review, or is complete vaporware. However, we've got a pretty good idea how the overall design and user experience should go, and so the purpose of this blog post is to communicate that idea. Comments and suggestions would be much appreciated; while the design here is theoretically well-founded, for obvious reasons, we don't have much on-the-ground programmer feedback yet.

A simple package in today's Cabal

To start, let's briefly review how Haskell modules and Cabal packages work today. Our running example will be the bytestring package, although I'll inline, simplify and omit definitions to enhance clarity.

Let's suppose that you are writing a library, and you want to use efficient, packed strings for some binary processing you are doing. Fortunately for you, the venerable Don Stewart has already written a bytestring package which implements this functionality for you. This package consists of a few modules: an implementation of strict ByteStrings...

module Data.ByteString(ByteString, empty, singleton, ...) where

data ByteString = PS !(ForeignPtr Word8) !Int !Int

empty :: ByteString
empty = PS nullForeignPtr 0 0
...

...and an implementation of lazy ByteStrings:

module Data.ByteString.Lazy(ByteString, empty, singleton, ...) where

data ByteString = Empty | Chunk !S.ByteString ByteString

empty :: ByteString
empty = Empty
...

These modules are packaged up into a package which is specified using a Cabal file (for now, we'll ignore the ability to define libraries/executables in the same Cabal file and assume everything is in a library):

name: bytestring
version: 0.10.4.0
build-depends: base >= 4.2 && < 5, ghc-prim, deepseq
exposed-modules: Data.ByteString, Data.ByteString.Lazy, ...
other-modules: ...

We can then make a simple module and package which depends on the bytestring package:

module Utils where

import Data.ByteString.Lazy as B

blank :: IO ()
blank = B.putStr B.empty

name: utilities
version: 0.1
build-depends: base, bytestring >= 0.10
exposed-modules: Utils

It's worth noting a few things about this completely standard module setup:

  1. It's not possible to switch Utils from using lazy ByteStrings to strict ByteStrings without literally editing the Utils module. And even if you do that, you can't have Utils depending on strict ByteString, and Utils depending on lazy ByteString, in the same program, without copying the entire module text. (This is not too surprising, since the code really is different.)
  2. Nevertheless, there is some amount of indirection here: while Utils includes a specific ByteString module, it is unspecified which version of ByteString it will be. If (hypothetically) the bytestring library released a new version where lazy byte-strings were actually strict, the functionality of Utils would change accordingly when the user re-ran dependency resolution.
  3. I used a qualified import to refer to identifiers in Data.ByteString.Lazy. This is a pretty common pattern when developing Haskell code: we think of B as an alias to the actual module. Textually, this is also helpful, because it means I only have to edit the import statement to change which ByteString I refer to.
Generalizing Utils with a signature

To generalize Utils with some Backpack magic, we need to create a signature for ByteString, which specifies what the interface of the module providing ByteStrings is. Here is one such signature, which is placed in the file Data/ByteString.hsig inside the utilities package:

module Data.ByteString where

import Data.Word

data ByteString
instance Eq ByteString

empty :: ByteString
singleton :: Word8 -> ByteString
putStr :: ByteString -> IO ()

The format of a signature is essentially the same as that of an hs-boot file: we have normal Haskell declarations, but we omit the actual implementations of values.

The utilities package now needs a new field to record signatures:

name: utilities
indefinite: True
build-depends: base
exposed-modules: Utils
required-signatures: Data.ByteString

Notice that there have been three changes: (1) We've removed the direct dependency on the bytestring package, (2) we've added a new field indefinite, which indicates that this indefinite package has signatures and cannot be compiled until those signatures are filled in with implementations (this field is strictly redundant, but is useful for documentation purposes, as we will see later), and (3) we have a new field required-signatures which simply lists the names of the signature files (also known as holes) that we need filled in.

How do we actually use the utilities package, then? Let's suppose our goal is to produce a new module, Utils.Strict, which is Utils but using strict ByteStrings (which is exported by the bytestring package under the module name Data.ByteString). To do this, we'll need to create a new package:

name: strict-utilities
build-depends: utilities, bytestring
reexported-modules: Utils as Utils.Strict

That's it! strict-utilities exports a single module Utils.Strict which is utilities using Data.ByteString from bytestring (which is the strict implementation). This is called a mix-in: in the same dependency list, we simply mix together:

  • utilities, which requires a module named Data.ByteString, and
  • bytestring, which supplies a module named Data.ByteString.

Cabal automatically figures out how to instantiate the utilities package by matching together module names. Specifically, the two packages above are connected through the module name Data.ByteString. This makes for a very convenient (and as it turns out, expressive) mode of package instantiation. By the way, reexported-modules is a new (orthogonal) feature which lets us reexport a module from the current package or a dependency to the outside world under a different name. The modules that are exported by the package are the exposed-modules and the reexported-modules. The reason we distinguish them is to make clear which modules have source code in the package (exposed-modules).

Unusually, strict-utilities is a package that contains no code! Its sole purpose is to mix existing packages.

Now, you might be wondering: how do we instantiate utilities with the lazy ByteString implementation? That implementation was put in Data.ByteString.Lazy, so the names don't match up. In this case, we can use another new feature, module thinning and renaming:

name: lazy-utilities
build-depends: utilities,
               bytestring (Data.ByteString.Lazy as Data.ByteString)
reexported-modules: Utils as Utils.Lazy

The utilities dependency is business as usual, but bytestring has a little parenthesized expression next to it. This expression is the thinning and renaming applied to the package import: it controls what modules are brought into the scope of the current package from a dependency, possibly renaming them to different names. When I write build-depends: bytestring (Data.ByteString.Lazy as Data.ByteString), I am saying "I depend on the bytestring package, but please only make the Data.ByteString.Lazy module available under the name Data.ByteString when considering module imports, and ignore all the other exposed modules." In strict-utilities, you could have also written bytestring (Data.ByteString), because this is the only module that utilities uses from bytestring.

An interesting duality is that you can do the renaming the other way:

name: lazy-utilities
build-depends: utilities (Utils, Data.ByteString as Data.ByteString.Lazy),
               bytestring

Instead of renaming the implementation, I renamed the hole! It's equivalent: the thing that matters is that the signature and implementation need to be mixed under the same name in order for linking (the instantiation of the signature with the implementation) to occur.

There are a few things to note about signature usage:

  1. If you are using a signature, there's not much point in also specifying an explicit import list when you import it: you are guaranteed to only see types and definitions that are in the signature (modulo type classes... a topic for another day). Signature files act like a type-safe import list which you can share across modules.

  2. A signature can, and indeed often must, import other modules. In the type signature for singleton in Data/ByteString.hsig, we needed to refer to a type Word8, so we must bring it into scope by importing Data.Word.

    Now, when we compile the signature in the utilities package, we need to know where Data.Word came from. It could have come from another signature, but in this case, it's provided by the definite package base: it's a proper concrete module with an implementation! Signatures can depend on implementations: since we can only refer to types from those modules, we are saying, in effect: any implementation of the singleton function and any representation of the ByteString type is acceptable, but regarding Word8 you must use the specific type from Data.Word in prelude.

  3. What happens if, independently of my package strict-utilities, someone else also instantiates utilities with Data.ByteString? Backpack is clever enough to reuse the instantiation of utilities: this property is called applicativity of the module system. The specific rule that we use to decide if the instantiation is the same is to look at how all of the holes needed by a package are instantiated, and if they are instantiated with precisely the same modules, the instantiated packages are considered type equal. So there is no need to actually create strict-utilities or lazy-utilities: you can just instantiate utilities on the fly.

Mini-quiz: What does this package do?

name: quiz-utilities
build-depends: utilities (Utils, Data.ByteString as B),
               bytestring (Data.ByteString.Lazy as B)

Sharing signatures

It's all very nice to be able to explicitly write a signature for Data.ByteString in my package, but this could get old if I have to do this for every single package I depend on. It would be much nicer if I could just put all my signatures in a package and include that when I want to share it. I want all of the Hackage mechanisms to apply to my signatures as well as my normal packages (e.g. versioning). Well, you can!

The author of bytestring can write a bytestring-sig package which contains only signatures:

name: bytestring-sig
version: 1.0
indefinite: True
build-depends: base
exposed-signatures: Data.ByteString

...and declare that the bytestring package satisfies this signature:

name: bytestring
implements: bytestring-sig-1.0

The implements field is purely advisory: it offers a proactive check to library authors to make sure they aren't breaking compatibility with signatures, and it also helps Cabal offer suggestions for how to provide implementations for signatures.

Now, utilities can include this package to indicate its dependence on the signature:

name: utilities
indefinite: True
build-depends: base, bytestring-sig-1.0
exposed-modules: Utils

Unlike normal dependencies, signature dependencies should be exact: after all, while you might want an upgraded implementation, you don't want the signature to change on you!

Another interesting difference is that we specified the signatures using exposed-signatures, as opposed to required-signatures. We can summarize all of the fields as follows:

  1. exposed-modules says that there is a public module defined in this package
  2. other-modules says that there is a private module defined in this package
  3. exposed-signatures says that there is a public signature defined in this package
  4. required-signatures says that there is a "private" signature defined in this package
  5. reexported-modules says that there is a public module or signature defined in a dependency.

In this list, public means that it is available to clients. Notice the first four fields list all of the source code in this package. Here is a simple example of a client:

name: utilities-extras
indefinite: True
build-depends: utilities
exposed-modules: Utils.Extra

Utils/Extra.hs defined in this package can import Utils (because it's exposed by utilities) but can't import Data.ByteString (because it's not exposed). Had we said reexported-modules: Data.ByteString in utilities, then Data.ByteString would have been accessible.

Do note, however, that the package is still indefinite (since it depends on an indefinite package). Despite Data.ByteString being "private" to utilities (not importable), a client may still refer to it in a renaming clause in order to instantiate the module:

name: utilities-extras-lazy
build-depends: utilities-extras (Data.ByteString as Data.ByteString.Lazy),
               bytestring

You can't "hide" holes altogether: that would be like saying, "I'm never going to say what the actual implementation is!" But you can choose not to directly rely on them.

By the way, if Utils/Extra.hs, in utilities-extras, wanted to import Data.ByteString (even though utilities did not expose it), utilities-extras simply needs to depend directly on the signature package:

name: utilities-extras
indefinite: True
build-depends: utilities, bytestring-sig == 1.0
exposed-modules: Utils.Extra

The Data.ByteString hole from utilities and the new hole included here are automatically checked for compatibility and linked together: you only need to provide one implementation for both of them.

Mini-quiz: What does this package do? Specifically, if I include it in a package, what modules are available for import?

name: attoparsec-sig
version: 1.0
indefinite: True
build-depends: base, bytestring-sig
exposed-signatures: Data.Attoparsec

Summary

We've covered a lot of ground, but when it comes down to it, Backpack really comes together because of a set of orthogonal features which interact in a good way:

  1. Module signatures (mostly implemented but needs lots of testing): the heart of a module system, giving us the ability to write indefinite packages and mix together implementations,
  2. Module reexports (fully implemented and in HEAD): the ability to take locally available modules and reexport them under a different name, and
  3. Module thinning and renaming (fully implemented and in code review): the ability to selectively make available modules from a dependency.

To compile a Backpack package, we first run the traditional version dependency solving, getting exact versions for all packages involved, and then we calculate how to link the packages together. That's it! In a future blog post, I plan to more comprehensively describe the semantics of these new features, especially module signatures, which can be subtle at times. Also, note that I've said nothing about how to type-check against just a signature, without having any implementation in mind. As of right now, this functionality is vaporware; in a future blog post, I also plan on saying why this is so challenging.

Categories: Offsite Blogs

Is Parallel Scientific still around?

Haskell on Reddit - Tue, 08/26/2014 - 2:57pm

The Colorado-based company Parallel Scientific was doing some cool HPC stuff using Haskell. But I haven't been able to find a website. Are they still around?

submitted by bitmadness
Categories: Incoming News

Typechecker friend or foe, a question of timing.

Haskell on Reddit - Tue, 08/26/2014 - 1:42pm

Depending on which side you are on (dynamic or static), the compiler is either your friend or your foe. The benefits of type checking are obvious: it stops you from running a program which doesn't work. However, dynamic languages are popular, and lots of people don't buy this argument. They speak about freedom, ease, and quick development. For them type checking is not a help, it's in the way.

So instead of trying to convince them that they are wrong and that the compiler is there to help them, why not try to understand why they feel it's in the way, and how we can change this?

I'd been working for years in C++, then I switched to Ruby. I missed static typing. Now, after a few months doing only Haskell, I understand what makes dynamic typing easier. The main drawback of type checking is, I think, a problem of timing. If you ask anybody "would you like a program which can analyse your code and find some errors?", everybody would probably say yes. However, most of them would probably also say "but not now".

The problem with type checking is in the "now": you can't do anything before all the errors have been fixed. That's good, but it's in the way. The main difference I've seen between working in a dynamic or a static language is the refactoring workflow. I remember spending hours fixing all the calls to a method whose signature I'd changed, only to realize at the end that what I did didn't work anyway. In a dynamic language, you can try something, see if it works, modify it, and then fix everything so that all tests pass. The drawback is that you are never really sure you fixed everything (you have to rely on tests being well written and covering every scenario), but at least you've been able to test your new feature, get things done, and show your boss that what he asked you to do works. With a static language, you can't do that. Type checking is in the way. You modify something, but you can't test it now. You have to fix everything else, then come back to it, when you have most likely forgotten what you were trying to do. It's a bit like not being able to run a specific test, but always having to launch all the test suites and fix all the tests in order.

People can argue that, in an ideal world, code has been split correctly, and if you change the signature of a function, you shouldn't need to alter too many calls to make the code type check. However, we don't live in an ideal world, and anyway, every call you need to fix to the function you are writing is an interruption, and gets in the way of writing that function.

Another drawback of type checking is debugging. Different people have different ways of debugging. Some use a debugger, some add printf everywhere, some just stare at the code until the error pops up. With type checking you don't have the choice. Someone said that type checking transforms a class of runtime errors into syntax errors. This is great, except you can't debug a syntax error. The code doesn't run, so you can't set breakpoints, add traces, etc. You have to stare at the code and figure out what's wrong.

So could we have the best of both worlds? It would be great if we could tell the typechecker "I know, I've broken everything, but I don't care. I'm focusing on one bit of code; I just want that bit to run. Please let me do it". A solution would be to comment out, or set to undefined or error "doesn't type check", every function which doesn't type check. That could probably be done automatically by the compiler. It would be nice to have a --dirty flag (or equivalent) which ignores all type checking errors and compiles what can be compiled.
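(For what it's worth, GHC ships something close to this: the -fdefer-type-errors flag, available since GHC 7.6, turns type errors into warnings and defers them to runtime, so the parts that do type check can still be run. A minimal sketch:)

{-# OPTIONS_GHC -fdefer-type-errors #-}
module Scratch where

-- This binding is ill-typed, but the module still compiles; evaluating
-- 'broken' raises the type error at runtime instead.
broken :: Int
broken = "not an int"

-- This part type checks and can be tested immediately.
works :: Int -> Int
works x = x + 1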

submitted by maxigit
Categories: Incoming News

On CodeWorld and Haskell

Haskell on Reddit - Tue, 08/26/2014 - 10:39am
Categories: Incoming News

Chris Smith: On CodeWorld and Haskell

Planet Haskell - Tue, 08/26/2014 - 10:38am

I’ve been pouring a lot of effort into CodeWorld lately… and I wanted to write a sort of apology to the Haskell community.  Well, perhaps not an apology, because I believe I did the right thing.  But at the same time, I realize that decisions I’ve made haven’t been entirely popular among Haskell programmers.  I’d like to explain what happened, and try to make it up to you!

What Happened

Originally, I started this project using Haskell and the excellent gloss package, by Ben Lippmeier.  CodeWorld has been moving slowly further and further away from the rest of the Haskell community.  This has happened in a sequence of steps:

  1. Way back in 2011, I started “CodeWorld”, but at the time, I called it Haskell for Kids.  At the time, I understood that the reasons I’d chosen Haskell as a language were not about cool stuff like type classes (which I love) and monads and categories and other commonplace uses of solid abstractions (which fascinate me).  Instead, I chose Haskell for the simple reason that it looked like math.  The rest of Haskell came with the territory.  I built the first CodeWorld web site in a weekend, and I had to settle on a language and accept all that came with it.
  2. From the beginning, I made some changes for pedagogical reasons.  For example, gloss defines rotation to be clockwise.  I insisted on rotation working in the counter-clockwise direction, because that’s the convention universally used in math.  Later, I resized the canvas to 20×20, so that typical programs would need to use fractions and decimals, which is a middle school math education goal.  I made these changes, even though they broke compatibility with a widely used package.  Sorry for anyone that’s struggled with this.
  3. I rebranded “Haskell for Kids” as CodeWorld, and stopped explicitly depending on gloss in favor of just reproducing its general approach in a new Prelude.  This was a deliberate attempt to get away from focusing on the Haskell language and libraries, and from the accompanying import statements and such.  This hid the ways that Haskell is a general purpose language with uses outside this toy environment.  That is unfortunate.
  4. I rewrote the Haskell Prelude, to remove type classes.  Along the way, I collapsed the whole numeric type class hierarchy into a single type, and even got Luite (the author of GHCJS) to help me with some deep black magic to implement equality on arbitrary Haskell types without type classes.  This threw away much of the beauty of Haskell… in favor of dramatically improved error messages, and fewer things you need to know to get started.  It was a real loss.
  5. Finally, I committed the unforgivable sin.  I dropped curried functions, in favor of defining functions of multiple parameters using tuples.  This finally makes CodeWorld feel like a completely different language from Haskell.  That really sucks, and I know some people are frustrated.
Why It Happened?

First, I want to point out some things that are not the reason for any of this:

  • I did not do this because I think there’s something wrong with Haskell.  I love type classes.  I love currying, and especially love how it’s not just a convenient trick, but sometimes introduces whole new perspectives by viewing tedious functions of multiple parameters as simple, clean, and elegant higher-order functions.
  • I also did not do this because I think anyone is incapable of learning full-fledged Haskell.  In fact, I taught full-fledged Haskell to middle schoolers for a year.  I know they can do it.

So why did I do it?  Two reasons:

  • Teaching mathematics has always been more important to me than teaching Haskell.  While Haskell is an awesome programming language, mathematics is just an awesome perspective on life.  For every student who benefits from learning an inspiring programming language, many students will benefit from learning that humanity has a method called mathematics for thinking about fundamental truths in a systematic, logical way that can capture things precisely.  So any time I have to choose between pushing students further toward their math education or away from it, I’ll choose toward it.
  • Details matter.  Even though I know kids are capable of a lot, they are capable of a lot more without artificial obstacles in their way.  I learned this the hard way teaching this class the first time.  The smallest little things, with absolutely no great significance to the language, matter a lot.  Having to put parentheses around negative numbers keeps students from reaching leaps of understanding.  Confusing error messages mean the difference between a student who spends a weekend learning, and one who gives up on Friday afternoon and doesn’t think about it until the next school day.  Different surface syntax means that a lot of kids never fully make the connection that functions here are the same thing as functions there.

In the end, I do think these were the right decisions… despite the frustration they can cause for Haskell programmers who know there’s a better way.

Making Up For It

A couple weekends ago, though, I worked on something to hopefully restore some of this loss for Haskellers.  You see, all the changes I’ve made, in the end, come from replacing the Prelude module with my own alternative.  Specifically:

  1. I deliberately replaced functions from the Prelude with my modified versions.
  2. Because I provided an alternative Prelude, I had to hide the base package, which made it impossible to import things like Control.Monad.  This was not a deliberate decision.  It just happened.

So I fixed this.  I added to the codeworld-base package re-exports of all of the modules from base.  I renamed Prelude to HaskellPrelude in the process, so that it doesn’t conflict with my own Prelude.  And finally, I added a new module, CodeWorld, that exports all the really new stuff from CodeWorld like pictures, colors, and the interpreters for pictures, animations, simulations, etc.  The result is that you can now start your programs with the following:

import Prelude()
import HaskellPrelude
import CodeWorld -- If you still want to do pictures, etc.

main = putStrLn "Hello, World"

At this point, you can write any Haskell you like!  You aren’t even constrained to pure code, or safe code.  (The exception: TemplateHaskell is still rejected, since the compiler runs on the server, so TH code would execute code on the server.)

In fact, it’s even better!  You’re free to use GHCJS JavaScript foreign imports, to interact with the browser environment!  See a brief example here.  Now you’re out of the sandbox, and are free to play around however you like.

Right now, the CodeWorld module still uses uncurried functions and other CodeWorld conventions like Number for numbers, etc.  There’s no reason for this, and it’s something that I should probably change.  Anyone want to send a pull request?


Categories: Offsite Blogs

Dominic Steinitz: Haskell Vectors and Sampling from a Categorical Distribution

Planet Haskell - Tue, 08/26/2014 - 9:05am
Introduction

Suppose we have a vector of weights which sum to 1.0 and we wish to draw n samples randomly according to these weights. There is a well-known trick in Matlab / Octave using sampling from a uniform distribution.

num_particles = 2*10^7
likelihood = zeros(num_particles,1);
likelihood(:,1) = 1/num_particles;
[_,index] = histc(rand(num_particles,1),[0;cumsum(likelihood/sum(likelihood))]);
s = sum(index);

Using tic and toc this produces an answer with

Elapsed time is 10.7763 seconds.

Haskell

I could find no equivalent function in Haskell nor could I easily find a binary search function.

> {-# OPTIONS_GHC -Wall #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans #-}

> {-# LANGUAGE BangPatterns #-}

> import System.Random.MWC
> import qualified Data.Vector.Unboxed as V
> import Control.Monad.ST
> import qualified Data.Vector.Algorithms.Search as S
> import Data.Bits

> n :: Int
> n = 2*10^7

Let’s create some random data. For a change let’s use mwc-random rather than random-fu.

> vs :: V.Vector Double
> vs = runST (create >>= (asGenST $ \gen -> uniformVector gen n))

Again, I could find no equivalent of cumsum but we can write our own.

> weightsV, cumSumWeightsV :: V.Vector Double
> weightsV = V.replicate n (recip $ fromIntegral n)
> cumSumWeightsV = V.scanl (+) 0 weightsV

Binary search on a sorted vector is straightforward and a cumulative sum ensures that the vector is sorted.

> binarySearch :: (V.Unbox a, Ord a) =>
>                 V.Vector a -> a -> Int
> binarySearch vec x = loop 0 (V.length vec - 1)
>   where
>     loop !l !u
>       | u <= l    = l
>       | otherwise = let e = vec V.! k
>                     in if x <= e then loop l k else loop (k+1) u
>       where k = l + (u - l) `shiftR` 1

> indices :: V.Vector Double -> V.Vector Double -> V.Vector Int
> indices bs xs = V.map (binarySearch bs) xs

To see how well this performs, let’s sum the indices (of course, we wouldn’t do this in practice) as we did for the Matlab implementation.

> js :: V.Vector Int
> js = indices (V.tail cumSumWeightsV) vs

> main :: IO ()
> main = do
>   print $ V.foldl' (+) 0 js

Using +RTS -s we get

Total time 10.80s ( 11.06s elapsed)

which is almost the same as the Matlab version.

I did eventually find a binary search function in vector-algorithms and since one should not re-invent the wheel, let us try using it.

> indices' :: (V.Unbox a, Ord a) =>
>             V.Vector a -> V.Vector a -> V.Vector Int
> indices' sv x = runST $ do
>   st <- V.unsafeThaw (V.tail sv)
>   V.mapM (S.binarySearch st) x

> main' :: IO ()
> main' = do
>   print $ V.foldl' (+) 0 $ indices' cumSumWeightsV vs

Again using +RTS -s we get

Total time 11.34s ( 11.73s elapsed)

So the library version seems very slightly slower.


Categories: Offsite Blogs

Types depending on tuples.

haskell-cafe - Tue, 08/26/2014 - 8:46am
I'm doing some geometry in 2 and 3 dimensions. To do this, I've got Vector2d and Vector3d classes, which are basically just wrappers around 2- and 3-tuples of doubles:

type Vector2d = (Double, Double)
type Vector3d = (Double, Double, Double)

I've also got a typeclass Vector so I can do polymorphic things on them:

type Vector3d = (Double, Double, Double)

(Ok, polymorphic thing). I now have a need to convert lists of Vectors to a list of lists of Doubles. I'd like that to be part of the Vector typeclass, but declaring it is interesting. I can't just add a method "toLists :: a -> [Double]", because (according to ghc) there's no match between the expected type 'Double' and the actual type '[t1]'. Can I declare this typeclass?
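(For reference, the class the poster is after can be declared directly; a minimal sketch, using FlexibleInstances so the tuple synonyms can appear as instance heads:)

{-# LANGUAGE FlexibleInstances #-}

type Vector2d = (Double, Double)
type Vector3d = (Double, Double, Double)

class Vector a where
    toList :: a -> [Double]

instance Vector Vector2d where
    toList (x, y) = [x, y]

instance Vector Vector3d where
    toList (x, y, z) = [x, y, z]

-- Lists of vectors become lists of lists with plain map:
toLists :: Vector a => [a] -> [[Double]]
toLists = map toList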
Categories: Offsite Discussion

Undesired email

haskell-cafe - Mon, 08/25/2014 - 6:12pm
Dear List,

I have a problem with my new smartphone. Sorry for the inconvenience.

Best,
FelipeZ
Categories: Offsite Discussion

Proposal: Simplify/Generalize Data.Dynamic

libraries list - Mon, 08/25/2014 - 4:15pm
I'd like to propose a cleanup to the API of Data.Dynamic. By way of (partial) motivation, a couple of years back Lennart posed a question on stackoverflow <http://stackoverflow.com/questions/10889682/how-to-apply-a-polymorphic-function-to-a-dynamic-value/10890414#comment39759480_10890414> about how to use Dynamic to implement

apD :: forall f. Typeable1 f => (forall a. a -> f a) -> Dynamic -> Dynamic

At the time, I offered a solution that is no longer applicable in a world with polykinded Typeable. But, if we change the definition of Data.Dynamic to

data Dynamic where
  Dynamic :: Typeable a => a -> Dynamic

from its current magic implementation in terms of unsafeCoerce and a manually held typeRep, then fromDynamic becomes a trivial application of Data.Typeable.cast and dynApply becomes easier. This would enable us to implement Data.Dynamic with its full constructor exposed without losing safety. In lieu of supplying the constructor, we could offer a form of

withDyn :: Dynamic -> (forall a. Typeab
Categories: Offsite Discussion

darcs 2.8.5 released

Haskell on Reddit - Mon, 08/25/2014 - 4:08pm
Categories: Incoming News

streaming tab-separated logfile analysis with pipes (instead of Python)

Haskell on Reddit - Mon, 08/25/2014 - 1:07pm

I often have to write custom log analysis code dealing with TSV data that might look something like this:

server  user  timestamp   event
foo     1     2014-10-10  {"tag": "Login", "contents": {"foo": "bar"}}

A common task might be "count the number of logins per server", in which case I would write a python program using iterators to group lines by server, then count lines with an event tag "login". Some (untested) code is below:

import sys
import json
from itertools import *
from collections import namedtuple

Row = namedtuple('Row', ['server', 'userid', 'timestamp', 'event'])

def mkRow(line):
    row = line.strip().split('\t')
    row[3] = json.loads(row[3])
    return Row(*row)

def numLogins(group, rows):
    count = 0
    for row in rows:
        if row.event['tag'] == "Login":
            count += 1
    return group, count

# note that sys.stdin is buffered input, so we can't do a
# 'getLine' loop: performance will be worse
sys.stdin.next()
rows = map(mkRow, sys.stdin)
groups = groupby(rows, lambda row: row.server)
results = starmap(numLogins, groups)

print "server"+"\t"+"numLogins"
for server, n in results:
    print(server+"\t"+str(n))

I'm trying to write this in Haskell using Pipes (in place of Python's iterators). How do I achieve this idiomatically (and with good performance)? I've tried using the 'stdinLn' function from the Pipes tutorial, but it seems to be much slower than my Python: Python 2 uses buffered input, so a "getLine" loop producer is not equivalent. (Making Python use a sys.stdin.readline() loop makes it significantly slower than Haskell.)

Unfortunately the Pipes-Bytestring library has defeated me: I can't even figure out how to get a producer of [ByteString] (each representing a TSV row) using Pipes-Bytestring.

Any advice very appreciated!
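(For reference, one way this can be written with pipes-bytestring plus pipes-group: the lines lens splits stdin into per-line sub-producers, and PG.folds concatenates each one into a single strict ByteString. A sketch, not tested against the poster's data; the substring test on the event column is a crude stand-in for real JSON parsing, and unlike the Python groupby version it counts per server across the whole input:)

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import Lens.Family (view)
import Pipes
import qualified Pipes.ByteString as PB
import qualified Pipes.Group as PG
import qualified Pipes.Prelude as P

-- One complete line per element, built by concatenating the chunks of
-- each line sub-producer.
stdinLines :: Producer B.ByteString IO ()
stdinLines = PG.folds B.append B.empty id (view PB.lines PB.stdin)

-- Count rows whose event column mentions the Login tag, keyed by server.
-- The header row is harmless here: its event column is just "event".
countLogins :: IO (M.Map B.ByteString Int)
countLogins = P.fold step M.empty id stdinLines
  where
    step m line = case B.split '\t' line of
        (server : _user : _time : event : _)
            | "\"tag\": \"Login\"" `B.isInfixOf` event ->
                M.insertWith (+) server 1 m
        _ -> m

main :: IO ()
main = do
    counts <- countLogins
    B.putStrLn "server\tnumLogins"
    mapM_ (\(s, n) -> B.putStrLn (B.concat [s, "\t", B.pack (show n)]))
          (M.toList counts)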

submitted by statusfailed
Categories: Incoming News

Video series for Haskell?

Haskell on Reddit - Mon, 08/25/2014 - 12:38pm

I want to learn Haskell to add to my book of languages (I also plan to learn C and C++). I am currently happy with my Java skills and was motivated to move on to Haskell because of XMonad. I would like a video series because they are just easier than reading most of the time. That, and I'm already reading Learn You a Haskell. Also, is there a good IDE for Linux that lets me code and execute it within the IDE? If not, GHCi and Vim are more than enough.

submitted by antflga
Categories: Incoming News

Document for review: evaluation order and state tokens

glasgow-user - Mon, 08/25/2014 - 11:25am
As part of trac ticket 9390 [1], Simon PJ recommended that we try to get a document written that clarifies some of the issues regarding evaluation order, and get it included in the GHC wiki. After a few iterations with review from Simon, I've got a first "publicly consumable" version available at:

https://www.fpcomplete.com/user/snoyberg/general-haskell/advanced/evaluation-order-and-state-tokens

I'd appreciate any feedback on this document before I add it to the wiki.

Michael

[1] https://ghc.haskell.org/trac/ghc/ticket/9390
Categories: Offsite Discussion