News aggregator

Yesod Web Framework: The ST monad and conduit

Planet Haskell - Thu, 01/16/2014 - 6:40am

I've recently been working on some higher level analysis utilities based on top of conduit. A major component of this work is performing high-performance numerical calculations on large streams of data. So I was particularly intriguied when I saw a StackOverflow question that touched on the same points. I'd like to elaborate on my answer to that question, and demonstrate a possible addition to the conduit library.

The issue at play in that question is the desire to tally up the frequency of each octet in a stream of data. If you look through the answers, it quickly becomes apparent that using some kind of a mutable packed data structure (either Vector or Array) provides drastically better performance than immutable data structures. For our purposes, let's stick with the vector library, though the discussion here applies equally well to array.

The actions we need to perform are very straight-forward: read in the entirety of the data from some source (let's say standard input), and perform a mutating action for each and every octet that we receieve. We could read the entire stream of data using lazy I/O, but as readers of this blog are likely aware, I'd rather avoid lazy I/O when possible. So in my answer, I used conduit to read in the stream of data. The answer looks like this:

import Control.Monad.Trans.Class (lift) import Data.ByteString (ByteString) import Data.Conduit (Consumer, ($$)) import qualified Data.Conduit.Binary as CB import qualified Data.Vector.Unboxed as V import qualified Data.Vector.Unboxed.Mutable as VM import System.IO (stdin) freqSink :: Consumer ByteString IO (V.Vector Int) freqSink = do freq <- lift $ VM.replicate 256 0 CB.mapM_ $ \w -> do let index = fromIntegral w oldCount <- freq index VM.write freq index (oldCount + 1) lift $ V.freeze freq main :: IO () main = (CB.sourceHandle stdin $$ freqSink) >>= print

We now have a reusable freqSink component that will consume a stream of ByteStrings to produce a Vector Int of the frequencies. The Sink creates a new mutable vector to hold the frequency values, maps over all the input octets, and for each octet updates the mutable vector. Finally, it freezes the mutable vector into an immutable one and returns it.

I like almost everything about this, except for two characters: IO. Our freqSink function sets the base monad to be IO, implying that freqSink may perform actions that have an impact on the outside world. However, we know that this isn't the case: by analyzing the code, we see that all of the mutating changes are contained within the little world that freqSink creates for itself. In other words, this function is referentially transparent, but the type signature is saying otherwise.

Fortunately, Haskell already has the perfect solution for this kind of a problem: the ST monad. All we need to do is swap IO for ST s and freqSink will be properly annotated as being referentially transparent. But when we make this change, we get the following error message:

Couldn't match type `ST s0' with `IO'

The problem is that, while freqSink is refentially transparent, sourceHandle is not. Since the source is capable of performing arbitrary IO, it has to live in the IO base monad, and since the two components live in the same processing pipeline, freqSink must match that base monad as well. While this all works, it's still quite disappointing.

But perhaps we can have our cake and eat it too. We want freqSink's type signature to be refentially transparent, which means it needs to live in ST. What we need is some way to turn an ST-based Sink into an IO-based Sink. And there's a function that let's us do just that: unsafeSTToIO. This ends up looking like:

import Control.Monad.Morph (hoist) import Control.Monad.ST (ST) import Control.Monad.ST.Unsafe (unsafeSTToIO) import Control.Monad.Trans.Class (lift) import Data.ByteString (ByteString) import Data.Conduit (Consumer, ($$)) import qualified Data.Conduit.Binary as CB import qualified Data.Vector.Unboxed as V import qualified Data.Vector.Unboxed.Mutable as VM import System.IO (stdin) freqSink :: Consumer ByteString (ST s) (V.Vector Int) freqSink = do freq <- lift $ VM.replicate 256 0 CB.mapM_ $ \w -> do let index = fromIntegral w oldCount <- freq index VM.write freq index (oldCount + 1) lift $ V.freeze freq main :: IO () main = (CB.sourceHandle stdin $$ hoist unsafeSTToIO freqSink) >>= print

This once again works, and freqSink's type signature now indicates that it is referentially transparent. However, we've put two heavy burdens on users of our freqSink function:

  • They need to know about hoist and understand how to use it.
  • They need to pull in an unsafe function and know in which circumstances it's safe to use it.

What we really want is to provide a general purpose function which is completely safe. We want to contain the concept of "I can safely swap out the base monad of this Conduit with some other base monad." So I've just pushed a new commit to conduit adding the conduitSwapBase helper function (name up for debate). Let's start by seeing how it solves our present problem:

import Control.Monad.ST (ST) import Control.Monad.Trans.Class (lift) import Data.ByteString (ByteString) import Data.Conduit (Consumer, ($$)) import qualified Data.Conduit.Binary as CB import Data.Conduit.Util (conduitSwapBase) import qualified Data.Vector.Unboxed as V import qualified Data.Vector.Unboxed.Mutable as VM import System.IO (stdin) freqSinkST :: Consumer ByteString (ST s) (V.Vector Int) freqSinkST = do freq <- lift $ VM.replicate 256 0 CB.mapM_ $ \w -> do let index = fromIntegral w oldCount <- freq index VM.write freq index (oldCount + 1) lift $ V.freeze freq freqSink :: Monad m => Consumer ByteString m (V.Vector Int) freqSink = conduitSwapBase freqSinkST main :: IO () main = (CB.sourceHandle stdin $$ freqSink) >>= print

I renamed the previous function to freqSinkST, leaving its type exactly as it was. In addition, we now have a new freqSink, which can live in any base monad. The type signature makes it completely clear that this function is referentially transparent. And all we needed to do was use conduitSwapBase to perform this conversion.

Once that conversion is performed, we can easily combine freqSink with our IO-based sourceHandle. Or for that matter, it could be combined with a completely pure source, or a source living in the Maybe monad.

I believe this function could be used to clean up the compression/decompression functions in zlib-conduit and (at least some of) the functions in blaze-builder-conduit.

As it stands right now, conduitSwapBase will allow the following base transformations to be applied:

  • ST can be converted to ~~any other monad~~. EDIT See update below.
  • Identity can be converted to any other monad.
  • IO can be converted to any instance of MonadIO.
  • For many transformers (all instances of MonadTransControl actually), if the base monad m1 can be converted to m2, then the transformer t m1 can be converted to t m2.

This addition allows us to keep more type safety in our codebase, while still allowing safe interleaving of IO actions with pure code. I'm happy with the addition so far, I'm curious to hear further ideas from the community.

UPDATE: As pointed out on Reddit, a backtracking base monad can break refential transparency for ST. I've pushed a new commit that constrains the types of monads that can be converted to. In particular, it works for monads which are processed in a linear/non branching manner. This includes Identity, IO and Maybe, and transformers like ReaderT and ErrorT.

I'm currently calling this concept MonadLinear, but I have a strong feeling that there's a better abstraction already in existence.

Categories: Offsite Blogs

The ST monad and conduit

Haskell on Reddit - Thu, 01/16/2014 - 6:38am
Categories: Incoming News

Jan Stolarek: Verifying weight biased leftist heaps using dependent types (a draft)

Planet Haskell - Thu, 01/16/2014 - 5:03am

I recently wrote a paper for the 26th International Conference on Computer Aided Verification (CAV’14), Vienna, Austria. The paper is titled “Verifying weight biased leftist heaps using dependent types”. In this paper I have taken a weight biased leftist heap data structure presented by Okasaki in “Purely Functional Data Structures” and created its Agda implementation, which uses dependent types to statically guarantee that rank and priority invariants are maintained. My verification is based on techniques demonstrated by Altenkirch, McKinna and McBride in “Why Dependent Types Matter” but my focus is on things that were left out in that paper. In the end my paper is an intermediate level tutorial on verification in Agda but also a case study of a purely functional data structure implemented in a dependently typed setting. Worth noting is the fact that despite title of Okasaki’s book his implementation is not purely functional as it uses exceptions to deal with edge cases like taking the smallest element from an empty heap.

Draft version of the paper can be downloaded here. It comes with companion source code that contains a thorough discussion of concepts presented in the paper as well as others that didn’t make it into publication due to space limitations. Companion code is available at GitHub (tag “blog-post-draft-release” points to today’s version). The paper is mostly finished. It should only receive small corrections and spelling fixes. However, if you have any suggestions or comments please share them with me – submission deadline is in three weeks so there is still time to include them.

Categories: Offsite Blogs

Call for Papers: TASE 2014

General haskell list - Thu, 01/16/2014 - 2:24am
TASE 2014 - CALL FOR PAPERS ****************************************************************** The 8th International Symposium on Theoretical Aspects of Software Engineering (TASE 2014) 1-3 September 2014, Changsha, China For more information email: TASE2014< at > ****************************************************************** OVERVIEW The 8th Theoretical Aspects of Software Engineering Symposium (TASE 2014), will be held in Changsha, China in September, 2014. Modern society is increasingly dependent on software systems that are becoming larger and more complex. This poses new challenges to the various aspects of software engineering, for instance, software dependability in trusted computing, interaction with physical components in cyber physical systems, distribution in cloud computing applications, etc. Hence, new concepts and
Categories: Incoming News

ANN: castle

haskell-cafe - Wed, 01/15/2014 - 10:24pm
I'd like to announce the first release of castle ( and From the README: This tool is still pretty rough around the edges, but I've been using it some, and it's to the point that more feedback would be helpful. Let me know what bugs and rough patches you find. Thanks, Eric _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe< at >
Categories: Offsite Discussion

What's up with Contravariant?

Haskell on Reddit - Wed, 01/15/2014 - 9:27pm

What's it for? I know the Functor class is for computations with a context, and Contravariant is similar in the type signature. Is it possible to use Contravariant with Applicative and Monad?

submitted by danmwilson
[link] [14 comments]
Categories: Incoming News


libraries list - Wed, 01/15/2014 - 4:42pm
Hi, I suggest to improve the documentation for System.FilePath.splitFileName. The sentence "combine is the inverse" (of splitFileName) is misleading, as the following example shows: Prelude System.FilePath> uncurry combine $ splitFileName "bob" "./bob" adding normalize to the result only also gives not (String) identity: Prelude System.FilePath> normalise $ uncurry combine $ splitFileName "./bob" "bob" I suggest to change it to: "combine is the inverse modulo normalise or equalFilePath" Maybe also the following combine examples should be added: combine "./" "bob" == "./bob" combine "." "bob" == "./bob" Cheers Christian
Categories: Offsite Discussion

Yesod Web Framework: PSA: Please use yesod-platform

Planet Haskell - Wed, 01/15/2014 - 3:57pm

Recently, a large number of the dependencies for Yesod (aeson, fast-logger, warp, attoparsec, etc) have gone through major version bumps. At the same time, a number of other dependencies of Yesod have not relaxed their version bounds for the most recent versions of their dependencies. This situation leads to a lot of confusion for cabal trying to figure out a coherent installation path.

This is precisely the situation for which the yesod-platform situation was designed. If you are having trouble getting Yesod installed, please try running cabal install yesod-platform instead of yesod. (In fact, unless you have reason to do otherwise, I highly recommend always doing this.) If you have an existing project with its own set of dependencies, go into that directory and run cabal install . yesod-platform, which will attempt to choose a version of yesod-platform with dependencies that fit your constraints. If you use some kind of sandboxing, modify the above commands as necessary.

And let me take this opportunity to mention: if you're trying to build your code on a stable platform, you should make sure that you pin down the versions of all of your dependencies. Greg Weber wrote a great blog post explaining a technique that should fit into most people's standard build processes. Using the Stackage pinned version set will also guarantee builds that are coherent. And of course, building on FP Haskell Center guarantees stable and tested versions of packages which will be maintained.

UPDATE There are two important command line options I should have mentioned. If you're running into dependency hell, try appending the following: --max-backjumps=-1 --reorder-goals.

Categories: Offsite Blogs

Postdocs / Research Programmer for Efficient First-Class Automatic Differentiation

haskell-cafe - Wed, 01/15/2014 - 2:01pm
Postdocs / Research Programmer for Compositional Learning via Generalized Automatic Differentiation The goal of this project is to make a qualitative improvement in our ability to write sophisticated numeric code, by giving numeric programmers access to _fast_, _robust_, _general_, _accurate_ differentiation operators. To be technical: we are adding exact first-class derivative calculation operators (Automatic Differentiation or AD) to the lambda calculus, and embodying the combination in a production-quality fast system suitable for numeric computing in general, and compositional machine learning methods in particular. Our research prototype compilers generate object code competitive with the fastest current systems, which are based on FORTRAN. And the combined expressive power of first-class AD operators and function programming allows very succinct code for many machine learning algorithms, as well as for some algorithms in computer vision and
Categories: Offsite Discussion

Type-Level Instant Insanity

Haskell on Reddit - Wed, 01/15/2014 - 12:06pm
Categories: Incoming News

mdo with multiple values

haskell-cafe - Wed, 01/15/2014 - 11:57am
Dear List, a little puzzle. Given a monad M with a MonadFix instance, and these two functions: act1 :: T -> M () act2 :: a -> M T I morally want to write this function: foo :: [a] -> M () foo = mdo mapM_ act1 xs xs <- mapM act2 return () Unfortunately, that will not work: mapM_ will force xs before any of it can be generated. But morally it should be possible, as the lists passed to mapM_ and mapM have the same, already known list. So here is my solution (which is a bit more general, because I happen to need some that in one place): mapFstMapSnd :: MonadFix m => [(a -> m (), m a)] -> m () mapFstMapSnd xs = const () `liftM` go xs (return []) where go [] cont = cont go ((f,s):xs) cont = mdo f v (v:vs) <- go xs $ do vs <- cont v <- s return (v:vs) return vs Using that, I can write
Categories: Offsite Discussion

Yesod :: Homepage - Wed, 01/15/2014 - 11:28am
Categories: Offsite Blogs

Philip Wadler: ADT and GADT implementations of simply-typed lambda calculus

Planet Haskell - Wed, 01/15/2014 - 10:15am
Lennart Augustsson posted a nifty description of a compiler from a simple expression language to LLVM that included a conversion from expressions represented as an ADT to expressions represented as a GADT. The ADT requires a separately implemented type checker, while the GADT piggybacks on Haskell's type system to ensure expressions are well typed. However, Lennart's expression language does not include lambda abstraction.

Based on Lennart's code, I implemented ADT and GADT versions of simply-typed lambda calculus with de Bruijn indices, integer constants, and addition, plus the conversion between them, without the distraction of compiling to LLVM. The code was cleaned and improved by Shayan Najd, and made publicly available via github. Thanks to Josef Svenningson for the pointer to Lennart's post.

Categories: Offsite Blogs