News aggregator

www.dohaskell.com

Haskell on Reddit - Wed, 06/11/2014 - 1:59pm
Categories: Incoming News

Deploy Haskell application

haskell-cafe - Wed, 06/11/2014 - 11:24am
Hi guys! Is there a procedure to deploy a Haskell application? I have an Amazon EC2 micro instance to run my application, but it's way too small to compile it using cabal (compilation takes half a day rouhgly), so I compile it on my computer. Is there a convenient way to bundle the executable with the resources and ship it to the server? Thanks! Corentin _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe< at >haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Categories: Offsite Discussion

Mike Izbicki: do your lenses even do notation?

Planet Haskell - Wed, 06/11/2014 - 10:18am

It’s round 5 of typeparams versus GHC. We’ll be extending our Functor and Applicative classes to define a new Monad class. It’s all pretty simple if you just remember: lensified monads are like burritos with fiber optic cables telling you where to bite next. They’re also just monoids in the category of lens-enhanced endofunctors. Piece of cake.

some naughty extensions

We’ll be using all the same extensions as before:

> {-# LANGUAGE TemplateHaskell #-} > {-# LANGUAGE ScopedTypeVariables #-} > {-# LANGUAGE KindSignatures #-} > {-# LANGUAGE TypeFamilies #-} > {-# LANGUAGE MultiParamTypeClasses #-} > {-# LANGUAGE UndecidableInstances #-} > {-# LANGUAGE FlexibleInstances #-} > {-# LANGUAGE RankNTypes #-}

But we’ll be adding some pretty nasty ones today:

> {-# LANGUAGE OverlappingInstances #-} > {-# LANGUAGE RebindableSyntax #-}

We need RebindableSyntax to get do notation, but OverlappingInstances is just a product of the Monad class’s definition. I’ll give infinite haskell points to anyone who can refactor this code so we don’t need the extension!

We’ll also be needing all of our previous work on Functors and Applicatives. It has been uploaded to hackage and is sitting in the appropriate modules:

> import Control.Category > import Prelude hiding ( (.), id, Functor(..), Applicative(..), Monad(..) ) > import qualified Prelude as P > import GHC.Exts > import Data.Params > import Data.Params.Applicative > import Data.Params.Functor

And we’re off!

joins and cojoins

We will define our monads in terms of their join function. In the standard libraries, join has the type:

join :: m (m a) -> m a

The input has the same type as the output, except that the Monad m is repeated twice. There are two differences in the lensified join function: First, the monad we’re working with might be nested arbitrarily deeply in other data types. Second, the argument it is monadic in might not be the last one. Here is an example of what the join type signature would look like for the Left Either monad sitting within a Maybe Monad:

join :: TypeLens Base (Param_a (Param_a Base)) -> Maybe (Either (Either String Int) Int) -> Maybe (Either String Int)

Since we’re all wannabe category theorists here, we’ll create a CoJoin type family that transforms the output of the join function by duplicating the type at location specified by the lens:

> type family CoJoin (lens :: * -> Constraint) t > type instance CoJoin lens t > = SetParam' > lens > ( SetParam' > ( Objective lens ) > ( GetParam lens t ) > ( GetParam (RemoveObjective lens) t ) > ) > t

(We covered the Objective and RemoveObjective families in a previous post. As a reminder, the Objective family returns the innermost type lens from our input, and the RemoveObjective family returns the lens that results when the innermost lens is taken away.)

CoJoin only has one instance, so we could have just used a type synonym. That would make debugging harder, however. The advantage of a type family is that when we ask GHCi what the type is, it will perform the substitutions for us. For example:

ghci> :t undefined :: CoJoin (Param_a Base) (Maybe (Either String Int)) :: Maybe (Maybe (Either String Int)) ghci> :t undefined :: CoJoin (Param_a (Param_a Base)) (Maybe (Either String Int)) :: Maybe (Either (Either String Int) Int) making the monad

Now we’re ready to see our new Monad class:

> class Applicative lens tfb => Monad lens tfb where > join :: > ( tffb ~ CoJoin lens tfb > ) => TypeLens Base lens > -> tffb -> tfb

The Left and Right Either instances are:

> instance Monad (Param_a Base) (Either a b) where > join lens (Left (Left a)) = Left a > join lens (Left (Right b)) = Right b > join lens (Right b) = Right b > instance Monad (Param_b Base) (Either a b) where > join lens (Right (Right b)) = Right b > join lens (Right (Left a)) = Left a > join lens (Left a) = Left a

And here are some examples of join in action:

ghci> join _b (Right $ Right "monads") :: Either String String Right "monads" ghci> join _b (Right $ Left "are") :: Either String String Left "are" ghci> join _a (Left $ Left "so") :: Either String String Left "so" ghci> join _a (Right "awesome") :: Either String String Right "awesome"

The instances above don’t consider the case when our lenses point inside of the Either type. We’ll need to define two new recursive instances to handle this case. These instances are the reason we needed the OverlappingInstances language extension:

> instance > ( Monad p a > , Either (CoJoin p a) b ~ CoJoin (Param_a p) (Either a b) -- follows from the lens laws > ) => Monad (Param_a p) (Either a b) > where > > join lens (Left a) = Left $ join (zoom lens) a > join lens (Right b) = Right b > instance > ( Monad p b > , Either a (CoJoin p b) ~ CoJoin (Param_b p) (Either a b) -- follows from the lens laws > ) => Monad (Param_b p) (Either a b) > where > > join lens (Left a) = Left a > join lens (Right b) = Right $ join (zoom lens) b

The equality constraints in the instances above are implied by the lens laws. As we discussed yesterday, with the type rules language extension, those constraints could be removed completely, making the code a bit nicer.

Here are some examples of using join in the nested case:

ghci> join (_a._b) (Left $ Right $ Right "lenses") :: Either (Either a String) b Left (Right "lenses") ghci> join (_a._b) (Left $ Right $ Left "are") :: Either (Either String b) b Left (Left "are") ghci> join (_b._b) (Left "neat") :: Either String (Either a String) Left "neat"

Sometimes we will get the same answer if we join in two separate locations. In the first example below, we join the second two Right constructors, whereas in the second example, we join the first two Right constructors. The results are the same:

ghci> join (_b._b) (Right $ Right $ Right "easy") :: Either a (Either a String) Right (Right "easy") ghci> join _b (Right $ Right $ Right "peasy") :: Either a (Either a String) Right (Right "peasy")

We’ll also be needing a Monad instance for Maybe, so here it is:

> instance Monad (Param_a Base) (Maybe a) where > join lens Nothing = Nothing > join lens (Just Nothing) = Nothing > join lens (Just (Just a)) = Just a > instance > ( Monad p a > , Maybe (CoJoin p a) ~ CoJoin (Param_a p) (Maybe a) -- follows from the lens laws > ) => Monad (Param_a p) (Maybe a) > where > join lens Nothing = Nothing > join lens (Just a) = Just $ join (zoom lens) a bind

From join and our Applicative instance, we can derive our monadic bind function. We don’t want to use the traditional (>>=) operator for bind just yet. We will need to do something fancy with it to make do notation work out. So instead, we will use the (\\=) operator for bind. Its definition is:

(\\=) :: ( Monad lens tb , a ~ GetParam lens tfa , {- ... lens laws go here ... -} ) => ta -> (a -> tb) -> TypeLens Base lens -> tb > infixl 1 \\= > (m \\= f) lens = join lens $ fmap lens f m

We will create the “minus bind operators” in the same way we created minus operators for the Applicative class. Remember, the minus sign points to the parameters that will get a lens applied to them because they are “minus a lens”. These minus operators are defined as:

> infixl 1 \\=- > infixl 1 -\\=- > infixl 1 -\\= > (m \\=- f) lens = ( m \\= \a -> f a $ objective lens ) lens > (m -\\=- f) lens = ( m lens \\= \a -> f a $ objective lens ) lens > (m -\\= f) lens = ( m lens \\= \a -> f a ) lens example time

For our example, we’ll build a simple monadic filter. The filterSmall function below sits in the Either Monad, but we’ll be using Left to represent successes (the input passes through the filter), and Right to represent failure (the input doesn’t pass through).

> filterSmall :: (Show a, Ord a) => a -> a -> Either a String > filterSmall k x = if x > k > then Left x > else Right $ show x ++ " is too small"

We can call our function using the monadic bind by:

> chain1 :: Either Int String > chain1 = at _a $ Left 20 \\= filterSmall 10 ghci> chain1 Left 20

Instead of using the Left constructor, we can make things a little more generic by using the return function. As usual, it is equivalent to pure:

> return :: Monad lens t => GetParam lens t -> TypeLens Base lens -> t > return = pure

Sine pure’s last parameter is a type lens, we must use the left-minus (-\\=) variant of bind to sequence the computation:

> chain2 :: Either Int String > chain2 = at _a $ return 20 -\\= filterSmall 10 ghci> chain2 Left 20

Similarly, all the bind operators take a type lens as their last parameter. So any future binds must also use left-minus bind:

> chain3 :: Either Int String > chain3 = at _a $ return 20 -\\= filterSmall 10 -\\= filterSmall 15 ghci> chain3 Left 20

And so on:

> chain4 :: Either Int String > chain4 = at _a $ return 20 -\\= filterSmall 10 -\\= filterSmall 15 -\\= filterSmall 25 ghci> chain4 Right "20 is too small"

We can easily nest our monads. Let’s put all of the computations above inside a Maybe wrapper. All we have to do is change the type signature and the lens:

> chain2' :: Maybe (Either Int String) > chain2' = at (_a._a) $ return 20 -\\= filterSmall 10 > chain3' :: Maybe (Either Int String) > chain3' = at (_a._a) $ return 20 -\\= filterSmall 10 -\\= filterSmall 15 > chain4' :: Maybe (Either Int String) > chain4' = at (_a._a) $ return 20 -\\= filterSmall 10 -\\= filterSmall 15 -\\= filterSmall 25 do the notation

We’re using the RebindableSyntax language extension to construct a custom do notation. We do this by defining our own (>>=) operator. The most generic bind operator we have is the double minus bind (-\\=-). Sometimes we will want to feed a lens to both sides of the bind, so that’s what we’ll use:

> infixl 1 >>= > (m >>= f) lens = (m -\\=- f) lens

Notice that our (>>=) operator and the one from Prelude take different numbers of arguments! GHC is awesome enough that this is not a problem.

RebindableSyntax also requires us to define functions for failed pattern matching and if statements. Our definitions will be pretty simple:

> fail = error > ifThenElse False _ f = f > ifThenElse True t _ = t

Now, we can take our chain2′ function above and rewrite it in do notation. Here it is again for easy reference:

chain2' :: Maybe (Either Int String) chain2' = at (_a._a) $ return 20 -\\= filterSmall 10

First, rewrite it to use (-\\=-) instead of (-\\=) by causing the right hand side to take a lens parameter even though it won’t use it:

> chain2'' :: Maybe (Either Int String) > chain2'' = at (_a._a) $ return 20 -\\=- (\x lens -> filterSmall 10 x)

Then, rewrite it using do notation:

> chain2''' :: Maybe (Either Int String) > chain2''' = at (_a._a) $ do > x <- return 20 > \lens -> filterSmall 10 x

It looks a little bit nicer if we use const to absorb the lens parameter:

> chain2'''' :: Maybe (Either Int String) > chain2'''' = at (_a._a) $ do > x <- return 20 > const $ filterSmall 10 x

Here is our other examples converted into do notation using the same technique:

> chain3''' :: Maybe (Either Int String) > chain3''' = at (_a._a) $ do > x <- return 20 > y <- const $ filterSmall 10 x > const $ filterSmall 15 y > chain4'' :: Maybe (Either Int String) > chain4'' = at (_a._a) $ do > x <- return 20 > y <- const $ filterSmall 10 x > z <- const $ filterSmall 15 y > const $ filterSmall 25 z

And here is a more complicated expression with a nested do:

> chain5 :: Either a (Either a (Maybe (Either Int String))) > chain5 = at (_b._b._a._a) $ do > x <- return 20 > y <- do > a <- const $ filterSmall x 10 > b <- const $ filterSmall 1 3 > return $ a+b > z <- const $ filterSmall y x > return $ z-x

But there is still a limitation. Due to the way the types work out, the first line of a do block must always be a return statement when using the at function to specify our lens. This is a by product of the extra lens parameter our (>>=) operator is passing around. Fortunately, we can automate this construction with the following function:

> atM lens m = at (removeObjective lens) $ do > return $ at (objective lens) $ m

This lets us rewrite chain5 as:

> chain5' :: Either a (Either a (Maybe (Either Int String))) > chain5' = atM (_b._b._a._a) $ do > let x = 20 > y <- do > a <- const $ filterSmall x 10 > b <- const $ filterSmall 1 3 > return $ a+b > z <- const $ filterSmall y x > return $ z-x

Now we fully support do notation!

Hooray!!

Tune in next time…

How do we get rid of those ugly const functions?

Can optimus prime use type lenses to save our purity from the effects of the evil decepticons?

Does any one actually care about lensified arrow-do?

Stay tuned to find out.

Categories: Offsite Blogs

Functional Jobs: Developer / DevOps Engineer at Klarna (Full-time)

Planet Haskell - Wed, 06/11/2014 - 5:22am

Our FRED teams are responsible for building and operating Klarna’s highly scalable and highly available purchase-accepting system. By combining Erlang, RIAK and RabbitMQ this business critical, multi-node, no-master system handles our exponential growth of real time transactions. The teams have end-to-end responsibility and believe in relentless automation, shipped code and reusability.

We are looking for a functional programmer that wants to take on some operations responsibilities and who believes that you should be responsible for what you create. The role will primarily involve developing infrastructure for logging, monitoring, testing, deployment and operations in order to support the system and contributing feature teams, but a willingness to understand the underlying business-logic and when necessary contribute some feature development is also important. We are looking for someone who can grow into being a Hamiltonian salty mechanic in a mission critical helicopter, so to speak.

Required

  • B.Sc/M.Sc in Computer Science
  • experience with a functional programming language (Erlang, a LISP, an ML, Haskell, etc)
  • comfortability with shell scripting
  • extremely high attention to detail
  • intellectual curiosity
  • a preference for simple, elegant, and reusable solutions
  • a get-things-done attitude

Nice to have

  • some experience with configuration management (Chef, Puppet, ansible, cfengine, etc)
  • some exposure to Java, Erlang or Ruby
  • experience with graphite, collectd
  • experience with virtualization (CloudStack, AWS, etc)
  • nagios/op5 experience
  • experience with Splunk API
  • RabbitMQ experience
  • Riak experience (or experience with distributed databases NoSQL/SQL)
  • proven technical writing ability

Would be cool

  • history of giving entertaining technical talks at conferences

If you have any question please email Philip Alsén at: philip [dot] alsen [at] klarna [dot] com. Apply as soon as possible since we are working continuously with the recruitment process.

Location Stockholm, Sweden

Get information on how to apply for this position.

Categories: Offsite Blogs

Erik de Castro Lopo: Moving from Wai 2.X to 3.0.

Planet Haskell - Wed, 06/11/2014 - 4:16am

Michael Snoyman has just released version 3.0 of Wai, the Haskell Web Application Interface library which is used with the Yesod Web Framework and anything that uses the Warp web server. The important changes for Wai are listed this blog post. The tl;dr is that removing the Conduit library dependency makes the Wai interface more easily usable with one of the alternative Haskell streaming libraries, like Pipes, Stream-IO, Iterator etc.

As a result of the above changes, the type of a web application changes as follows:

-- Wai > 2.0 && Wai < 3.0 type Application = Request -> IO Response -- Wai == 3.0 type Application = Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived

Typically a function of type Application will be run by the Warp web server using one of Warp.run or associated functions which have type signatures of:

run :: Port -> Application -> IO () runSettings :: Settings -> Application -> IO () runSettingsSocket :: Settings -> Socket -> Application -> IO ()Source runSettingsConnection :: Settings -> IO (Connection, SockAddr) -> Application -> IO ()

Its important to note that the only thing that has changed about these Warp functions is the Application type. That means that if we have a function oldWaiApplication that we want to interface to the new version of Wai, we can just wrap it with the following function:

newWaiApplication :: Manager -> Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived newWaiApplication mgr wreq receiver = oldWaiApplication mgr wreq >>= receiver

and use newWaiApplication in place of oldWaiApplication in the call to whichever of the Warp run functions you are using.

Categories: Offsite Blogs

Actually, installing and updating GHC on Mac OS X is really easy

Haskell on Reddit - Wed, 06/11/2014 - 12:48am

For a short while today, I was trying to figure out how to install GHC on a new OS X installation. There are tons of ways to do it. Too many, and most don't feel easily maintainable.

Then, I came across GHC for MAC OS X: http://ghcformacosx.github.io

Just what I'd been looking for! But wouldn't it be nice if you could automatically update it from the command line each time someone packages a new one (like brew)?

Well, turns out you can.

A pull request is in the process of upgrading their cask to r6, and with some community backing (or even some help from the author of ghcformacosx) we could easily keep it current and shiny.

submitted by kvanberendonck
[link] [6 comments]
Categories: Incoming News

non-exhaustive pattern match(es)?

haskell-cafe - Wed, 06/11/2014 - 12:17am
Dear Haskell Cafe, In the following program f :: Int -> [(Int,Int)] -> [(Int,Int)] -> [(Int,Int)] f _ cs [] = cs f _ [] _ = [] f i (c:cs) ((i',t):os) | i < i' = [] | i == i' = [] | i > i' = []
Categories: Offsite Discussion

EclipseFP 2.6.0 released

Haskell on Reddit - Tue, 06/10/2014 - 11:59pm
Categories: Incoming News

Keegan McAllister: On depression, privilege, and online activism

Planet Haskell - Tue, 06/10/2014 - 11:06pm

[Content warning: depression, privilege, online activism]

This isn't a general account of my experiences with depression. Many people have written about that, and I don't have much to add. But there's one aspect that I don't hear about very often. It's something that bothers me a lot, and others have told me that it bothers them too.

The thing is, I'm not just a person with a mental illness. I'm also a well-off white guy, and I enjoy a whole set of unearned privileges from that. Every day people around the world are harassed, abused, and killed over things I never have to worry about. Even in mundane daily life, most everyone is playing on a higher difficulty setting than I ever will.

I've thought about this a lot over the past few years, and I'm trying to understand how I can help make the world more fair and less oppressive. So I give money and I volunteer a little and I speak up when it seems useful, but mostly I listen. I listen to the experiences of people who are different from me. I try to get some understanding of how they feel and why.

How is this related to depression? Because the reality of privilege and oppression is fucking depressing. Of course it's depressing to those who are directly harmed. That's a lot of what I read about, and some of the despair transfers to me. But my profiting from the suffering of others in a way that I mostly can't change is also depressing, at least if I make an attempt not to ignore it.

And my distress over my role in systems of oppression brings its own layer of guilt. People are actually suffering and I feel sorry for myself because I'm dimly aware of it? But this comes from the voice that has always taunted me about depression. “How can you be sad? Your life is great. If you had real problems you wouldn't be so pathetic. You're not really sick. You're just a whiner.”

All of which is part of the disease. I need to own it and work on it every day. But it seems like every time I read an online discussion about social justice, I take a huge step backwards.

It's hard to shrug off the “men are horrible” comments when I spend so much effort trying to convince myself that I'm not horrible. When I hear people gloating about delicious white male tears, I think about all the times when I would come home from work and collapse in bed crying. Is this what they want my life to be?

I can't give myself permission to tune out, because the same people lecture constantly about my obligation to be a good ally, which mostly takes the form of “shut up and listen.” And then when I'm upset by the things they say, the response is “This isn't for you! Why are you listening?”

A local group, one that had recently invited me to hang out as a guest, retweeted a member's declaration to would-be allies: “We're not friends. Fuck you.” Can you see why it feels like they're trying to hurt me?

Let me be clear: I truly don't care if people in a room somewhere are talking about how men are the worst. I don't feel oppressed by it, and I have no desire to argue with it. But I can't handle direct exposure.

And don't tell me that I'm too stupid to understand why they say these things. I know intellectually that it's not about me. I understand the need to vent and the importance of building solidarity. None of that matters on the emotional level where these comments register like a punch to the gut. I do feel this way, even if I shouldn't and I wish I didn't.

I'm talking about mental health, triggers, and unintentionally hurtful speech. Does that sound familiar? One reason I was drawn to intersectional feminism is that it seemed to have a good set of ground rules for how to treat everyone decently. But now I feel like I'm excluded from protection. “Men are horrible” is apparently the one form of speech where intent is all that matters, and I'm a bad person if it triggers something. I've been told it's offensive that I would even try to describe my experience in those terms.

It hurts a whole lot to try and really feel someone's pain, and then realize they don't even slightly give a shit about me. It hurts even more when they'll bend over backwards for anyone except me.

Look, I get it. You argue all the time with trolls who claim that men have it just as bad as women and will shout “what about the men” as a way to disrupt any discussion. When you're engaged in meme warfare, you can't show them any human empathy. They certainly wouldn't return the favor. And if my voice sounds a little like theirs, that's just too bad for me.

I know that this article will serve as ammunition for some people with views I find disgusting. That sucks, but I'm done using political strategy as a reason to stay silent. I understand tone policing as a derailing tactic, and I understand the need to call it out. But at this point it seems there's no room for a sincere request for kindness, especially coming from someone who doesn't get much benefit of the doubt. (The Geek Feminism Wiki basically says that asking for kindness is tone policing if and only if you're a man.)

I'm not trying to silence anyone here. I'm not jumping in and derailing an existing conversation. I'm writing on my own blog, on my own schedule, about my own feelings. But I'm told that even this is crossing a line.

I know that I can't dictate how others feel about our fucked-up world. Does that mean I must absolutely suppress the way I feel? Even when we agree about the substance of what's wrong? I know that if I ask someone to share their life experiences, they have a right to express anger. When does expressing anger become sustained, deliberate cruelty?

“People are being oppressed and you're asking us to care about your feelings?” Yes, I am asking you to care. Just a little bit. I don't claim that my feelings should be a top priority. I hope it wouldn't come up very often. But according to the outspoken few who set the tone, I'm never allowed to bring it up. I don't deserve to ask them to be nice.

And that's why I can no longer have anything to do with this movement. It's really that simple. I guess it says something about my state of mind that I felt the need to attach 1,700 words of preemptive defenses.

The truth is, when I'm not allowed to say or even think “not all men,” part of me hears “Yes, all men, especially you.” And if I'm ever confused about whether I'm allowed to say “not all men,” there are a dozen unprompted reminders every day. Little jokes, repeated constantly to set the climate about what will and won't be tolerated.

When you treat me like one of the trolls, I start to believe that I am one. Guys who say “I support feminism but sometimes they go too far” are usually trying to excuse sexist behavior. So what do I conclude about myself when I have the same thought?

I get that “ally” is not a label you self-apply, it's a thing you do, and the label comes from others. The problem is, if a hundred people say I'm a good ally, and one person says I'm a sexist asshole, who do you think I'm going to believe?

I'm not allowed to stand up for myself, because doing so is automatically an act of oppression. If a woman treats me like shit, and she's being “more feminist” than me, I conclude that I deserve to be treated like shit. That is the model I've learned of a good ally.

I'm not a good ally, or even a bad one. I'm collateral damage.

If the point of all this is to give me a tiny little taste of the invalidation that others experience on a regular basis, then congratulations, it worked. You've made your point. Now that you've broken me, how can I possibly help you, when it seems like I'm part of the problem just by existing? It feels like all I can do is engage in emotional self-harm to repay the debt of how I was born.

I can't just take a break “until I feel better.” My depressive symptoms will always come and go, and some thoughts will reliably bring them back. I spent years reading about how the most important thing I can do, as a winner of the birth lottery, is to be an ally to marginalized people. And now I've realized that I'm too sick and weak to do it.

Even if I give up on being an ally, I can't avoid this subject. It affects a lot of my friends, and I feel even worse when I ask them not to talk about it around me. I don't want to silence anyone. At least I've mostly stopped using Twitter.

So this is how I feel, but I'm not sure anyone else can do anything about it. Really, most of the people I've talked to have been sympathetic. Maybe I need to learn not to let bullies get to me, even when they're bullying in service of a cause I support. They don't seem to get much pushback from the wider community, at any rate.

What gives me hope is, I recognize that my participation in the endless shouting online wasn't really useful to anyone. If I can let myself ignore all that, maybe I can recover some of my energy for other activities that actually help people.

That's all I have to say right now. Thank you for listening to me.

Categories: Offsite Blogs

Bryan O'Sullivan: Win bigger statistical fights with a better jackknife

Planet Haskell - Tue, 06/10/2014 - 10:26pm

(Summary: I’ve developed some algorithms for a statistical technique called the jackknife that run in O(n) time instead of O(n2).)

In statistics, an estimation technique called “the jackknife” has been widely used for over half a century. It’s a mainstay for taking a quick look at the quality of an estimator of a sample. (An estimator is a summary function over a sample, such as its mean or variance.)

Suppose we have a noisy sample. Our first stopping point might be to look at the variance of the sample, to get a sense of how much the values in the sample “spread out” around the average.

If the variance is not close to zero, then we know that the sample is somewhat noisy. But our curiosity may persist: is the variance unduly influenced by a few big spikes, or is the sample consistently noisy? The jackknife is a simple analytic tool that lets us quickly answer questions like this. There are more accurate, sophisticated approaches to this kind of problem, but they’re not nearly so easy to understand and use, so the jackknife has stayed popular since the 1950s.

The jackknife is easy to describe. We take the original sample, drop the first value out, and calculate the variance (or whatever the estimator is) over this subsample. We repeat this, dropping out only the second value, and continue. For an original sample with n elements, we end up with a collection of n jackknifed estimates of all the subsamples, each with one element left out. Once we’re done, there’s an optional last step: we compute the mean of these jackknifed estimates, which gives us the jackknifed variance.

For example, suppose we have the sample [1,3,2,1]. (I’m going to write all my examples in Haskell for brevity, but the code in this post should be easy to port to any statistical language.)

The simplest way to compute variance is as follows:

var xs = (sum (map (^2) xs) - sum xs ^ 2 / n) / n where n = fromIntegral (length xs)

Using this method, the variance of [1,3,2,1] is 0.6875.

To jackknife the variance:

var [1,3,2,1] == 0.6875 -- leave out each element in succession -- (I'm using ".." to denote repeating expansions) var [ 3,2,1] == 0.6666.. var [1, 2,1] == 0.2222.. var [1,3, 1] == 0.8888.. var [1,3,2 ] == 0.6666.. -- compute the mean of the estimates over the subsamples mean [0.6666,0.2222,0.8888,0.6666] == 0.6111..

Since 0.6111 is quite different than 0.6875, we can see that the variance of this sample is affected rather a lot by bias.

While the jackknife is simple, it’s also slow. We can easily see that the approach outlined above takes O(n2) time, which means that we can’t jackknife samples above a modest size in a reasonable amount of time.

This approach to the jackknife is the one everybody actually uses. Nevertheless, it’s possible to improve the time complexity of the jackknife for some important estimators from O(n2) to O(n). Here’s how.

Jackknifing the mean

Let’s start with the simple case of the mean. Here’s the obvious way to measure the mean of a sample.

mean xs = sum xs / n where n = fromIntegral (length xs)

And here are the computations we need to perform during the naive approach to jackknifing the mean.

-- n = fromIntegral (length xs - 1) sum [ 3,2,1] / n sum [1, 2,1] / n sum [1,3, 1] / n sum [1,3,2 ] / n

Let’s decompose the sum operations into two triangles as follows, and see what jumps out:

sum [ 3,2,1] = sum [] + sum [3,2,1] sum [1, 2,1] = sum [1] + sum [2,1] sum [1,3, 1] = sum [1,3] + sum [1] sum [1,3,2 ] = sum [1,3,2] + sum []

From this perspective, we’re doing a lot of redundant work. For example, to calculate sum [1,3,2], it would be very helpful if we could reuse the work we did in the previous calculation to calculate sum [1,3].

Prefix sums

We can achieve our desired reuse of earlier work if we store each intermediate sum in a separate list. This technique is called prefix summation, or (if you’re a Haskeller) scanning.

Here’s the bottom left triangle of sums we want to calculate.

sum [] {- + sum [3,2,1] -} sum [1] {- + sum [2,1] -} sum [1,3] {- + sum [1] -} sum [1,3,2] {- + sum [] -}

We can prefix-sum these using Haskell’s standard scanl function.

>>> init (scanl (+) 0 [1,3,2,1]) [0,1,4,6] {- e.g. [0, 0 + 1, 0 + 1 + 3, 0 + 1 + 3 + 2] -}

(We use init to drop out the final term, which we don’t want.)

And here’s the top right of the triangle.

{- sum [] + -} sum [3,2,1] {- sum [1] + -} sum [2,1] {- sum [1,3] + -} sum [1] {- sum [1,3,2] + -} sum []

To prefix-sum these, we can use scanr, which scans “from the right”.

>>> tail (scanr (+) 0 [1,3,2,1]) [6,3,1,0] {- e.g. [3 + 2 + 1 + 0, 2 + 1 + 0, 1 + 0, 0] -}

(As in the previous case, we use tail to drop out the first term, which we don’t want.)

Now we have two lists:

[0,1,4,6] [6,3,1,0]

Next, we sum the lists pairwise, which gives get exactly the sums we need:

sum [ 3,2,1] == 0 + 6 == 6 sum [1, 2,1] == 1 + 3 == 4 sum [1,3, 1] == 4 + 1 == 5 sum [1,3,2 ] == 6 + 0 == 6

Divide each sum by n-1, and we have the four subsample means we were hoping for—but in linear time, not quadratic time!

Here’s the complete method for jackknifing the mean in O(n) time.

jackknifeMean :: Fractional a => [a] -> [a] jackknifeMean xs = map (/ n) $ zipWith (+) (init (scanl (+) 0 xs)) (tail (scanr (+) 0 xs)) where n = fromIntegral (length xs - 1)

If we’re jackknifing the mean, there’s no point in taking the extra step of computing the mean of the jackknifed subsamples to estimate the bias. Since the mean is an unbiased estimator, the mean of the jackknifed means should be the same as the sample mean, so the bias will always be zero.

However, the jackknifed subsamples do serve a useful purpose: each one tells us how much its corresponding left-out data point affects the sample mean. Let’s see what this means.

>>> mean [1,3,2,1] 1.75

The sample mean is 1.75, and let’s see which subsample mean is farthest from this value:

>>> jackknifeMean [1,3,2,1] [2, 1.3333, 1.6666, 2]

So if we left out 1 from the sample, the mean would be 2, but if we left out 3, the mean would become 1.3333. Clearly, this is the subsample mean that is farthest from the sample mean, so 3 is the most significant outlier in our estimate of the mean.

Prefix sums and variance

Let’s look again at the naive formula for calculating variance:

var xs = (sum (map (^2) xs) - sum xs ^ 2 / n) / n where n = fromIntegral (length xs)

Since this approach is based on sums, it looks like maybe we can use the same prefix summation technique to compute the variance in O(n) time.

Because we’re computing a sum of squares and an ordinary sum, we need to perform two sets of prefix sum computations:

  • Two to compute the sum of squares, one from the left and another from the right

  • And two more for computing the square of sums

jackknifeVar xs = zipWith4 var squaresLeft squaresRight sumsLeft sumsRight where var l2 r2 l r = ((l2 + r2) - (l + r) ^ 2 / n) / n squares = map (^2) xs squaresLeft = init (scanl (+) 0 squares) squaresRight = tail (scanr (+) 0 squares) sumsLeft = init (scanl (+) 0 xs) sumsRight = tail (scanr (+) 0 xs) n = fromIntegral (length xs - 1)

If we look closely, buried in the local function var above, we will see almost exactly the naive formulation for variance, only constructed from the relevant pieces of our four prefix sums.

Skewness, kurtosis, and more

Exactly the same prefix sum approach applies to jackknifing higher order moment statistics, such as skewness (lopsidedness of the distribution curve) and kurtosis (shape of the tails of the distribution).

Numerical accuracy of the jackknifed mean

When we’re dealing with a lot of floating point numbers, the ever present concerns about numerical stability and accuracy arise.

For example, suppose we compute the sum of ten million pseudo-qrandom floating point numbers between zero and one.

The most accurate way to sum numbers is by first converting them to Rational, summing, then converting back to Double. We’ll call this the “true sum”. The standard Haskell sum function (“basic sum” below) simply adds numbers as it goes. It manages 14 decimal digits of accuracy before losing precision.

true sum: 5000754.656937315 basic sum: 5000754.65693705 ^

However, Kahan’s algorithm does even better.

true sum: 5000754.656937315 kahan sum: 5000754.656937315

If you haven’t come across Kahan’s algorithm before, it looks like this.

kahanStep (sum, c) x = (sum', c') where y = x - c sum' = sum + y c' = (sum' - sum) - y

The c term maintains a running correction of the errors introduced by each addition.

Naive summation seems to do just fine, right? Well, watch what happens if we simply add 1010 to each number, sum these, then subtract 1017 at the end.

true sum: 4999628.983274754 basic sum: 450000.0 kahan sum: 4999632.0 ^

The naive approach goes completely off the rails, and produces a result that is off by an order of magnitude!

This catastrophic accumulation of error is often cited as the reason why the naive formula for the mean can’t be trusted.

mean xs = sum xs / n where n = fromIntegral (length xs)

Thanks to Don Knuth, what is usually suggested as a replacement is Welford’s algorithm.

import Data.List (foldl') data WelfordMean a = M !a !Int deriving (Show) welfordMean = end . foldl' step zero where end (M m _) = m step (M m n) x = M m' n' where m' = m + (x - m) / fromIntegral n' n' = n + 1 zero = M 0 0

Here’s what we get if we compare the three approaches:

true mean: 0.49996289832747537 naive mean: 0.04500007629394531 welford mean: 0.4998035430908203

Not surprisingly, the naive mean is worse than useless, but the long-respected Welford method only gives us three decimal digits of precision. That’s not so hot.

More accurate is the Kahan mean, which is simply the sum calculated using Kahan’s algorithm, then divided by the length:

true mean: 0.49996289832747537 kahan mean: 0.4999632 welford mean: 0.4998035430908203

This at least gets us to five decimal digits of precision.

So is the Kahan mean the answer? Well, Kahan summation has its own problems. Let’s try out a test vector.

-- originally due to Tim Peters >>> let vec = concat (replicate 1000 [1,1e100,1,-1e100]) -- accurate sum >>> sum (map toRational vec) 2000 -- naive sum >>> sum vec 0.0 -- Kahan sum >>> foldl kahanStep (S 0 0) vec S 0.0 0.0

Ugh, the Kahan algorithm doesn’t do any better than naive addition. Fortunately, there’s an even better summation algorithm available, called the Kahan-Babuška-Neumaier algorithm.

kbnSum = uncurry (+) . foldl' step (0,0) where step (sum, c) x = (t, c') where c' | abs sum >= abs x = c + ((sum - t) + x) | otherwise = c + ((x - t) + sum) t = sum + x

If we try this on the same test vector, we taste sweet success! Thank goodness!

>>> kbnSum vec 2000.0

Not only is Kahan-Babuška-Neumaier (let’s call it “KBN”) more accurate than Welford summation, it has the advantage of being directly usable in our desired prefix sum form. We’ll accumulate floating point error proportional to O(1) instead of the O(n) that naive summation gives.

Poor old Welford’s formula for the mean just can’t get a break! Not only is it less accurate than KBN, but since it’s a recurrence relation with a divisor that keeps changing, we simply can’t monkeywrench it into suitability for the same prefix-sum purpose.

Numerical accuracy of the jackknifed variance

In our jackknifed variance, we used almost exactly the same calculation as the naive variance, merely adjusted to prefix sums. Here's the plain old naive variance function once again.

var xs = (sum (map (^2) xs) - sum xs ^ 2 / n) / n where n = fromIntegral (length xs)

The problem with this algorithm arises as the size of the input grows. These two terms are likely to converge for large n:

sum (map (^2) xs) sum xs ^ 2 / n

When we subtract them, floating point cancellation leads to a large error term that turns our result into nonsense.

The usual way to deal with this is to switch to a two-pass algorithm. (In case it’s not clear at first glance, the first pass below calculates mean.)

var2 xs = (sum (map (^2) ys) - sum ys ^ 2 / n) / n where n = fromIntegral (length xs) ys = map (subtract (mean xs)) xs

By subtracting the mean from every term, we keep the numbers smaller, so the two sum terms are less likely to converge.

This approach poses yet another conundrum: we want to jackknife the variance. If we have to correct for the mean to avoid cancellation errors, do we need to calculate each subsample mean? Well, no. We can get away with a cheat: instead of subtracting the subsample mean, we subtract the sample mean, on the assumption that it’s “close enough” to each of the subsample means to be a good enough substitute.

So. To calculate the jackknifed variance, we use KBN summation to avoid a big cumulative error penalty during addition, subtract the sample mean to avoid cancellation error when subtracting the sum terms, and then we’ve finally got a pretty reliable floating point algorithm.

Where can you use this?

The jackknife function in the Haskell statistics library uses all of these techniques where applicable, and the Sum module of the math-functions library provides reliable summation (including second-order Kahan-Babuška summation, if you gotta catch all those least significant bits).

(If you’re not already bored to death of summation algorithms, take a look into pairwise summation. It’s less accurate than KBN summation, but claims to be quite a bit faster—claims I found to be only barely true in my benchmarks, and not worth the loss of precision.)

Categories: Offsite Blogs

JOB: AHRC Doctoral Studentship in ComputationalMusicology

haskell-cafe - Tue, 06/10/2014 - 4:00pm
STUDENTSHIP ADVERTISEMENT With apologies for cross posting AHRC Doctoral Studentship in Computational Musicology http://www.transforming-musicology.org/news/2014-06-03_ahrc-doctoral-studentship-in-computational-musicology/ London, UK (with UK residency requirement unfortunately) Although this is not an explicitly Haskell programming job, the student would be free to work in whatever language they wish. Furthermore, there would be encouragement to consider a functional and/or logic programming approach to research. Award: fees and tax-free stipend at £15,726 p.a. (inc. of London weighting) Application deadline: Tuesday 1 July 2014 Expected start date: October 2014 We invite applications for a Doctoral Studentship, funded by the Arts and Humanities Research Council, in Computational Musicology, located at Queen Mary University of London, under the supervision of Professor Geraint Wiggins. The studentship is part of the "Transforming Musicology" project, including Goldsmiths, University of London, Quee
Categories: Offsite Discussion

System.Directory.removeDirectoryRecursive and symlinks

libraries list - Tue, 06/10/2014 - 2:42pm
Hi all, A crime scene: Prelude System.Directory> :!mkdir a-directory Prelude System.Directory> :!touch a-directory/a-file.txt Prelude System.Directory> :!ln -s "a-directory" "a-symlink-to-a-directory" Prelude System.Directory> :!ls a-directory a-file.txt Prelude System.Directory> :!ls a-symlink-to-a-directory a-file.txt Prelude System.Directory> removeDirectoryRecursive "a-symlink-to-a-directory" *** Exception: a-symlink-to-a-directory: removeDirectory: inappropriate type (Not a directory) Prelude System.Directory> :!ls a-symlink-to-a-directory Prelude System.Directory> :!ls a-directory Prelude System.Directory> :!ls -a a-directory ... Prelude System.Directory> :!ls -a a-symlink-to-a-directory ... Prelude System.Directory> removeDirectoryRecursive is removing all contents *of the directory linked* but is unable to remove the symlink itself. This behavior is surprizing if not dangerous. I understand that this mimics behavior of unlink/rmdir and DeleteFile/RemoveDirecto
Categories: Offsite Discussion

Call for participation: HLPP 2014 - 7th Symposium on High-Level Parallel Programming and Applications

General haskell list - Mon, 06/09/2014 - 10:29pm
=========================================================================== CALL FOR PARTICIPATION HLPP 2014 7th International Symposium on High-level Parallel Programming and Applications Amsterdam, Netherlands July 3-4, 2014 https://sites.google.com/site/hlpp2014amsterdam/ =========================================================================== Early registration deadline: June 16 =========================================================================== Keynote: Frank Schlimbach, Intel: Design and Implementation of Intel Concurrent Collections (CnC) Abstract: CnC (concurrent collections) is a parallel language in the sense that its goal is parallel execution but in fact a CnC program does not indicate what runs in parallel. Instead, it identifies what precludes parallel execution. There are exactly two reasons that computations cannot execute i
Categories: Incoming News

New gtk2hs 0.12.4 release

gtk2hs - Wed, 11/21/2012 - 12:56pm

Thanks to John Lato and Duncan Coutts for the latest bugfix release! The latest packages should be buildable on GHC 7.6, and the cairo package should behave a bit nicer in ghci on Windows. Thanks to all!

~d

Categories: Incoming News