Summary: Failing properties should throw exceptions, not return False.
import Test.QuickCheck

main = quickCheck $ \x -> exp (log (abs x)) == abs (x :: Double)
Here we are checking that for all positive Double values, applying log then exp is the identity. This statement is incorrect for Double due to floating point errors. Running main we get:

*** Failed! Falsifiable (after 6 tests and 2 shrinks):
3.0
QuickCheck is an incredibly powerful tool - it first found a failing example, and then automatically simplified it. However, it's left out some important information - what is the value of exp (log 3)? For my tests I usually define === and use it instead of ==:

a === b | a == b = True
        | otherwise = error $ show a ++ " /= " ++ show b
Now when running main we get:

*** Failed! Exception: '3.0000000000000004 /= 3.0'
(after 2 tests and 2 shrinks):
We can immediately see the magnitude of the error introduced, giving a kick-start to debugging.
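As a self-contained sketch of how this pure === behaves (the arithmetic examples here are my own, not from the original test suite):

```haskell
-- A pure assertion operator: equal values give True, unequal values
-- throw an exception carrying both sides of the comparison.
(===) :: (Show a, Eq a) => a -> a -> Bool
a === b | a == b    = True
        | otherwise = error $ show a ++ " /= " ++ show b

main :: IO ()
main = do
    print (2 + 2 === (4 :: Int))  -- passes, prints True
    -- A failing comparison throws rather than returning False, e.g.
    --   0.1 + 0.2 === (0.3 :: Double)
    -- would raise "0.30000000000000004 /= 0.3", showing both values at once.
```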
Do we still need Bool?
When writing functions returning Bool there are three interesting cases:
- The function returns True, the test passes, continue on.
- The function returns False, the test fails, with no information beyond the input arguments.
- The function throws an exception, the test fails, but with a human readable error message about the point of failure.
Of these, returning False is the least useful, and entirely subsumed by exceptions. It is likely there was additional information available before reducing to False, which has now been lost, and must first be recovered before debugging can start.
Given that the only interesting values are True and exception, we can switch to using the () type, where passing tests return () and failing tests throw an exception. However, working with exceptions in pure code is a bit ugly, so I typically define:

import Control.Monad
(===) :: (Show a, Eq a) => a -> a -> IO ()
a === b = when (a /= b) $ error $ show a ++ " /= " ++ show b
Now === is an action, and passing tests do nothing, while failing tests raise an error. This definition forces all tests to end up in IO, which is terrible for "real" Haskell code, where pure and IO computations should be segregated. However, for tests, as long as the test is repeatable (doesn't store some temporary state) then I don't worry.
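A minimal runnable sketch of this IO-based === in action (the list examples are illustrative, not from the original tests):

```haskell
import Control.Monad (when)

-- Passing tests do nothing; failing tests raise an error with both values.
(===) :: (Show a, Eq a) => a -> a -> IO ()
a === b = when (a /= b) $ error $ show a ++ " /= " ++ show b

main :: IO ()
main = do
    reverse [1,2,3 :: Int] === [3,2,1]  -- passes silently
    take 2 "haskell" === "ha"           -- passes silently
    putStrLn "all tests passed"
```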
To test the IO () returning property with QuickCheck we can define:

import System.IO.Unsafe

instance Testable () where
    property () = property True

instance Testable a => Testable (IO a) where
    property = property . unsafePerformIO
Now QuickCheck can work with this === definition, but also any IO property. I have found that testing IO properties with QuickCheck is very valuable, even without using ===.
Whilst I have argued for exceptions in the context of QuickCheck tests, I use the same exception pattern for non-parameterised/non-QuickCheck assertions. As an example, taken from Shake:

test = do
    dropDirectory1 "aaa/bbb" === "bbb"
    dropDirectory1 "aaa/" === ""
    dropDirectory1 "aaa" === ""
    dropDirectory1 "" === ""
Using IO () properties allows for trivial sequencing. As a result, the tests are concise, and the effort to add additional tests is low.
(===) in QuickCheck
In QuickCheck-2.7 the === operator was introduced (which caused name clashes in both Shake and Hoogle - new versions of both have been released). The === operator uses the Property data type in QuickCheck to pass both the Bool value and additional information together. I prefer my definition of === because it's simpler, doesn't rely on QuickCheck and makes it clear how to pass additional information beyond just the arguments to ===. As an example, we could also return the log value:

\x -> when (exp (log (abs x)) /= abs x) $
    error $ show (x, log $ abs x, exp $ log $ abs x)
However, the QuickCheck === is a great improvement over ==, and should be a very popular addition.
I have always had the sense that the word "purity" is used in a pretty informal sense, and when I provoke people to give a definition I've never been entirely satisfied.
Folks often point to the notion of referential transparency. There is a complaint that Strachey's introduction of the term was perhaps poorly explained considering its original meaning, cf. Quine. I'm not as interested in that discussion, and I'm content to let RT mean something to do with having an equational theory of terms.
A good reference on the topic seems to be "What is a Purely Functional Language?" (Sabry 1998). In this article, Sabry argues that purity should be taken as a property of a language, not of an expression. RT is specifically discounted as a plausible definition of purity.
For the record, their definition of purity is:
A language is purely functional if (i) it includes every simply typed lambda calculus term, and (ii) its call-by-name, call-by-need, and call-by-value implementations are equivalent (modulo divergence and errors).
This definition is more satisfying than RT because I don't see how "replace equals with equals" could ever sensibly be violated without being obtuse. Indeed, Strachey's point is that when considering, say, "let x = y++" you cannot consider "x" and "y++" to be equals. Any replacement of "y++" with "x" is just ignorance of what the term "y++" actually equals, not a failure of referential transparency (a point better expressed in a blog post by Magnus).
However, Sabry's definition doesn't totally agree with the informal usage of the term pure. For example, one can no longer say "prefer pure functions" because purity applies to languages as a whole, not portions of specific programs. A monolithic Haskell program written entirely in the IO monad can't be seen as somehow "less pure."
I would be interested if there are other papers that refine or dispute this definition. Specifically, is there a technical term that better captures the informal idea of "not creating abstractions that are functions of the rest of the program"? This is what I think people often mean when they advocate for the advantages of purity.
I have very often heard that Haskell is a very good language in which to write compilers, but have been unable to find resources or tutorials about this, besides the excellent Write You a Scheme.
I am currently working through Appel's "Modern Compiler Implementation in ML" in Haskell, and I think I am doing OK, but would like to know if I am doing it the "proper" way, especially concerning features of ML he uses that are ugly, not available, or not idiomatic in Haskell.
The foldl function is broken. Everyone knows it’s broken. It’s been broken for nearly a quarter of a century. We should finally fix it!
Today I am proposing that Prelude.foldl be redefined using the implementation currently known as Data.List.foldl'.

foldl is broken!
I’m sure you knew that already, but just in case…
Have you ever noticed that Haskellers usually recommend using either foldr or foldl' but not foldl? For example Real World Haskell has this to say:
Due to the thunking behaviour of foldl, it is wise to avoid this function in real programs: even if it doesn’t fail outright, it will be unnecessarily inefficient. Instead, import Data.List and use foldl'.
In the online version of the book the first user comments on that paragraph are
Why isn’t Data.List foldl implementation placed in Prelude?
I second the question: Why isn’t foldl' the default?
Ok, so obviously we’re talking about the difference between foldl and foldl':

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f a []     = a
foldl f a (x:xs) = foldl f (f a x) xs

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f a []     = a
foldl' f a (x:xs) = let a' = f a x in a' `seq` foldl' f a' xs
The dry technical difference is that foldl' evaluates the call to f before making the next recursive call. The consequences of that are perhaps not immediately obvious, so we’ll take a step back and look at a slightly bigger picture.

Folding left and right
When we first learn Haskell we learn that there are two ways to fold a list, from the left or right.

foldl f z [x1, x2, ..., xn] = (...((z `f` x1) `f` x2) `f`...) `f` xn
foldr f z [x1, x2, ..., xn] = x1 `f` (x2 `f` ... (xn `f` z)...)
Saying “from the left” or “from the right” is a description of what foldl and foldr calculate, with the parenthesis nesting to the left or to the right. At runtime of course we always have to start from the left (front) of the list.
We later learn other ways of thinking about left and right folds, that a left fold can be used like a classic loop where we go through the whole list eagerly, while a right fold can be used like a demand-driven iterator. For the left fold that means using a function that is strict in its first argument (like (+)) while for the right fold that means using a function that is not strict in its second argument (like (:)).
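A small sketch of these two styles (sumLoop and firstDoubled are my own illustrative names, not from the original post):

```haskell
import Data.List (foldl')

-- Classic eager loop: the combining function (+) is strict in its first
-- argument, so foldl' consumes the whole list, keeping just one Int.
sumLoop :: [Int] -> Int
sumLoop = foldl' (+) 0

-- Demand-driven iterator: (:) is lazy in its second argument, so foldr
-- produces its output incrementally and only forces what is consumed.
firstDoubled :: [Int] -> Int
firstDoubled xs = head (foldr (\x rest -> x * 2 : rest) [] xs)

main :: IO ()
main = do
    print (sumLoop [1..100])              -- 5050
    print (firstDoubled (3 : undefined))  -- 6; the rest of the list is never forced
```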
Indeed when looking at whether we want foldl or foldr in any particular case our choice is usually governed by whether we want “all at once” behaviour (foldl) or if we want incremental or short-cut behaviour (foldr).

Accumulating thunks
Again, as we are learning Haskell, we get told that foldl has this crazy behaviour

foldl (+) 0 (1:2:3:[])
  = foldl (+) (0 + 1) (2:3:[])
  = foldl (+) ((0 + 1) + 2) (3:[])
  = foldl (+) (((0 + 1) + 2) + 3) []
  = (((0 + 1) + 2) + 3)
when what we had in mind when we thought of an accumulating loop was

foldl' (+) 0 (1:2:3:[])
  = foldl' (+) 1 (2:3:[])
  = foldl' (+) 3 (3:[])
  = foldl' (+) 6 []
  = 6
Of course that’s just what foldl' does, it evaluates the call to + before making the next recursive call.

When is foldl (rather than foldl') useful?
The short answer is “almost never”.
As beginners we often assume that foldl must still make sense in some cases (or why have both?) but it turns out that’s not really the case.
When the f argument of foldl is a strict function then delaying the evaluation does not gain us anything as it all has to be evaluated at the end anyway. The only time when delaying the evaluation could save us anything is when the f function is not strict in its first argument – in which case you either don’t care or probably should have been using foldr in the first place.
In fact even if our f function is non-strict in its first argument, we probably do not gain anything from delaying evaluation and it usually does no harm to evaluate earlier. Remember that we still have to traverse the whole list, we don’t get any short-cutting behaviour like with foldr.
We can, if we think about it, construct examples where foldl' would be too strict. We could define last and last' like this:

last  = foldl  (\_ y -> y) (error "empty list")
last' = foldl' (\_ y -> y) (error "empty list")
Now if we try

> last [1,undefined,3]
3
> last' [1,undefined,3]
*** Exception: Prelude.undefined
This is because our accumulator is always the latest element that we’ve seen but we don’t actually want to evaluate the elements (except the last one).
So it’s true that foldl' fails in this case, but it’s also a silly definition, the usual definition is a lot clearer

last [x]    = x
last (_:xs) = last xs
last []     = error "empty list"
That goes for pretty much all the other examples you might be able to think of where foldl would work but foldl' would not: the examples are either artificial or are clearer written in other ways.
People sometimes point out that sum is defined using foldl and not foldl' and claim that this is something to do with Haskell’s designers wanting to allow Num instances where (+) might be lazy. This is pretty much nonsense. If that were the case then sum would have been defined using foldr rather than foldl to properly take advantage of short-cutting behaviour. A much simpler explanation is that foldl' was not available in early versions of Haskell when sum was defined.
In nearly 15 years as a Haskell programmer I think I’ve specifically needed foldl rather than foldl' about three times. I say “about” because I can only actually remember one. That one case was in a slightly cunning bit of code for doing cache updates in a web server. It would almost certainly have been clearer as a local recursion but I was amused to find a real use case for foldl and couldn’t help myself from using it just for fun. Of course it needed a comment to say that I was using it on purpose rather than by mistake!

So why do we have foldl and foldl'?
If foldl is almost always a mistake (or merely benign) then why do we have it in the first place?
I don’t know for sure, but here’s my guess…
When Haskell 1.0 was published on this day 24 years ago there was no seq function at all, so there was no choice but to define foldl in the “classic” way.
Eventually, six years later after much discussion, we got the seq function in Haskell 1.3. Though actually in Haskell 1.3 seq was part of an Eval class, so you couldn’t use it just anywhere, such as in foldl. In Haskell 1.3 you would have had to define foldl' with the type:

foldl' :: Eval b => (b -> a -> b) -> b -> [a] -> b
Haskell 1.4 and Haskell 98 got rid of the Eval class constraint for seq but foldl was not changed. Hugs and GHC and other implementations added the non-standard foldl'.
I suspect that people then considered it a compatibility and inertia issue. It was easy enough to add a non-standard foldl' but you can’t so easily change the standard.
I suspect that if we had had seq from the beginning then we would have defined foldl using it.
Miranda, one of Haskell’s predecessor languages, already had seq 5 years before Haskell 1.0.

A strict foldl in Orwell!
Orwell is an interesting case. Orwell was another Haskell predecessor, very similar to Miranda and early Haskell. An informed source told me that Orwell had defined its foldl in the way that we now define foldl', ie with strict evaluation. Information on Orwell is a little hard to get ahold of online these days so I asked Philip Wadler. Phil very kindly fished out the manuals and looked up the definitions for me.
In the original version:

An Introduction to Orwell (DRAFT)
Philip Wadler
1 April 1985
In the standard prelude:

lred f a []     = a
lred f a (x:xs) = lred f (f a x) xs
But five years later, by the time Haskell 1.0 is being published…

An Introduction to Orwell 6.00
by Philip Wadler
revised by Quentin Miller
Copyright 1990 Oxford University Computing Lab
In the standard prelude:

foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f a []     = a
foldl f a (x:xs) = strict (foldl f) (f a x) xs
Note the use of strict. Presumably Orwell’s strict function was defined as (or equivalent to)

strict :: (a -> b) -> a -> b
strict f x = x `seq` f x
(These days in Haskell we call this function ($!).)
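Transcribed into modern Haskell (foldlOrwell is my own name for the 1990 Orwell definition, renamed to avoid clashing with the Prelude), this is how the strict fold runs:

```haskell
-- Orwell's strict combinator; equivalent to Haskell's ($!).
strict :: (a -> b) -> a -> b
strict f x = x `seq` f x

-- The 1990 Orwell foldl: force the accumulator before each recursive call.
foldlOrwell :: (a -> b -> a) -> a -> [b] -> a
foldlOrwell f a []     = a
foldlOrwell f a (x:xs) = strict (foldlOrwell f) (f a x) xs

main :: IO ()
main = print (foldlOrwell (+) 0 [1..1000000 :: Int])  -- runs in constant space
```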
So my source was right, Orwell did change foldl to be the strict version!
I contend that this was and is the right decision, and that it was just a consequence of the late arrival of seq in Haskell and inertia and fears about backwards compatibility that have kept us from fixing foldl.

Just do it!
It’d help all of us who are sometimes tempted to use foldl because we can’t be bothered to import Data.List. It’d help confused beginners. It’d save teachers from the embarrassment of having to first explain foldl and then why you should never use it.
Orwell fixed this mistake at least 24 years ago, probably before Haskell 1.0 was released. Just because it’s an old mistake doesn’t mean we shouldn’t fix it now!

A postscript: which foldl'?
I hate to complicate a simple story but I should admit that there are two plausible definitions of foldl' and I’ve never seen any serious discussion of why we use one rather than the other (I suspect it’s another historical accident).
So the version above is the “standard” version, perhaps more clearly written using bang patterns as

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f a []     = a
foldl' f a (x:xs) = let !a' = f a x in foldl' f a' xs
But an equally plausible version is

foldl'' :: (b -> a -> b) -> b -> [a] -> b
foldl'' f !a []     = a
foldl'' f !a (x:xs) = foldl'' f (f a x) xs
The difference is this: in the first version we evaluate the new accumulating parameter before making the recursive call, while in the second version we ensure that the accumulating parameter is always evaluated (to WHNF) on entry into each call.
These two ways of forcing the evaluation have almost the same effect. It takes a moment or two to find a case where it makes a difference, but here’s one:

foldl'  (\_ y -> y) undefined [1] = 1
foldl'' (\_ y -> y) undefined [1] = undefined
The standard foldl' ensures that all the new accumulating parameter values are evaluated, but still allows the initial value to be unevaluated. The alternative version simply says that the accumulating parameter is always evaluated.
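A runnable sketch of this difference (foldlA and foldlB are my own names for the two variants, to avoid clashing with Data.List.foldl'):

```haskell
{-# LANGUAGE BangPatterns #-}

-- Standard style: force each *new* accumulator before the recursive call.
foldlA :: (b -> a -> b) -> b -> [a] -> b
foldlA f a []     = a
foldlA f a (x:xs) = let !a' = f a x in foldlA f a' xs

-- Alternative style: force the accumulator on entry to *every* call,
-- including the initial value.
foldlB :: (b -> a -> b) -> b -> [a] -> b
foldlB f !a []     = a
foldlB f !a (x:xs) = foldlB f (f a x) xs

main :: IO ()
main = do
    -- The undefined initial accumulator is never forced by foldlA here:
    print (foldlA (\_ y -> y) undefined [1 :: Int])  -- 1
    -- foldlB (\_ y -> y) undefined [1] would throw instead, since the
    -- bang pattern forces the initial accumulator on entry.
```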
The second version is attractive from a code generation point of view. One of the clever things that GHC can do with foldl' (and strict function arguments generally) is to unbox the values, so for example an Int can be unboxed to an Int# and passed in registers rather than on the heap. With the standard foldl' it needs a special case for the first iteration of the loop, where the initial value of the accumulating parameter might not be evaluated yet. With foldl'', that’s not a problem, we can unbox things right from the start. In practice, GHC can often see that the initial value is evaluated anyway, but not always.
(Don Stewart and I noticed this a few years back when we were working on stream fusion for lists. We had defined foldl' on streams in a way that corresponded to the second form above and then got a test failure when doing strictness testing comparing against the standard list functions.)
So if we’re going to fix foldl to be the strict version, then perhaps it should be the fully strict version, not just the “strict after the first iteration” version.
- Fully separated cycle and pedestrian routes on all arterial roads.
- Restrictions on traffic speeds, parking, access etc on all residential roads
- Adopt ‘strict liability’ on roads to protect the most vulnerable road users
- Changes to structure of cities to make accessing services by bike easy, and storing and parking bikes easy
- Societal and economic changes to give people flexibility to travel more sustainably (flexi hours, school provision etc)
- Change the image of cycling so that it becomes ‘normal’
The first recommendation is for separated cycle routes, as found in Copenhagen, Amsterdam, and elsewhere. ("Copenhagenize" is now a verb.) To Edinburgh's credit, it has allocated 7% of its transport budget to cycling. How much of that is going to separated cycling routes, and the other recommendations on the list? Councillor Burns says segregated routes are planned for George Street and Leith Walk. I argued we need a more aggressive plan. Listening to Pooley, the problems sound difficult, but all we really need are the money and the will. Let's construct a network of separated cycle routes covering the city. Build it, and they will come!
(Above: Edinburgh Links, one of the few segregated bike routes in the city, and my ride to work each morning. Below: Copenhagen.)