News aggregator

Daunting heap profile when using (<>)

haskell-cafe - Sat, 11/14/2015 - 12:10pm
Hello all, I have a Logger which produces log entries and a new version of itself:

    newtype Logger evt dom log =
        Lgr {runLogger :: Timed evt -> dom -> (log, Logger evt dom log)}

Loggers are used in a function like this:

    runSim :: (Ord evt, Monoid log) =>
              SimBehaviour evt dom log -> SimState evt dom log -> SimState evt dom log
    runSim (!lgr, !hdr, xtp) (!log, !dom, !evq) =
        case step of
            Nothing -> (log, dom, evq) -- end of simulation
            Just (newEvq, newDom, newHdr, newLgr, newLog) ->
                runSim (newLgr, newHdr, xtp) (newLog, newDom, newEvq)
      where
        -- check for end conditions or run handler
        step = do
            (evt, evts) <- H.view evq -- no more Events -> Nothing
            if xtp (evt, dom)
                then Nothing
                else let (evq', dom', hdr') = runHandler hdr evt dom
                         (log', lgr')       = runLogger lgr evt dom' -- <--
                         -- append new event and new log entries
Categories: Offsite Discussion

Why don't Num/IsList/IsString use Template Haskell?

Haskell on Reddit - Sat, 11/14/2015 - 12:00pm

Why isn't there a {-# LANGUAGE NumTH #-} extension that replaces numbers with $$(fromIntegerTH <number>) instead of (fromInteger <number>), and the same for IsList and IsString, so that invalid values (e.g. for Natural) are caught at compile time? Like this:

    {-# LANGUAGE TemplateHaskell #-}
    module NumTH where

    import Language.Haskell.TH.Syntax (Lift)
    import Language.Haskell.TH.Lib (TExpQ)

    fromIntegerTH :: (Num a, Lift a) => Integer -> TExpQ a
    fromIntegerTH i = let n = fromInteger i in [|| n ||]

    -----------------------------------------------------------

    {-# LANGUAGE TemplateHaskell #-}
    module Main where

    import Numeric.Natural
    import NumTH

    main = print ($$(fromIntegerTH (-1)) :: Natural)

    -----------------------------------------------------------

    Main.hs:8:18:
        Exception when trying to run compile-time code:
          arithmetic underflow
        Code: fromIntegerTH (- 1)
        In the Template Haskell splice $$(fromIntegerTH (- 1))
        In the first argument of ‘print’, namely
          ‘($$(fromIntegerTH (- 1)) :: Natural)’
        In the expression: print ($$(fromIntegerTH (- 1)) :: Natural)

submitted by lamefun
[link] [comment]
Categories: Incoming News

Adopt a library - hsqml-transient

Haskell on Reddit - Sat, 11/14/2015 - 11:42am

hsqml-transient is a library that is almost finished, yet unfortunately subtly broken, and which I no longer need. (Long story short: I've decided to end the hopeless ordeal of trying to deploy a GUI application on Windows by buying some server space and running a Jupyter notebook off it. As a result, I don't think I'll be writing any traditional GUI application any time soon.) So I probably won't get around to fixing it.

What works: everything but signals declared in member objects. You can write code like this (de-inverted control flow courtesy of transient):

    newtype Foo = Foo { getText :: Text }

    toplevel :: Build ()
    toplevel = do
        _ <- propertySelf "self"
        foo <- property "foo" Text.empty
        reacting (get foo) (putStrLn . Text.unpack)
        member "bar" $ do
            txt <- autoMethod_ "setText" (pure.getText)
            reacting txt (put foo)

    main = runQml (fileDocument "/path/to/some/file.qml") toplevel

This program creates a context object with a text property "foo" that can be both read and written from the QML side, printing its value to stdout each time it's changed. A member object "bar" (accessible as "") has a method that sets the text and executes a callback (the difference between callbacks passed to method functions and regular ones is that their type must be IO and they are executed in the main HsQML thread).

There are some kludges and peculiarities of the library already visible in this example:

  • The Foo newtype is needed, because autoMethod_ "wraps" method arguments to construct a Haskell type. That's why the argument type must be Generic, and a plain Text doesn't fit.
  • reacting has type React a -> (a -> Qml a) -> Build (). The React type is more or less TransientIO, intentionally "crippled" to be just an Applicative. This way the type system ensures that all combined actions are treated equally, and a change of any component retriggers the callback.

Anyway, the reason signals fired from member objects don't work is that the Qml action returned by signal is always given an ObjRef to the context object, rather than the object it was defined in. The places that need to be fixed are:

  • Qml type in Graphics.QML.Transient.Internal.Types, which is currently defined as ReaderT (ObjRef ()) TransientIO. Probably some different mechanism of providing the ObjRef would be necessary.
  • signal in Graphics.QML.Transient.Signal. This is the only place in the whole library that ever asks for the ObjRef. Unfortunately, pushing the ask up into the Build layer would require the ObjRef to be present before it's actually constructed. I'm not sure if laziness or some more clever use of Reader could help.
  • Finally, member from Graphics.QML.Transient drops the ObjRef of a member object on the floor right after creating the property in the parent object. It would need to make it available to signal somehow.

If anyone finds this kind of library useful, feel free to fork it, fix it, and upload it to Hackage. I wish you luck :) I only have to warn you that hsqml itself seems to be rather neglected, as it has had stale dependencies for a couple of months already. You'd most likely have to bug /u/komadori to fix that.

submitted by mjmrotek
[link] [4 comments]
Categories: Incoming News

Sanity check on my approach to adding a Transport instance to Thrift's Haskell library

Haskell on Reddit - Sat, 11/14/2015 - 9:04am

Hi everyone,

This is probably an easy problem (if it even truly is one) for someone here, so I thought I'd ask before diving too far into my own solution. I'm trying to use the Haskell Thrift interface; while it makes GHC.IO.Handle an instance of Transport, that only allows (de)serialization with a plain file, whereas I need it to work with gzipped files. You can see the Transport class here:

So, I start by wrapping an arbitrary Transport, without modifying its functionality, something like this:

    data ZlibTransport t = ZlibTransport { wrappedTrans :: t } deriving (Show)

    instance Transport t => Transport (ZlibTransport t) where
        tClose  = tClose . wrappedTrans
        tIsOpen = tIsOpen . wrappedTrans
        tFlush  = tFlush . wrappedTrans
        tRead   = tRead . wrappedTrans
        tPeek   = tPeek . wrappedTrans
        tWrite  = tWrite . wrappedTrans

Now, I need to replace some of these functions with Zlib functionality. I have tWrite working (though perhaps sub-optimally), but I think tRead will be more involved: there's the underlying Transport (T) that is providing compressed data, and the ZlibTransport (ZT) that needs to provide uncompressed data. When someone requests data from ZT, ZT needs to read enough data from T to get a decompressible chunk, enough times to satisfy the request. So there's the danger of reading too much compressed data from T (and so I think there needs to be a buffer to save those leftovers), and of having too much uncompressed data for the request (so, another buffer to save these leftovers).

Thrift.Transport.IOBuffer has utility code for the buffers, and I was thinking Codec.Compression.Zlib.Internal.DecompressStream IO could be how I track the availability of decompressed/need for compressed data.
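For what it's worth, here is a minimal sketch of how those ingredients might fit together, assuming the Transport class shown above (with tRead :: t -> Int -> IO Lazy.ByteString) and the incremental API of zlib >= 0.6; newZlibTransport, zlibRead and the extra record fields are invented names, and plain IORefs stand in for Thrift.Transport.IOBuffer:

    import qualified Data.ByteString as BS
    import qualified Data.ByteString.Lazy as LBS
    import Data.IORef
    import Codec.Compression.Zlib.Internal
        (DecompressStream (..), decompressIO, gzipFormat, defaultDecompressParams)
    import Thrift.Transport (Transport (..)) -- assumed location of the class

    data ZlibTransport t = ZlibTransport
        { wrappedTrans :: t
        , decompState  :: IORef (DecompressStream IO) -- the paused decompressor
        , plainBuffer  :: IORef BS.ByteString         -- uncompressed leftovers
        }

    newZlibTransport :: t -> IO (ZlibTransport t)
    newZlibTransport t = ZlibTransport t
        <$> newIORef (decompressIO gzipFormat defaultDecompressParams)
        <*> newIORef BS.empty

    -- Satisfy a read from the plain buffer, pumping the decompressor
    -- (and the wrapped transport) until enough bytes are available.
    zlibRead :: Transport t => ZlibTransport t -> Int -> IO LBS.ByteString
    zlibRead zt n = do
        buf <- readIORef (plainBuffer zt)
        if BS.length buf >= n
            then do let (out, rest) = BS.splitAt n buf
                    writeIORef (plainBuffer zt) rest
                    return (LBS.fromStrict out)
            else do st <- readIORef (decompState zt)
                    case st of
                        DecompressInputRequired supply -> do
                            -- pull another compressed chunk; an empty chunk
                            -- signals end-of-input to zlib
                            chunk <- tRead (wrappedTrans zt) 4096
                            writeIORef (decompState zt) =<< supply (LBS.toStrict chunk)
                            zlibRead zt n
                        DecompressOutputAvailable out next -> do
                            writeIORef (plainBuffer zt) (buf `BS.append` out)
                            writeIORef (decompState zt) =<< next
                            zlibRead zt n
                        DecompressStreamEnd _leftover -> do
                            -- compressed stream exhausted: short read
                            writeIORef (plainBuffer zt) BS.empty
                            return (LBS.fromStrict buf)
                        DecompressStreamError err ->
                            ioError (userError ("zlib: " ++ show err))

tWrite would presumably need the mirror image of this, built around a compress stream that gets flushed from tFlush.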

I guess my question is, do these ingredients make sense? I'm a fairly basic Haskeller and worry that there's some incredibly simple way to do this I'm missing, or something more general, like making Lazy.ByteString a direct instance of Transport...


submitted by logical_space
[link] [6 comments]
Categories: Incoming News

test coverage of Parsec parsers

haskell-cafe - Sat, 11/14/2015 - 6:14am
Hi, I'm adding HUnit tests to a parser that uses Parsec, and I wanted to see test coverage of my parser. I've successfully used hpc to get a coverage report, but it has a significant deficiency: an expression in my parser will show as covered if the expression is ever evaluated, but I'd really like to see branch coverage of the parser monads. For example, if I have a single test that parses the string "*" with the following parser:

    plusOrStar = do
        result <- (string "+" <|> string "*")
        _ <- (many $ string " ")
        return result

then hpc will report that all the code is covered, but I'd like to be able to see the fact that (string "+") only ever failed, and that (string "*") only ever succeeded. Are there any existing tools for getting that sort of coverage report for parsers or monads? Is there an approach other than hpc that I should be considering to get confidence in my test suite? If not, I understand that hpc works by doing source-code transformation--does it seem like a r
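As far as I know there is no off-the-shelf tool for this, but one ad-hoc workaround is to run the parser over IO and wrap each alternative in a labelling combinator that counts successes and failures. A sketch (Coverage and covered are invented names; note that the try changes backtracking for branches that fail after consuming input, so this is test-only instrumentation):

    import Text.Parsec
    import Control.Monad.IO.Class (liftIO)
    import Data.IORef
    import qualified Data.Map as M

    -- per-label (successes, failures)
    type Coverage = IORef (M.Map String (Int, Int))

    covered :: Coverage -> String -> ParsecT String () IO a -> ParsecT String () IO a
    covered cov lbl p =
            (do r <- try p
                liftIO $ modifyIORef cov (M.insertWith plus lbl (1, 0))
                return r)
        <|> (do liftIO $ modifyIORef cov (M.insertWith plus lbl (0, 1))
                parserZero)
      where plus (a, b) (c, d) = (a + c, b + d)

    plusOrStar :: Coverage -> ParsecT String () IO String
    plusOrStar cov = do
        result <- covered cov "plus" (string "+") <|> covered cov "star" (string "*")
        _ <- many (string " ")
        return result

After running the single "*" test through runParserT and dumping the map, "plus" would show (0, 1) and "star" (1, 0), which is exactly the branch information hpc cannot report.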
Categories: Offsite Discussion

Luke Plant: We need less powerful languages

Planet Haskell - Sat, 11/14/2015 - 5:46am

Many systems boast of being ‘powerful’, and it sounds difficult to argue that this is a bad thing. Almost everyone who uses the word assumes that it is always a good thing.

The thesis of this post is that in many cases we need less powerful languages and systems.

Before I get going, I should say first of all that there is very little in this post by way of original insight. The train of thought behind it was set off by reading Hofstadter’s book Gödel, Escher, Bach — an Eternal Golden Braid, which helped me pull together various threads in my own thinking where I’ve seen the principle in action. Philip Wadler’s post on the rule of least power was also formative, and most of all I’ve taken a lot from the content of this video from a Scala conference about everything that is wrong with Scala, which makes the following fairly central point:

Every increase in expressiveness brings an increased burden on all who care to understand the message.

My aim is simply to illustrate this point using examples that might be more accessible to the Python community than the internals of a Scala compiler.

I also need a word about definitions. What do we mean by “more powerful” or “less powerful” languages? In this article, I mean something roughly like this: “the freedom and ability to do whatever you want to do”, seen mainly from the perspective of the human author entering data or code into the system. This roughly aligns with the concept of “expressiveness”, though not perhaps with a formal definition. (More formally, many languages have equivalent expressiveness in that they are all Turing complete, but we still recognise that some are more powerful in that they allow a certain outcome to be produced with fewer words or in multiple ways, with greater freedoms for the author).

The problem with this kind of freedom is that every bit of power you insist on having when writing in the language corresponds to power you must give up at other points of the process — when ‘consuming’ what you have written. I’ll illustrate this with various examples which range beyond what might be described as programming, but have the same principle at heart.

We’ll also need to ask “Does this matter?” It matters, of course, to the extent that you need to be able to ‘consume’ the output of your system. The players who might ‘consume’ the message include software maintainers, compilers and other development tools — which means you almost always care. This has implications for performance and correctness as well as for human concerns.

Databases and schema

Starting at the low end of the scale in terms of expressiveness, there is what you might call data rather than language. But both “data” and “language” can be thought of as “messages to be received by someone”, and the principle applies here.

In my years of software development, I’ve found that clients and users often ask for “free text” fields. A free text field is maximally powerful as far as the end user is concerned — they can put whatever they like in. In this sense, this is the “most useful” field — you can use it for anything.

But precisely because of this, it is also the least useful, because it is the least structured. Even search doesn’t work reliably because of typos and alternative ways of expressing the same thing. The longer I do software development involving databases, the more I want to tightly constrain everything as much as possible. When I do so, the data I end up with is massively more useful. I can do powerful things when consuming the data only when I severely limit the power (i.e. the freedom) of the agents putting data into the system.

In terms of database technologies, the same point can be made. Databases that are “schema-less” give you great flexibility and power when putting data in, and are extremely unhelpful when getting it out. A key-value store is a more technical version of “free text”, with the same drawbacks — it is pretty unhelpful when you want to extract info or do anything with the data, since you cannot guarantee that any specific keys will be there.
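The point in miniature, as a Haskell sketch (Status and parseStatus are invented names): replacing a free-text field with an enumerated type pushes all the validation to the input boundary, and everything downstream can then rely on it.

    data Status = Active | Suspended | Closed
        deriving (Show, Eq, Enum, Bounded)

    -- Parse untrusted input once, at the edge; reject everything else.
    parseStatus :: String -> Maybe Status
    parseStatus "active"    = Just Active
    parseStatus "suspended" = Just Suspended
    parseStatus "closed"    = Just Closed
    parseStatus _           = Nothing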


HTML

The success of the web has been partly due to the fact that some of the core technologies, HTML and CSS, have been deliberately limited in power. Indeed, you probably wouldn’t call them programming languages, but markup languages. This, however, was not an accident, but a deliberate design principle on the part of Tim Berners-Lee. I can’t do better than to quote that page at length:

Computer Science in the 1960s to 80s spent a lot of effort making languages which were as powerful as possible. Nowadays we have to appreciate the reasons for picking not the most powerful solution but the least powerful. The reason for this is that the less powerful the language, the more you can do with the data stored in that language. If you write it in a simple declarative form, anyone can write a program to analyze it in many ways. The Semantic Web is an attempt, largely, to map large quantities of existing data onto a common language so that the data can be analyzed in ways never dreamed of by its creators. If, for example, a web page with weather data has RDF describing that data, a user can retrieve it as a table, perhaps average it, plot it, deduce things from it in combination with other information. At the other end of the scale is the weather information portrayed by the cunning Java applet. While this might allow a very cool user interface, it cannot be analyzed at all. The search engine finding the page will have no idea of what the data is or what it is about. The only way to find out what a Java applet means is to set it running in front of a person.

This has become a W3C principle:

Good Practice: Use the least powerful language suitable for expressing information, constraints or programs on the World Wide Web.

Note that this is almost exactly the opposite of Paul Graham’s advice (with the caveat that 'power' is often too informally defined to compare):

if you have a choice of several languages, it is, all other things being equal, a mistake to program in anything but the most powerful one.

Python MANIFEST.in files

Moving up towards ‘proper’ programming languages, I came across this example — the MANIFEST.in file format used by distutils/setuptools. If you have had to create a package for a Python library, you may well have used it.

The file format is essentially a very small language for defining what files should be included in your Python package (relative to the directory containing the MANIFEST.in file, which we’ll call the working directory from now on). It might look something like this:

    include README.rst
    recursive-include foo *.py
    recursive-include tests *
    global-exclude *~
    global-exclude *.pyc
    prune .DS_Store

There are two types of directive: include type directives (include, recursive-include, global-include and graft), and exclude type directives (exclude, recursive-exclude, global-exclude and prune).

The question then arises — how are these directives to be interpreted (i.e. what are the semantics)?

You could interpret them in this way:

A file from the working directory (or sub-directories) should be included in the package if it matches at least one include type directive, and does not match any exclude type directive.

This would make it a declarative language.

Unfortunately, that is not how the language is defined. The distutils docs for MANIFEST.in are specific about this — the directives are to be understood as follows (my paraphrase):

  1. Start with an empty list of files to include in the package (or technically, a default list of files).
  2. Go down the directives in the MANIFEST.in, in order.
  3. For every include type directive, copy any matching files from the working directory to the list for the package.
  4. For every exclude type directive, remove any matching files from the list for the package.

As you can see, this interpretation defines a language that is imperative in nature — each line of MANIFEST.in is a command that implies an action with side effects.
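To make the contrast concrete, here is a sketch of the two readings in Haskell, modelling a directive as a bare predicate on file paths (a simplification of the real pattern syntax):

    data Directive = Include (FilePath -> Bool) | Exclude (FilePath -> Bool)

    -- Declarative reading: a file is in the package iff some include
    -- matches it and no exclude does; directive order is irrelevant.
    declarative :: [Directive] -> FilePath -> Bool
    declarative ds f = any matchI ds && not (any matchE ds)
      where matchI (Include p) = p f
            matchI _           = False
            matchE (Exclude p) = p f
            matchE _           = False

    -- Imperative reading (what distutils does): fold over the
    -- directives in order, growing and shrinking a running list.
    imperative :: [Directive] -> [FilePath] -> [FilePath]
    imperative ds files = foldl step [] ds
      where step acc (Include p) = acc ++ filter (\f -> p f && f `notElem` acc) files
            step acc (Exclude p) = filter (not . p) acc

Under the imperative reading a later include can resurrect files that an earlier exclude removed, which is exactly what the example below relies on.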

The point to note is that this makes the language more powerful than my speculative declarative version above. For example, consider the following:

    recursive-include foo *
    recursive-exclude foo/bar *
    recursive-include foo *.png

The end result of the above commands is that .png files that are below foo/bar are included, but all other files below foo/bar are not. If I’m thinking straight, to replicate the same result using the declarative language is harder — you would have to do something like the following, which is obviously sub-optimal:

    recursive-include foo *
    recursive-exclude foo/bar *.txt *.rst *.gif *.jpeg *.py ...

So, because the imperative language is more powerful, there is a temptation to prefer that one. However, the imperative version comes with significant drawbacks:

  1. It is much harder to optimise.

    When it comes to interpreting the MANIFEST.in and building a list of files to include in the package, one fairly efficient solution for a typical case is to first build an immutable list of all files in the directory and its sub-directories, and then apply the rules: addition rules involve copying from the full list to an output list, and subtraction rules involve removing from the output list. This is how the Python implementation currently does it.

    This works OK, unless you have many thousands of files in the full list, most of which are going to get pruned or not included, in which case you can spend a lot of time building up the full list, only to ignore most of it.

    An obvious shortcut is to not recurse into directories that would be excluded by some exclude directive. However, you can only do that if the exclude directives come after all include directives.

    This is not a theoretical problem — I’ve found that doing sdist and other commands can take 10 minutes to run, due to a large number of files in the working directory (if you use the tool tox, for instance). This means that runs of tox itself (which uses MANIFEST.in) become very slow. I am currently attempting to fix this issue, but it is looking like it will be really hard.

    Adding the optimised case might not look that hard (you can shortcut the file system traversal using any exclude directives that come after all include directives), but it adds sufficiently to the complexity that a patch is unlikely to be accepted — it increases the number of code paths and the chances of mistakes, to the point of it not being worth it.

    It might be that the only practical solution is to avoid MANIFEST.in altogether and optimise only the case when it is completely empty.

  2. The power has a second cost — MANIFEST.in files are harder to understand.

    First, in understanding how the language works — the docs for this are considerably longer than for the declarative version I imagined.

    Second, in analysing a specific MANIFEST.in file — you have to execute the commands in your head in order to work out what the result will be, rather than being able to take each line on its own, or in any order that makes sense to you.

    This actually results in packaging bugs. For instance, it would be easy to believe that a directive like:

    global-exclude *~

    at the top of a MANIFEST.in file would result in any file name ending in ~ (temporary files created by some editors) being excluded from the package. In reality it does nothing at all, and the files will be erroneously included if other commands include them.

    Examples I’ve found of this mistake (exclude directives that don’t function as intended or are useless) include:

    • hgview (exclude directives at the top do nothing)
    • django-mailer (global-exclude at the top does nothing)
  3. Another result is that you cannot group lines in the file in any way you please, for clarity, since re-ordering the lines changes the meaning of the file.

In addition, virtually no-one will actually use the additional power. I’m willing to bet that 99.99% of MANIFEST.in files do not make use of the additional power of the imperative language (I downloaded 250 of them and haven’t found any that do). So we could have been served much better by a declarative language here instead of an imperative one. But backwards compatibility forces us to stick with this. That highlights another point — it is often possible to add features to a language to make it more powerful, but compatibility concerns usually don’t allow you to make it less powerful, for example by removing features or adding constraints.

URL reversing

One core piece of the Django web framework is URL routing. This is the component that parses URLs and dispatches them to the handler for that URL, possibly passing some components extracted from the URL.

In Django, this is done using regular expressions. For an app that displays information about kittens, you might have a kittens/urls.py with the following:

    from django.conf.urls import url

    from kittens import views

    urlpatterns = [
        url(r'^kittens/$', views.list_kittens, name="kittens_list_kittens"),
        url(r'^kittens/(?P<id>\d+)/$', views.show_kitten, name="kittens_show_kitten"),
    ]

The corresponding views.py file looks like:

    def list_kittens(request):
        # ...

    def show_kitten(request, id=None):
        # ...

Regular expressions have a capture facility built in, which is used to capture parameters that are passed to the view functions. So, for example, wherever this app is running, the URL path kittens/23/ results in calling the Python code show_kitten(request, id="23").

Now, as well as being able to route URLs to specific functions, web apps almost always need to generate URLs. For example, the kitten list page will need to include links to the individual kitten page i.e. show_kitten. Obviously we would like to do this in a DRY way, re-using the URL routing configuration.

However, we would be using the URL routing configuration in the opposite direction. When doing URL routing, we are doing:

URL path -> (handler function, arguments)

In URL generation, we know the handler function and arguments we want the user to arrive at, and want to generate a URL that will take the user there, after going through the URL routing:

(handler function, arguments) -> URL path

In order to do this, we essentially have to predict the behaviour of the URL routing mechanism. We are asking “given a certain output, what is the input?”

In the very early days Django did not include this facility, but it was found that with most URLs, it was possible to 'reverse' the URL pattern. The regex can be parsed looking for the static elements and the capture elements.

Note, first of all, that this is only possible at all because the language being used to define URL routes is a limited one — regular expressions. We could easily have defined URL routes using a more powerful language. For example, we could have defined them using functions that:

  • take a URL path as input
  • raise NoMatch if they do not match
  • return a truncated URL and an optional set of captures if they do match.

Our kittens/urls.py would look something like this:

    from django.conf.urls import url, NoMatch

    def match_kitten(path):
        KITTEN = 'kitten/'
        if path.startswith(KITTEN):
            return path[len(KITTEN):], {}
        raise NoMatch()

    def capture_id(path):
        part = path.split('/')[0]
        try:
            id = int(part)
        except ValueError:
            raise NoMatch()
        return path[len(part)+1:], {'id': id}

    urlpatterns = [
        url([match_kitten], views.list_kittens, name='kittens_list_kittens'),
        url([match_kitten, capture_id], views.show_kitten, name="kittens_show_kitten"),
    ]

Of course, we could provide helpers that make things like match_kitten and capture_id much more concise:

    from django.conf.urls import url, m, c

    urlpatterns = [
        url([m('kitten/')], views.list_kittens, name='kittens_list_kittens'),
        url([m('kitten/'), c(int)], views.show_kitten, name="kittens_show_kitten"),
    ]

Now, this language for URL routing is actually a lot more powerful than our regex based one, assuming that m and c are returning functions as above. The interface for matching and capturing is not limited to the capabilities of regexes — for instance, we could do database lookups for the IDs, or many other things.

The downside, however, is that URL reversing would be entirely impossible. For general, Turing complete languages, you cannot ask “given this output, what is the input?”. We could potentially inspect the source code of the function and look for known patterns, but it quickly becomes totally impractical.

With regular expressions, however, the limited nature of the language gives us more options. In general, URL configuration based on regexes is not reversible — a regex as simple as . cannot be reversed uniquely. (Since we want to generate canonical URLs normally, a unique solution is important. As it happens, for this wild card, Django currently picks an arbitrary character, but other wild cards are not supported). But as long as wild cards of any sort are only found within capture groups (and possibly some other constraints), the regex can be reversed.

So, if we want to be able to reliably reverse the URL routes, we actually want a language less powerful than regular expressions. Regular expressions were presumably chosen because they were powerful enough, without realising that they were too powerful.

Additionally, in Python defining mini-languages for this kind of thing is quite hard, and requires a fair amount of boilerplate and verbosity both for implementation and usage — much more than when using a string based language like regexes. In languages like Haskell, relatively simple features like easy definitions of algebraic data types and pattern matching make these things much easier.
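To illustrate that last point, a sketch of such a mini-language in Haskell (invented names, integer captures only): the pattern type is deliberately too weak to express arbitrary matching, and that weakness is exactly what lets it run in both directions.

    data Segment = Lit String | IntCapture

    -- Forward: URL path segments -> captured arguments, if they match.
    match :: [Segment] -> [String] -> Maybe [Int]
    match []                []       = Just []
    match (Lit s : ps)      (x : xs) | s == x = match ps xs
    match (IntCapture : ps) (x : xs) = case reads x of
                                           [(n, "")] -> (n :) <$> match ps xs
                                           _         -> Nothing
    match _                 _        = Nothing

    -- Backward: arguments -> URL path segments, predicting the router.
    reverseUrl :: [Segment] -> [Int] -> Maybe [String]
    reverseUrl []                []       = Just []
    reverseUrl (Lit s : ps)      args     = (s :) <$> reverseUrl ps args
    reverseUrl (IntCapture : ps) (a : as) = (show a :) <$> reverseUrl ps as
    reverseUrl _                 _        = Nothing

    -- match [Lit "kittens", IntCapture] ["kittens", "23"]  == Just [23]
    -- reverseUrl [Lit "kittens", IntCapture] [23]          == Just ["kittens", "23"]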

Regular expressions

The mention of regexes as used in Django’s URL routing reminds me of another problem:

Many usages of regexes are relatively simple, but whenever you invoke a regex, you get the full power whether you need it or not. One consequence is that for some regular expressions, the need to do backtracking to find all possible matches means that it is possible to construct malicious input that takes a huge amount of time to be processed by the regex implementation.

This has been the cause of a whole class of Denial Of Service vulnerabilities in many web sites and services, including one in Django due to an accidentally 'evil' regex in the URL validator — CVE-2015-5145.

Django templates vs Jinja templates

The Jinja template engine was inspired by the Django template language, but with some differences in philosophy and syntax.

One major advantage of Jinja2 over Django is performance. Jinja2’s implementation strategy is to compile templates to Python code, rather than run an interpreter written in Python (which is how Django’s engine works), and this results in a big performance increase — often 5 to 20 times. (YMMV etc.)

Armin Ronacher, the author of Jinja, attempted to use the same strategy to speed up Django template rendering. There were problems, however.

The first problem he knew about when he proposed the project: the extension API in Django makes the approach taken in Jinja very difficult. Django allows custom template tags that have almost complete control over the compilation and rendering steps. This allows some powerful custom template tags, like addtoblock in django-sekizai, that seem impossible at first glance. However, if a slower fallback were provided for these less common situations, a fast implementation might still have been useful.

However, there is another key difference that affects a lot of templates, which is that the context object that is passed in (which holds the data needed by the template) is writable within the template rendering process in Django. Template tags are able to assign to the context, and in fact some built-in template tags like url do just that.

The result of this is that a key part of the compilation to Python that happens in Jinja is impossible in Django.

Notice that in both of these, it is the power of Django’s template engine that is the problem — it allows code authors to do things that are not possible in Jinja2. However, the result is that a very large obstacle is placed in the way of attempts to compile to fast code.

This is not a theoretical consideration. At some point, performance of template rendering becomes an issue for many projects, and a number have been forced to switch to Jinja because of that. This is far from an optimal situation!

Often the issues that make optimisation difficult are only clear with the benefit of hindsight, and it isn’t true to say that simply adding restrictions to a language is necessarily going to make it easier to optimise. There are certainly languages which somehow manage to hit a “sour spot” of providing little power to either the authors or the consumers!

You might also say that for the Django template designers, allowing the context object to be writable was the obvious choice because Python data structures are typically mutable by default. Which brings us to Python...


Python

There are many ways that we could think about the power of the Python language, and how it makes life hard for every person and program that wants to make sense of Python code.

Compilation and performance of Python is an obvious one. The unrestricted effects that are possible at any point, including writable classes and modules etc., not only allow authors to do some very useful things, but also make it extremely difficult to execute Python code quickly. PyPy has made some impressive progress, but looking at the curve from PyPy 1.3 onward, which shows diminishing returns, makes it clear that they are unlikely to make much bigger gains in the future. And the gains that have been made in terms of run time have often been at the expense of memory usage. There is simply a limit to how well you can optimise Python code.

(Please note, to all who continue reading this — I’m not a Python basher, or a Django basher for that matter. I’m a core developer of Django, and I use Python and Django in almost all my professional programming work. The point of this post is illustrate the problems caused by powerful languages).

However, rather than focus on the performance problems of Python, I’m going to talk about refactoring and maintenance. If you do any serious work in a language, you find yourself doing a lot of maintenance, and being able to do it quickly and correctly often becomes very important.

So, for example, in Python, and with typical VCS tools (Git or Mercurial, for instance), if you re-order functions in a module e.g. move a 10 line function to a different place, you get a 20 line diff, despite the fact that nothing changed in terms of the meaning of the program. And if something did change (the function was both moved and modified), it’s going to be very difficult to spot.

This happened to me recently, and set me off thinking just how ridiculously bad our toolsets are. Why on earth are we treating our highly structured code as a bunch of lines of text? I can’t believe that we are still programming like this, it is insane!

At first, you might think that this could be solved with a more intelligent diff tool. But the problem is that in Python, the order in which functions are defined can in fact change the meaning of a program (i.e. change what happens when you execute it).

Here are a few examples:

Using a previously defined function as a default argument:

    def foo():
        pass

    def bar(a, callback=foo):
        pass

These functions can’t be re-ordered or you’ll get a NameError for foo in the definition of bar.

Using a decorator:

    @decorateit
    def foo():
        pass

    @decorateit
    def bar():
        pass

Due to unrestricted effects that are possible in @decorateit, you can’t safely re-order these functions and be sure the program will do the same thing afterwards. Similarly, calling some code in the function argument list:

    def foo(x=Something()):
        pass

    def bar(x=Something()):
        pass

Similarly, class level attributes can’t be re-ordered safely:

    class Foo():
        a = Bar()
        b = Bar()

Due to unrestricted effects possible inside the Bar constructor, the definitions of a and b cannot be re-ordered safely. (This might seem theoretical, but Django, for instance, actually uses this ability inside Model and Form definitions to provide a default order for the fields, using a cunning class level counter inside the base Field constructor).

Ultimately, you have to accept that a sequence of function statements in Python is a sequence of actions in which objects (functions and default arguments) are created, possibly manipulated, etc. It is not a re-orderable set of function declarations as it might be in other languages.
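For contrast, in a language where top-level bindings form a declaration group rather than a sequence of actions, reordering is always safe. In Haskell, for example, these two definitions can be swapped freely:

    bar :: Int -> Int
    bar x = foo x + 1   -- refers to foo, defined below

    foo :: Int -> Int
    foo = (* 2)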

This gives Python an amazing power when it comes to writing it, but imposes massive restrictions on what you can do in any automated way to manipulate Python source code.

Above I used the simple example of re-ordering two functions or class attributes. But every single type of refactoring that you might do in Python becomes virtually impossible to do safely because of the power of the language: duck typing means you can’t safely do method renames, and the possibility of reflection/dynamic attribute access (getattr and friends) means you can’t in fact do any kind of automated rename safely.

So, if we are tempted to blame our crude VCS or refactoring tools, we actually have to blame the power of Python — despite the huge amount of structure in correct Python source code, there is very little that any software tool can do with it when it comes to manipulating it, and the line-based diffing that got me so mad is actually a reasonable approach.

Now, 99.99% of the time, we don’t write Python decorators which mean that the order of function definitions makes a difference, or silly things like that — we are responsible “adults”, as Guido put it, and this makes life easier for human consumers. But the fact remains that our tools are limited by what we do in the 0.01% of cases. For some consumers, we can optimise on the basis of the common case, and detect when that fails e.g. a JIT compiler using guards. But with others e.g. VCS or refactoring tools, the “runtime” information that you hit the unlucky case comes far too late — you might have released your subtly-broken code by the time you find out, so you have to be safe rather than sorry.

In an ideal world, with my dream language, when you rename a function, the entire “diff” in your VCS should simply be “Function foo renamed to bar”. (And, this should be exportable/importable, so that when you upgrade a dependency to a version in which foo is renamed to bar, it should be exactly zero work to deal with this). In a “less powerful” language, this would be possible, but the power given to the program author in Python has taken power from all the other tools in the environment.

Does this matter? It depends on how much time you spend manipulating your code, compared to using code to manipulate data.

At the beginning of a project, you may be tempted to desire the most powerful language possible, because it gives you the most help and freedom in terms of manipulating data. But later on, you spend a huge amount of time manipulating code, and often using an extremely basic tool to do so — a text editor. This treats your highly structured code as one of the least structured forms of data — a string of text — exactly the kind of manipulation you would avoid at all costs inside your code. But all the practices you would choose and rely on inside your program (manipulating all data inside appropriate containers) are no longer available to you when it comes to manipulating the program itself.

Some popular languages make automated refactoring easier, but more is needed: to actually make use of the structure of your code, you need an editor and VCS that understand your code properly. Projects like Lamdu are in the right direction, but still in their infancy, and unfortunately involving re-thinking the entire software development stack :-(


Conclusion

When you consider the total system and all the players (whether software or human), including the need to produce efficient code, and long term maintainability, less powerful languages are actually more powerful — “slavery is freedom”. There is a balance between expressiveness and reasonability.

The more powerful a language, the greater the burden on software tools, which either need to be more complicated in order to work, or are forced to do less than they could. This includes:

  • compilers — with big implications for performance
  • automated refactoring and VCS tools — with big implications for maintenance.

Similarly, the burden also increases for humans — for anyone attempting to understand the code or modify it.

A natural instinct is to go for the most powerful solution, or a solution that is much more powerful than is actually needed. We should try to do the opposite — find the least powerful solution that will do the job.

This won’t happen if creating new languages (which might involve parsers etc.) is hard work. We should prefer software ecosystems that make it easy to create very small and weak languages.

Categories: Offsite Blogs

Something that is Semigroup but is not Monoid

Haskell on Reddit - Sat, 11/14/2015 - 2:45am

Could you tell me about real-world Semigroup instances that are not Monoid instances?
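Two canonical answers, offered here for illustration rather than taken from the thread: non-empty lists, and Max over an unbounded type.

    import Data.List.NonEmpty (NonEmpty (..))
    import Data.Semigroup (Max (..), (<>))

    -- (<>) concatenates, but there is no empty NonEmpty list,
    -- so no lawful mempty exists.
    appended :: NonEmpty Int
    appended = (1 :| [2]) <> (3 :| [4])   -- 1 :| [2,3,4]

    -- Max over Integer: no least element, hence no identity for (<>).
    biggest :: Max Integer
    biggest = Max 3 <> Max 7              -- Max 7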

submitted by gtab62
[link] [27 comments]
Categories: Incoming News

Expanding the Core Libraries Committee

libraries list - Fri, 11/13/2015 - 11:18pm
At present the core libraries committee consists of 7 members: 6 regular members and one ex-officio appointment for the Haskell Platform maintainer. The size of this committee was originally selected to be in line with the size of the extant committee. During a meeting of the committee last week, we decided it is a bit too small to fulfill the function of adequately representing the broader community while still providing sufficient institutional memory. Consequently, we've resolved to replace the current Haskell Platform maintainer appointment with 3 additional seats on the committee. This will grow the committee to 9 members. Please be advised, membership on the committee is a commitment to a fairly active role. For more information on some of the processes surrounding the core libraries, please see the library submissions page on the Haskell wiki: If you are interested in self-nominating for the
Categories: Offsite Discussion

In haskell

Haskell on Reddit - Fri, 11/13/2015 - 9:54pm

The output of doubleEveryOther has a mix of one-digit and two-digit numbers. Define the function

sumDigits :: [Integer] -> Integer

Is it right?
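For reference, one possible definition (an illustrative sketch, not from the thread; assumes non-negative inputs):

    sumDigits :: [Integer] -> Integer
    sumDigits = sum . map digitSum
      where
        digitSum n
            | n < 10    = n
            | otherwise = digitSum (n `div` 10) + n `mod` 10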

submitted by Germany22221
[link] [4 comments]
Categories: Incoming News

The last decade: what has changed, what has stayed the same?

haskell-cafe - Fri, 11/13/2015 - 9:43pm
Hi all, I used to be interested in Haskell roughly a decade ago, then life happened. Now that I'm setting up project work, I'm interested in what has been happening in the meantime. I read a bit on the usual pages, did a quick scan of the mailing list, didn't find out what the important questions of today are and what answers exist. Some ideas of relevant questions below. Feel free to pose and answer your own questions :-) After learning the bare language, how long does it take a competent programmer to become confident in the performance of his Haskell code? After learning the bare language, how long does it take a competent programmer to know when and when not to use strictness annotations/operators? I'm seeing a lot of typesystem golf happening. Is this teachable to the average competent programmer? Is it relevant to everyday programming such as business logic, database access, or webpage generation? (If no, for what programming endeavours is it relevant?) What's the status of debugging - I
Categories: Offsite Discussion

In haskell

Haskell on Reddit - Fri, 11/13/2015 - 8:20pm

Can anyone tell me how to do this?

A function numPassed that takes a list of grades and returns the number of passing grades (score >= 80):

numPassed [50,98,68,65,75,82,94] -> 3
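One direct definition (an illustrative sketch, not from the thread):

    numPassed :: [Int] -> Int
    numPassed = length . filter (>= 80)

    -- numPassed [50,98,68,65,75,82,94] == 3 (98, 82 and 94 pass)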

thank you

submitted by Germany22221
[link] [7 comments]
Categories: Incoming News

Ask /r/haskell: type vs. newtype as it applies to library design?

Haskell on Reddit - Fri, 11/13/2015 - 7:04pm

A friend and I are trying to get some practice on a medium-sized haskell project, and we've decided to try and build some bindings for PivotalTracker's API. It's one of the few web services we use that doesn't already have mature bindings, so if things go well we might be able to put it up on Hackage.

The main design decisions I'm grappling with at the moment are when to use type vs newtype. The rule of thumb I have at the moment is to use type when there's no serious cost to getting things confused, so writing type ApiToken = String seems like a reasonable thing to do, but newtype ApiToken = ApiToken String would be overkill. OTOH, I would probably define

    newtype ProjectId = ProjectId { unProjectId :: Int }
    newtype StoryId   = StoryId   { unStoryId   :: Int }

to prevent them from getting mixed up.

The other type/newtype question I have came up in the context of monad transformers. It seems like I'm going to have a lot of functions with types like doAThing :: ApiToken -> ... -> IO _. My instincts suggest that I should abstract out the ApiToken argument by defining a reader-like monad that passes it around, and do it mtl-style so it plays nice with other libraries. Then the library functions turn into things like doAThing :: MonadIO m => ... -> PivotalT m _, and I define functions runPivotal :: MonadIO m => ApiToken -> PivotalT m a -> m a and (maybe) execPivotal :: ApiToken -> PivotalT m a -> m (). Doing this with type looks straightforward:

    type PivotalT = ReaderT ApiToken
    type Pivotal = PivotalT IO

This seems to bring a lot of benefits: the mtl machinery comes for free, so things like liftIO work as they should, and I think the various library functions I want to implement may even automatically lift. But it also means that other transformers in the stack become instances of MonadReader, and that might be surprising to library callers?

OTOH, if I set up PivotalT as a newtype and define a typeclass MonadPivotal, there's extra boilerplate but things seem safer. This seems to be the approach the hedis package uses. But then I've got dim memories of people criticising hedis' monadic interface, so I'm not sure if that's best practice?
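For concreteness, a sketch of that newtype approach (MonadPivotal and askToken are invented names; GeneralizedNewtypeDeriving removes most of the boilerplate):

    {-# LANGUAGE GeneralizedNewtypeDeriving #-}
    import Control.Monad.IO.Class (MonadIO)
    import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)

    type ApiToken = String

    newtype PivotalT m a = PivotalT (ReaderT ApiToken m a)
        deriving (Functor, Applicative, Monad, MonadIO)

    class Monad m => MonadPivotal m where
        askToken :: m ApiToken

    instance Monad m => MonadPivotal (PivotalT m) where
        askToken = PivotalT ask

    runPivotal :: ApiToken -> PivotalT m a -> m a
    runPivotal tok (PivotalT r) = runReaderT r tok

Callers then program against MonadPivotal only, so no MonadReader instance leaks into the rest of the stack.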

In summary, I'm interested in learning more about the following things:

  1. What general heuristics people use when deciding to use type vs. newtype
  2. Specifically, whether type or newtype makes the most sense for setting up the PivotalT transformer mentioned above
  3. Whether there is a better pattern for libraries where some connection information has to get passed around API calls a lot? (It would also be cool if it could set up a reusable TCP connection when it calls on wreq)
submitted by char2
[link] [8 comments]
Categories: Incoming News

Functional Jobs: Haskell Developer (ON SITE) at CJ Affiliate by Conversant (Full-time)

Planet Haskell - Fri, 11/13/2015 - 5:45pm

Join our newly formed team of crack commandos (there are only two of us at this point) tasked with replatforming a flaky data import / export system to Haskell (from Java). This is the first Haskell project at the company and we plan (pray?) to knock it out of the park.

The right candidate is either: A senior developer (JVM experience preferred) with a hobbyist's level of Haskell foo and a keen interest in Haskell's particular flavor of functional programming - or a junior/mid-career developer with strong Haskell skills.

Get information on how to apply for this position.

Categories: Offsite Blogs

Do I Need To Write An Instance for Assertable Here?

haskell-cafe - Fri, 11/13/2015 - 5:16pm
Below is some test code I am writing for my game, the link to the entire codebase, the error message and some discussion

    main :: IO ()
    main = defaultMain
        [ testGroup "EventNetwork Input" [testBuffer "bBuffer" Populated] ]

    testBuffer :: String -> BufferState -> Test
    testBuffer name Populated = testCase name $ assert $ bufferPopulated
        (UAC (PlayerCommand (Move (ToPlanetName Mongo)) (AID (Data.Text.pack "100"))))
    testBuffer name Empty = testCase name $ assert $ bufferEmptied
        (UAC (PlayerCommand (Move (ToPlanetName Mongo)) (AID (Data.Text.pack "100"))))

    bufferPopulated :: UAC -> MomentIO Bool
    bufferPopulated ev = do
        let eInput = ev <$ never
            eValidated = toVAC <$> eInput
        bBufferMap <- (buffer eValidated eClear) :: MomentIO (Behavior BufferMap)
        let r2 = [(Just $ BufferMap $ M.insert (AID (Data.Text.pack "100"))
                                               (toVAC ev)
                                               (M.empty :: M.Map AID VAC))]
        r1 <- liftIO $ ((interpret (eBuffer bBufferMap) []) :: IO [Maybe BufferMap])
        return $ r1 == r2

    bufferEmptied :: UAC -> MomentIO Bool
    b
Categories: Offsite Discussion

memory usage question

Haskell on Reddit - Fri, 11/13/2015 - 4:25pm


I was playing with ghci to get a feel for how Haskell actually works. So, in order to see the memory usage, I did this:

Prelude> :set +s

After that I tried that:

    Prelude> (last $ [1..1000000]++[1..10])::Integer
    10
    (0.04 secs, 137,329,608 bytes)
    Prelude> (last $ [1..10000000]++[1..10])::Integer
    10
    (0.35 secs, 1,361,981,112 bytes)

It seems like the more values I have in the list, the more memory I need. Is this expected behavior?

I was expecting this call to run in constant memory, because I do not store the values in the list: after I use them, they can be garbage collected on the spot. So there is no reason for this call to use more memory than is needed for a single Integer.

Also, last seems to be implemented tail-recursively, so it shouldn't be using any extra stack for the recursion.
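For what it's worth, the figure that +s prints is the total bytes allocated over the whole computation, not the amount of memory live at any one time. One way to see the difference is to compile a small program with RTS statistics enabled (standard GHC flags below) and look at the "maximum residency" line:

    -- Main.hs; compile with: ghc -O2 -rtsopts Main.hs
    -- run with:              ./Main +RTS -s
    main :: IO ()
    main = print (last ([1..10000000] ++ [1..10]) :: Integer)

Total allocation grows with the list, since every cons cell is allocated at some point, while maximum residency should stay small, because each cell becomes garbage as soon as last moves past it.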

submitted by yilmazhuseyin
[link] [7 comments]
Categories: Incoming News