The Modelling Infrastructure (MI) team at Standard Chartered has an open position for a typed functional programming developers, based in London. MI are a dev/ops-like team responsible for the continuous delivery, testing, tooling and general developer efficiency of the Haskell-based analytics package used by the bank. They work closely with other Haskell dev teams in the bank, providing developer tools, testing and automation on top of our git ecosystem.
The role involves improving the ecosystem for developers and further automation of our build, testing and release infrastructure. You will work with devs in London, as part of the global MI team (based in Singapore and China). Development is primarily in Haskell. Knowledge of the Shake build system and Bake continuous integration system would be helpful. Strong git skills would be an advantage. Having a keen eye for analytics, data analysis and data-driven approaches to optimizing tooling and workflows is desirable.
This is a permanent, associate director-equivalent positions in London
Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we definitely want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability. Demonstrated ability to write Haskell-based tooling around git systems would be a super useful.
The role requires physical presence in London, either in our Basinghall or Old Street sites. Remote work is not an option. No financial background is required.Contracting-based positions are also possible if desired.
If this sounds exciting to you, please send your resume to me – donald.stewart <at> sc.com
Call for papers: 21th International Conference on Engineering of Complex Computer Systems (ICECCS 2016), Dubai, United Arab Emirates, November 6-8 2016
tl;dr: This fall, I will be teaching an undergraduate PL course, with a focus on practical language design principles and tools. Feedback, questions, assignments you can share with me, etc. are all most welcome!
This fall, I will be teaching an undergraduate course on programming languages. It’s eminently sensible to ask a new hire to take on a course in their specialty, and one might think I would be thrilled. But in a way, I am dreading it.
It’s my own fault, really. In my hubris, I have decided that I don’t like the ways that PL courses are typically taught. So this summer I have to buckle down and actually design the course I do want to teach. It’s not that I’m dreading the course itself, but rather the amount of work it will take to create it!
I’m not a big fan of the sort of “survey of programming languages” course that gets taught a lot, where you spend three or four weeks on each of three or four different languages. I am not sure that students really learn much from the experience (though I would be happy to hear any reports to the contrary). At best it feels sort of like making students “eat their vegetables”—it’s not much fun but it will make them grow big and strong in some general sense.1 It’s unlikely that students will ever use the surveyed languages again. You might hope that students will think to use the surveyed languages later in their career because they were exposed to them in the course; but I doubt it, because three or four weeks is hardly enough to get any real sense for a language and where it might be useful. I think the only real argument for this sort of course is that it “exposes students to new ways of thinking”. While that is certainly true, and exposing students to new ways of thinking is important—essentially every class should be doing it, in one way or another—I think there are better ways to go about it.
In short, I want to design a course that will not only expose students to new ideas and ways of thinking, but will also give them some practical skills that they might actually use in their career. I started by considering the question: what does the field of programming languages uniquely have to offer to students that is both intellecually worthwhile (by my own standards) and valuable to them? Specifically, I want to consider students who go on to do something other than be an academic in PL: what do I want the next generation of software developers and academics in other fields to understand about programming languages?
A lightbulb finally turned on for me when I realized that while the average software developer will probably never use, say, Prolog, they almost certainly will develop a domain-specific language at some point—quite possibly without even realizing they are doing it! In fact, if we include embedded domain-specific languages, then in essence, anyone developing any API at all is creating a language. Even if you don’t want to extend the idea of “embedded domain-specific language” quite that far, the point is that the tools and ideas of language design are widely applicable. Giving students practice designing and implementing languages will make them better programmers.
So I want my course to focus on language design, encompassing both big ideas (type systems, semantics) as well as concrete tools (parsing, ASTs, type checking, interpreters). We will use a functional programming language (specifically, Haskell) for several reasons: to expose the students to a programming paradigm very different from the languages they already know (mainly Java and Python); because FP languages make a great platform for starting to talk about types; and because FP languages also make a great platform for building language-related tools like parsers, type checkers, etc. and for building embedded domain-specific languages. Notably, however, we will only use Haskell: though we will probably study other types of languages, we will ues Haskell as a medium for our study, e.g. by implementing simplified versions of them in Haskell. So while the students will be exposed to a number of ideas there is really only one concrete language they will be exposed to. The hope is that by working in a single language all semester, the students may actually end up with enough experience in the language that they really do go on to use it again later.
As an aside, an interesting challenge/opportunity comes from the fact that approximately half the students in the class will have already taken my functional programming class this past spring, and will therefore be familiar with Haskell. On the challenge side, how do I teach Haskell to the other half of the class without boring the half that already knows it? Part of the answer might lie in emphasis: I will be highlighting very different aspects of the language from those I covered in my FP course, though of course there will necessarily be overlap. On the opportunity side, however, I can also ask: how can I take advantage of the fact that half the class will already know Haskell? For example, can I design things in such a way that they help the other half of the class get up to speed more quickly?
In any case, here’s my current (very!) rough outline for the semester:
- Introduction to FP (Haskell) (3 weeks)
- Type systems & foundations (2-3 weeks)
- lambda calculus
- type systems
- Tools for language design and implementation (4 weeks)
- (lexing &) parsing, ASTs
- (very very basics of) compilers (this is not a compilers course!)
- Domain-specific languages (3 weeks)
- Social aspects? (1 week)
- language communities
- language adoption
My task for the rest of the summer is to develop a more concrete curriculum, and to design some projects. This will likely be a project-based course, where the majority of the points will be concentrated in a few big projects—partly because the nature of the course lends itself well to larger projects, and partly to keep me sane (I will be teaching two other courses at the same time, and having lots of small assignments constantly due is like death by a thousand cuts).
I would love feedback of any kind. Do you think this is a great idea, or a terrible one? Have you, or anyone you know of, ever run a similar course? Do you have any appropriate assignments you’d like to share with me?
Actually, I love vegetables, but anyway.↩
Reflecting back on the last 6 years of developing and teaching with CodeWorld, there are a number of decisions that were unique, and often even controversial, that define the project. For the record, here are eight of the biggest decisions I’ve made with CodeWorld, and the reasons for them.1. Teaching functional programming
Regular readers of this blog are probably familiar with functional programming, but for those who aren’t, you should understand that it’s really a rather different paradigm from most typical programming. It’s not just another syntax, with a few different features. Instead, it’s a whole new way of breaking down problems and expressing solutions. Basic ideas taught in the first few weeks of traditional computer programming courses – for example, loops – just don’t exist at all. And other really central ideas, like functions and variables, have a completely different meaning.
I’m not quite alone in teaching functional programming, though. Matthias Felleisen and Shriram Krishnamurthi started sizable effort to teach Scheme at the K12 level in the 1990s, and Emmanuel Schanzer created a Scheme/Racket based curriculum called Bootstrap, which is heavily based on functional programming. I’ve made the same choice, and for much the same reason.
In fact, I never set out to teach “coding” at all! My goal is to teach mathematics more effectively. But mathematics education suffers from the weakness that students who make a mistake often don’t find out about it until days later! By them time, whatever confusion of ideas led to the error has long been forgotten. CodeWorld began as my attempt to get students to directly manipulate things like functions, expressions, and variables, and get immediate feedback about whether the result makes sense, and whether it does what they intended. For that purpose, a functional programming language is perfect for the job!2. Teaching Haskell
Even after the switch to functional programming, I still surprise a lot of people by telling them I teach middle school students in Haskell! Let’s face it: Haskell has a bit of a reputation as a mind-bending and difficult language to learn, and it sometimes even deserves the reputation. This is, after all, the programming language community with more Ph.D. students per capita than any other, and where people hold regular conversations about applying the Yoneda lemma to help solve their coding challenges!
But it doesn’t have to be! Haskell also has some advantages over almost anything else, for someone looking to work with tangible algebra and mathematical notation.
First of all, the language semantics really are comparable to mathematics. Haskell is often called purely functional, meaning that it doesn’t just enable the use of functional programming ideas, but in fact embodies them! By contrast, most other widely used functional languages are impure. In an impure functional language, a function is actually the same complicated notion of a procedure or recipe that it is in an imperative language, but it is conventional (and the language offers powerful features to help with this) to stick to a subset that’s consistent with mathematics, most of the time. That’s often a fine trade-off in a software engineering world, where the additional complexity is sometimes needed; but in education, when I tell a student that a function is really just a set of ordered pairs, I don’t want to have to later qualify this statement with “… except for this magical function here, which produces a random number.”
Even more importantly, basic syntax looks almost exactly like mathematics (or at least, it can). Bootstrap, for example, gets the semantics right, but looking through sample student workbooks, there’s quite a bit of “here’s how you write this in math; now write it in Racket.” By contrast, when teaching with CodeWorld, we’ve been able to effectively explain the programming language as a set of conventions for typing math directly for the computer. There are obviously still some differences – both at the surface level, like using * for multiplication and ^ for exponents, and at a deeper level, like distinguishing between variables and constructors on the left-hand side of equations. But in practice, this has been easily understood by students as limitations and tweaks in which math notation CodeWorld understands. It feels like a dialect, not a new language.
(It’s worth pointing out that Racket also includes a purely functional language subset that’s used by Bootstrap, though the syntax is different. Shriram Krishnamurthi has mentioned Pyret, as well, which among other nice properties closes some of the ground between Scheme and mathematics notation, at least for expressions. You still can’t just write “f(x) = x + 5” to define a function, though.)
So what about the mind-bending parts of Haskell? It turns out most of them are optional! It took some effort, but as I’ll mention later, I have removed things like type classes (including the dreaded monads) and many unnecessary uses of higher-order functions. What’s left is a thin wrapper around notation that students are already learning in Algebra anyway.3. Using the Gloss programming model
Of course, a programming language by itself isn’t a complete tool. You also need libraries! The next big decision was to base CodeWorld on the programming model of Ben Lippmeier’s Gloss library.
Gloss is an interesting choice on its own. The programming model is very simple. Everything is a pretty comprehensible mathematical thing. It’s probably too simple for sizable projects, and you could make the case that teaching it is letting down students who want to be able to scale their programming skills up to larger projects. But again, it has two advantages that I believe outweigh this concern.
First, it’s tangible. Outside of Gloss, much of the current thinking around building interactive applications in functional programming environments centers around FRP (Functional Reative Programming). FRP defines a few abstract concepts (“events” and “behaviors”), and then hides when they look like or how they work. Of course, strong abstraction is a foundation of software engineering. But it’s not a foundation of learning, or of mathematics! Indeed, Elm also recently (and probably with even less justification, given its less educational audience) dropped FRP in favor of tangible functions, as well. The advantages of concrete and tangible types that students can get their heads around are hard to overstate.
Second, again, this choice better supports building an understanding of mathematical modeling. In addition to it being easier for a middle school student to understand a value of type Number -> Picture, than the more abstract Behavior Picture from FRP (or the even more obtuse non-terminating while-loop of the imperative world), it also gives them experience with understanding how real phenomena are modeled using simple ideas from mathematics. Later programs are built using initial values and step functions, along explicitly bundled state. This gently starts to introduce general patterns of thinking about change in ways that will come up again far down the road: in the study of linear algebra, calculus, differential equations, and dynamical systems!
Of course, there’s a cost here. I wouldn’t point someone to Gloss for a real-world project. Even something as simple as a single GUI component can be complicated and fragmented, since students have to separately connect the state, initial value, behavior over time, and event handling. But the cost in encapsulation is most keenly felt in larger projects by more experienced programmers who can find this sort of plumbing work tedious. Typical introductory programming students still have a lot to learn from connecting these pieces and understanding how to make them work together.4. Replacing the Prelude
Once I had Haskell and Gloss in place, the next big choice made by CodeWorld was to replace the Haskell prelude with a customized version. GHC, the most popular Haskell compiler, provides a lot of power to customize the language by making changes to libraries. This extends even to the meaning of literal text and numbers in the source code!
One reason for replacing the Prelude was to keep the complexity of a first working program as low as possible. For students who are just starting out, every word or piece of punctuation is an obstacle. Haskell has always done better on this front than Java, which requires defining a class, and a member function with a variety of options. But adding import statements definitely doesn’t fit the vision articulated above of the programming language as a thin wrapper around mathematical notation. So the modified Prelude puts all of the built-in CodeWorld functions in scope automatically, without the need to import additional modules. As a result, a minimal CodeWorld program is one line long.
A second reason for replacing the Prelude was to remove a lot of the programming jargon and historical accidents in Haskell. Some of this is so entrenched that experienced programmers don’t even notice it any more. For example, even the word “string” to denote a bit of text is a holdout from how computer programmers thought of their work in the mid 20th century. (CodeWorld calls the analogous type Text, instead, and also keeps it separate from lists.) Haskell itself has introduced its own jargon, which is confusing to students as well.
But the most important consequence of replacing the Prelude is that advanced language constructs, like type classes and monads, can be hidden. These features haven’t actually been removed from CodeWorld, but they are not used in the standard library, so that students who don’t intend to use them will not see them at all. This made more changes necessary, such as collapsing Haskell’s numeric type class hierarchy into a single type, called Number. Perhaps the most interesting adaptation was the implementation of the (==) operator for equality comparison, without a type class constraint. This was done by Luite, by inspecting the runtime representation of the values in the GHCJS runtime (see below).5. Intentionally foiling imperative thinking
Sometimes, it seems that the dogma of the functional programming language community (and Haskellers in particular) is that programmers are corrupted by imperative languages, and that a programmer learning a functional language for their first experience would have a much easier time. I haven’t found that to be 100% true. Perhaps it’s because even students with no prior programming experience have still been told, for example, to think of a program as a list of instructions. Or perhaps it’s something more intrinsic in the human brain. I don’t know for sure.
But what I do know for sure is that even with no previous experience, middle school students will gravitate toward imperative semantics unless they are carefully held back! Because of this, another choice made by CodeWorld, and one of the main differences from Gloss, is that it makes some changes to intentionally trip up students who try to think of their CodeWorld expressions as an imperative sequence of instructions.
One example of such a change: in Gloss, a list of pictures is overlaid from back to front. In CodeWorld, though, the order is reversed. Combining pictures, whether via the pictures function, or the & operator, is done from front to back. The reason is that as I observed students in my classes, I realized that many of them had devised a subtly wrong understanding of the language semantics: namely, that circle(1) was not a circle, but instead a command to draw a circle, and that the & operator simply meant to do one thing, and then the next, and the pictures ended up overlaying each other because of the painter’s algorithm. Because of this misunderstanding, they struggled to apply or understand other operations, like translation or rotation, in a natural way. After swapping the order of parameters, students who form such a hypothesis will immediately have it proven wrong. (The analogous mistake now would be to assume that & means to do the second thing first, and no student I’m aware of has made that error.)
A similar situation exists with colors. In Gloss, the color function changes the color only of parts of a picture that don’t already have a color! This means that the semantic model of the Picture type in Gloss is quite complex indeed. Instead of just being a visual shape, a Gloss Picture is a shape where some parts have fixed color, but others have unspecified color, and the color function operates on that value by fixing any unspecified bits to the given color. Indeed, the most sensible way to understand these values is in terms of the implementation: that the color function sets a current color in the graphics context, which is used for that subtree, but only if it’s not changed first. This is a leaky implementation! It is fixed by CodeWorld, where applying a color to a picture overrides any existing coloring.
Another change that helped a lot with this was to carefully remove the use of verbs for function names in the CodeWorld standard library. I observed verbs misleading students many times. Sometimes, they expected that use of a function would permanently change the value of its parameter. Other times, they even expected a function like rotate to turn a picture into an animation that keeps moving! The key idea they are missing is that functions are not actions, but rather just relations between values. Such relations are better (even if it’s sometimes awkward) described somewhere on a scale between nouns and adjectives, rather than verbs. The way the code reads after this change once again acts as a roadblock to students who try to build on an incorrect understanding.6. Embracing the web
Beyond the programming language and libraries, another important choice in CodeWorld was to strongly adopt the web as a medium. The first version of the platform in 2010 was a relatively early adopter of web-based programming tools! However, the execution model (using SafeHaskell to run student code in a trusted way on the server and stream frames to the client) was definitely doomed from the start. It was a hack, which worked for one class, but was hardly scalable.
This decision was important for a few reasons. The first is compatibility and universal access. Schools have whatever devices they have access to: Chromebooks, bring-your-own-device plans, etc. Students themselves are constantly switching devices, or leaving theirs at home. Depending on a locally installed application – or saving student projects on a local disk – for a class at the middle school level would be a disaster. Because CodeWorld is all web-based, they can work from any system they wish, and have full access to all of their saved projects.
The second reason a web-based environment was important is that sharing is a huge part of student motivation. Because the CodeWorld server remembers all compiled code by its MD5 hash, students can send projects to each other simply by copying and pasting an appropriate URL into an email, chat message, or text message. It is difficult to express how helpful this has been.
Despite the advantages of the web, though, I am hoping to soon have export of student projects to mobile applications, as well. The development environment will remain web-based, but created applications can be installed as apps. It’s likely that someone will be working on this feature over the summer.7. Supporting mathematics education
Another big decision made by CodeWorld, and hinted at already, was to often sacrifice traditional computer programming education for better mathematics. This has been done with a hodge-podge of small changes, such as:
- De-emphasizing programming concepts like abstraction, maps and folds, and higher-order functions, in favor of approaches like list comprehensions that look more like mathematics.
- Uncurrying all functions in the standard library. This is easily the most controversial decision I’ve made for the Haskell community, but it’s really just a special case of de-emphasizing higher order functions. After uncurrying, functions can always be written in standard mathematical notation, such as f(x) or f(x, y).
- The coordinate plane uses a mathematical orientation. Gloss’s coordinate plane looks like computer screen coordinates, with (0, 0) in the top left. CodeWorld’s plane puts (0, 0) at the center, and it orients the positive y axis to point up. These just match conventions.
- CodeWorld also rescales the coordinates so that the plane extends from -10 to +10 in both dimensions, rather than counting in pixels. This turns out to have been an amazing choice! It simultaneously allows students to do low-precision placement of shapes on the plane without multi-digit artithmetic, and introduces decimals for added precision. In the end, this combination better supports middle school mathematics than the alternative.
Another change here was originally an accident. CodeWorld, from the beginning, did not implement using any kind of image file in a program. Originally, this was because I hadn’t bothered to implement a UI for uploading assets to use in the web-based programs! But after teaching with it, I don’t regret it at all. I’ve had other teachers tell me the same thing. By giving students only geometric primitives, not images copied and pasted from the web, as the tools for their projects, they are more creative and work with a lot more mathematics in the process.8. Opting for student-led projects
The final big decision on my list doesn’t pertain to the web site or tools at all, but is about the organization of classes. There are a lot of efforts out there to encourage students to learn to code. Hour of Code encourages teachers to devote an hour to programming activities and games. Many organizations are running day-long activities in Processing or Scratch or Greenfoot. Bootstrap started with once-a-week after school programs using Racket, and has scaled up from there. I’ve volunteered as a mentor and team lead for weekend hackathons by organizations like Black Girls Code.
These are great! I wouldn’t discourage anyone from jumping in and doing what they can. But in many cases, they seem to miss the opportunity for student creativity. There’s a tendency for a lot of organizations to create very guided activities, or shy away from anything that might get a student away off the beaten path. Early versions of the Bootstrap curriculum, for example, encouraged kids to build games, but designed a game from start to finish (in terms of generic words like the “player”, “target”, and “danger”), and give students limited creative choices in the process. (Bootstrap has since expanded into a more open-ended Bootstrap 2 curriculum, as well.) Hour of Code consists almost entirely of scripted activities that feel more like playing a game than building one, which makes sense because they are intended to be completed in an hour. The BGC hackathon mentioned above was limited to use of a drag-and-drop GUI design tools, and devoted more time to having students sit in presentations about startup business models and UX design than letting them create something impressive of their own.
So one way that CodeWorld has been different from many of these activities is that I’ve tried to plan from the very beginning of the course for students to decide on, design, and implement their own ideas from the ground up. Sometimes that means taking longer, and taking smaller steps. From the very beginning, projects in the class aren’t plugging bits into a designed program, but rather creating things of their own choosing, at the level students are capable of doing creatively from scratch at that point. It means that I don’t even start talking about games until halfway through the class. But I think it’s important to let students dig in at each step and express themselves by creating something that’s deeply and uniquely theirs. Along the way, they spend a lot more time tinkering and trying out things; even trying out different possible overall organizations of their programs!
I think CodeWorld has been very successful at this. When students in CodeWorld create their own games, they really create their own games. They work differently, and have different designs.
Here are a few examples from various classes, all written by students between 12 and 14 years old:
- Gnome Maze Use WASD keys to help a gnome navigate the maze and find the gold.
- Donkey Pong One player uses W and S, the other uses the up and down cursor keys. Hit the ball back and forth.
- Dot Grab One player uses WASD, and the other uses the arrow keys. Race to eat the most dots.
- Yo Grandma! Save an old lady in a wheelchair from various hazards by dragging attachments onto her wheelchair.
- Jacob the Fish Help Jacob dodge sushi and eat minnows, and avoid becoming a snack for an even larger fish
- Knight-Wizard-Archer A twist on rock/paper/scissors, with fantasy characters
- Popcorn Cat Drop the cat to eat the popcorn, but dodge dogs
IFL 2016 - Call for papers
28th SYMPOSIUM ON IMPLEMENTATION AND APPLICATION OF FUNCTIONAL LANGUAGES - IFL 2016
KU Leuven, Belgium
In cooperation with ACM SIGPLAN
August 31 - September 2, 2016
The goal of the IFL symposia is to bring together researchers actively engagedin the implementation and application of functional and function-basedprogramming languages. IFL 2016 will be a venue for researchers to present anddiscuss new ideas and concepts, work in progress, and publication-ripe resultsrelated to the implementation and application of functional languages andfunction-based programming.
Following the IFL tradition, IFL 2016 will use a post-symposium review processto produce the formal proceedings. All participants of IFL 2016 are invited tosubmit either a draft paper or an extended abstract describing work to bepresented at the symposium. At no time may work submitted to IFL besimultaneously submitted to other venues; submissions must adhere to ACMSIGPLAN's republication policy:
The submissions will be screened by the program committee chair to make surethey are within the scope of IFL, and will appear in the draft proceedingsdistributed at the symposium. Submissions appearing in the draft proceedingsare not peer-reviewed publications. Hence, publications that appear only in thedraft proceedings are not subject to the ACM SIGPLAN republication policy.After the symposium, authors will be given the opportunity to incorporate thefeedback from discussions at the symposium and will be invited to submit arevised full article for the formal review process. From the revisedsubmissions, the program committee will select papers for the formalproceedings considering their correctness, novelty, originality, relevance,significance, and clarity. The formal proceedings will appear in theInternational Conference Proceedings Series of the ACM Digital Library.
August 1: Submission deadline draft papersAugust 3: Notification of acceptance for presentationAugust 5: Early registration deadlineAugust 12: Late registration deadlineAugust 22: Submission deadline for pre-symposium proceedingsAugust 31 - September 2: IFL SymposiumDecember 1: Submission deadline for post-symposium proceedingsJanuary 31, 2017: Notification of acceptance for post-symposium proceedingsMarch 15, 2017: Camera-ready version for post-symposium proceedings
Prospective authors are encouraged to submit papers or extended abstracts to bepublished in the draft proceedings and to present them at the symposium. Allcontributions must be written in English. Papers must use the new ACM twocolumns conference format, which can be found at:
For the pre-symposium proceedings we adopt a 'weak' page limit of 12 pages. Forthe post-symposium proceedings the page limit of 12 pages is firm.
Authors submit through EasyChair:
IFL welcomes submissions describing practical and theoretical work as well assubmissions describing applications and tools in the context of functionalprogramming. If you are not sure whether your work is appropriate for IFL 2016,please contact the PC chair at email@example.com. Topics of interest include,but are not limited to:
- language concepts- type systems, type checking, type inferencing- compilation techniques- staged compilation- run-time function specialization- run-time code generation- partial evaluation- (abstract) interpretation- metaprogramming- generic programming- automatic program generation- array processing- concurrent/parallel programming- concurrent/parallel program execution- embedded systems- web applications- (embedded) domain specific languages- security- novel memory management techniques- run-time profiling performance measurements- debugging and tracing- virtual/abstract machine architectures- validation, verification of functional programs- tools and programming techniques- (industrial) applications
Peter Landin Prize
The Peter Landin Prize is awarded to the best paper presented at the symposiumevery year. The honored article is selected by the program committee based onthe submissions received for the formal review process. The prize carries acash award equivalent to 150 Euros.
Chair: Tom Schrijvers, KU Leuven, Belgium
- Sandrine Blazy, University of Rennes 1, France - Laura Castro, University of A Coruña, Spain- Jacques, Garrigue, Nagoya University, Japan- Clemens Grelck, University of Amsterdam, The Netherlands- Zoltan Horvath, Eotvos Lorand University, Hungary- Jan Martin Jansen, Netherlands Defence Academy, The Netherlands- Mauro Jaskelioff, CIFASIS/Universidad Nacional de Rosario, Argentina- Patricia Johann, Appalachian State University, USA- Wolfram Kahl, McMaster University, Canada - Pieter Koopman, Radboud University Nijmegen, The Netherlands- Shin-Cheng Mu, Academia Sinica, Taiwan- Henrik Nilsson, University of Nottingham, UK- Nikolaos Papaspyrou, National Technical University of Athens, Greece- Atze van der Ploeg, Chalmers University of Technology, Sweden- Matija Pretnar, University of Ljubljana, Slovenia- Tillmann Rendel, University of Tübingen, Germany- Christophe Scholliers, Universiteit Gent, Belgium- Sven-Bodo Scholz, Heriot-Watt University, UK- Melinda Toth, Eotvos Lorand University, Hungary- Meng Wang, University of Kent, UK- Jeremy Yallop, University of Cambridge, UK
The 28th IFL will be held in association with the Faculty of Computer Science,KU Leuven, Belgium. Leuven is centrally located in Belgium and can be easilyreached from Brussels Airport by train (~15 minutes). The venue in theArenberg Castle park can be reached by foot, bus or taxi from the city center.See the website for more information on the venue.
There are lies, damn lies and benchmarks.
— Old Jungle saying
testbench is a new library designed to make it easier to write comparison benchmarks, ensure that they return the correct value and thus help prevent unintentional bias in benchmarks.Motivation
About a year ago, I was working on some Haskell code that I wanted to compare to existing implementations. In Haskell, we of course have the wonderful criterion library to write benchmarks with, but whilst I’ve found it really helpful before to help me tell whether a particular function has been improving in performance as I work on it, I felt that it was a bit clunky for directly comparing implementations against each other (there used to be a [bcompare] function, but it hasn’t existed since version 184.108.40.206 which came out in August 2014).
When I tried looking at how others have approached this problem, I found that they did so by just directly using the bench and bgroup functions. From my point of view, there are two problems with this approach:
- There is a lot of duplication required with this: you would typically have something along the lines of:
[ bench "f1" $ nf f1 a
, bench "f2" $ nf f2 a
Because of this duplication, it is too easy to have benchmarks nominally comparing two (or more) functions/values, but accidentally end up comparing apples to oranges (e.g. using whnf instead of nf).
- The output generated by criterion – especially as of version 220.127.116.11 – is rather verbose and tends not to lend itself well to directly comparing results to multiple benchmarks. I personally find myself starting to get swamped looking at the terminal output if there’s more than a few benchmarks, and the HTML report is even worse.As I said above, it’s great when I’m directly looking at just how one function compares as I tweak it, but not when I’m wanting to compare multiple functions.
Whilst I kept looking at existing comparison benchmarks, I even came across an example where a comparison ended up nominally showing that f1 was faster than f2… except that the result of f1 was a value with an O(1) implementation of [rnf], whereas f2 has an O(n) definition. I don’t know if this is intentional (I think it probably wasn’t) and even if this is rectified f1 was still faster… but the difference in runtimes – whilst minor in comparison to performance between the two functions – is non-negligible.
This to me demonstrated the desirability of not only having a wrapper around criterion to reduce the verbosity of comparison benchmarks, but to only be able to produce unit tests to ensure criteria are satisfied.
It’s taken me longer than I wished to produce a syntax that I was both happy with and would actually work (with lots of fighting against GHC in the form of “Why won’t you accept this? Oh, wait, now I get it; that makes sense… but can’t you accept it anyway? Pretty please?”), but I’ve now finally gotten it to a usable form and am hence releasing it.
As extremely simple and contrived examples, consider the following:main :: IO () main = testBench $ do -- Monomorphic comparisons compareFunc "List length" (\n -> length (replicate n ()) == n) (testWith (@? "Not as long as specified") <> benchNormalForm) (mapM_ (\n -> comp ("len == " ++ show n) n) [1..5]) -- Polymorphic comparisons. -- -- Currently it isn't possible to use a Proxy as the argument to the -- function, so we're using 'undefined' to specify the type. compareFuncConstraint (Proxy :: Proxy (CUnion Eq Num)) "Number type equality" (join (==) . (0`asTypeOf`)) (baseline "Integer" (undefined :: Integer) <> benchNormalForm) $ do comp "Int" (undefined :: Int) comp "Rational" (undefined :: Rational) comp "Float" (undefined :: Float) comp "Double" (undefined :: Double)
When this is run, the result on the console is:Cases: 9 Tried: 9 Errors: 0 Failures: 0 Mean MeanLB MeanUB Stddev StddevLB StddevUB OutlierVariance List length len == 1 22.15 ns 21.86 ns 22.88 ns 1.505 ns 742.2 ps 2.826 ns 83% len == 2 22.64 ns 22.49 ns 22.87 ns 602.0 ps 449.5 ps 825.7 ps 43% len == 3 23.39 ns 23.16 ns 23.78 ns 1.057 ns 632.6 ps 1.553 ns 68% len == 4 23.70 ns 23.51 ns 23.95 ns 773.3 ps 567.9 ps 1.050 ns 53% len == 5 24.14 ns 23.96 ns 24.71 ns 962.4 ps 307.5 ps 1.886 ns 63% Number type equality Integer 12.59 ns 12.48 ns 12.80 ns 538.0 ps 312.4 ps 944.2 ps 67% Int 12.79 ns 12.69 ns 12.98 ns 463.6 ps 320.0 ps 665.2 ps 59% Rational 12.77 ns 12.67 ns 12.93 ns 395.1 ps 290.0 ps 535.9 ps 51% Float 13.13 ns 12.88 ns 13.42 ns 869.7 ps 667.3 ps 1.212 ns 83% Double 12.74 ns 12.57 ns 13.02 ns 704.6 ps 456.5 ps 1.047 ns 78%
You can see on the top line we’ve had nine tests (run using HUnit):
- From the first group we’ve specified that all five values must return True.
- From the second group, we’ve specified that all inputs must return the same value as for the Integer case.
Since all the tests passed, the benchmarks are run. The output for these is a tabular format to make it easier to do vertical comparisons (though in this case the variances are all high so we should take them with a grain of salt).Caveats
Whilst I’m quite pleased with the API for defining the actual tests/benchmarks (subject to what GHC will let me write), there’s still scope for more functionality (e.g. support for IO-based benchmarks).
However, the default output (as soon above) isn’t configurable. It’s possible to get the individual tests and benchmarks out to feed them explicitly to HUnit and criterion respectively, but if you’re after this particular output then you have to wait until all the benchmarks are complete before the results are printed. There is no support for saving results to file (either as a CSV of all the results or an HTML report), or even to control how the benchmarks are run (minimum time spent on each benchmark, etc.) or any other option currently offered by criterion.
If there is enough interest I can look at adding these in; but this satisfies my itch for now whilst getting this library out there for people to start trying out.
Filed under: Haskell
I wrote a small program named bench that lets you benchmark other programs from the command line. Think of this as a much nicer alternative to the time command.
The best way to illustrate how this works is to show a few example uses of the program:$ bench 'ls /usr/bin | wc -l' # You can benchmark shell pipelines
benchmarking ls /usr/bin | wc -l
time 6.756 ms (6.409 ms .. 7.059 ms)
0.988 R² (0.980 R² .. 0.995 R²)
mean 7.590 ms (7.173 ms .. 8.526 ms)
std dev 1.685 ms (859.0 μs .. 2.582 ms)
variance introduced by outliers: 88% (severely inflated)
$ bench 'sleep 1' # You have to quote multiple tokens
benchmarking sleep 1
time 1.003 s (1.003 s .. 1.003 s)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.003 s (1.003 s .. 1.003 s)
std dev 65.86 μs (0.0 s .. 68.91 μs)
variance introduced by outliers: 19% (moderately inflated)
$ bench true # The benchmark overhead is below 1 ms
time 383.9 μs (368.6 μs .. 403.4 μs)
0.982 R² (0.971 R² .. 0.991 R²)
mean 401.1 μs (386.9 μs .. 418.9 μs)
std dev 54.39 μs (41.70 μs .. 67.62 μs)
variance introduced by outliers: 87% (severely inflated)
This utility just provides a command-line API for Haskell's criterion benchmarking library. The bench tool wraps any shell command you provide in a a subprocess and benchmarks that subprocess through repeated runs using criterion. The number of runs varies between 10 to 10000 times depending on how expensive the command is.
This tool also threads through the same command-line options that criterion accepts for benchmark suites. You can see the full set of options using the --help flag:$ bench --help
Command-line tool to benchmark other programs
Usage: bench COMMAND ([-I|--ci CI] [-G|--no-gc] [-L|--time-limit SECS]
[--resamples COUNT] [--regress RESP:PRED..] [--raw FILE]
[-o|--output FILE] [--csv FILE] [--junit FILE]
[-v|--verbosity LEVEL] [-t|--template FILE] [-m|--match MATCH]
[NAME...] | [-n|--iters ITERS] [-m|--match MATCH] [NAME...] |
[-l|--list] | [--version])
-h,--help Show this help text
COMMAND The command line to benchmark
-I,--ci CI Confidence interval
-G,--no-gc Do not collect garbage between iterations
-L,--time-limit SECS Time limit to run a benchmark
--resamples COUNT Number of bootstrap resamples to perform
--regress RESP:PRED.. Regressions to perform
--raw FILE File to write raw data to
-o,--output FILE File to write report to
--csv FILE File to write CSV summary to
--junit FILE File to write JUnit summary to
-v,--verbosity LEVEL Verbosity level
-t,--template FILE Template to use for report
-m,--match MATCH How to match benchmark names ("prefix" or "glob")
-n,--iters ITERS Run benchmarks, don't analyse
-m,--match MATCH How to match benchmark names ("prefix" or "glob")
-l,--list List benchmarks
--version Show version info
The --output option is really useful: it outputs an HTML page with a chart showing the distribution of run times. For example, the following command:$ bench 'ls /usr/bin | wc -l' --output example.html
benchmarking ls /usr/bin | wc -l
time 6.716 ms (6.645 ms .. 6.807 ms)
0.999 R² (0.999 R² .. 0.999 R²)
mean 7.005 ms (6.897 ms .. 7.251 ms)
std dev 462.0 μs (199.3 μs .. 809.2 μs)
variance introduced by outliers: 37% (moderately inflated)
Also produces something like the following chart which you can view in example.html:
You can also increase the time limit using the --time-limit option, which will in turn increase the number of runs for better statistics. For example, criterion warned me that I had too many outliers for my benchmarks, so I can increase the time limit for the above benchmark to 30 seconds:$ bench 'ls /usr/bin | wc -l' --time-limit 30 --output example.html
benchmarking ls /usr/bin | wc -l
time 6.937 ms (6.898 ms .. 7.002 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 6.905 ms (6.878 ms .. 6.935 ms)
std dev 114.9 μs (86.59 μs .. 156.1 μs)
... which dials up the number of runs to the ~4000 range, reduces the number of outliers, and brings down the standard deviation by a factor of four:
Keep in mind that there are a few limitations to this tool:
- this tool cannot accurately benchmark code that requires a warm up phase (such as JVM programs that depend on JIT compilation for performance)
- this tool cannot measure performance below about half a millisecond due to the overhead of launching a subprocess and bash interpreter
Despite those limitations, I find that this tool comes in handy in a few scenarios:
- Preliminary benchmarking in the prototyping phase of program development
- Benchmarking program pipelines written in multiple languages
You can install this tool by following the instructions on the Github repo:
Or if you have the Haskell stack tool installed you can just run:$ stack update
$ stack install bench