News aggregator

Toby Goodwin: Debian chroot on Android

Planet Haskell - Fri, 11/04/2016 - 2:47am

Sometimes, a simple idea — so simple it can be distilled down to 4 words — can be truly astounding.


For quite a while, I've been considering the best way to ensure the resilience, security, and accessibility of various pieces of personal data. There are several different categories, and no solution will be optimal for all of them. My music collection, for example, is large, non-secret, and largely replaceable (although the thought of dragging that enormous box of CDs out of the garage and reripping them all is pretty daunting!) The music lives on a server in my home, with my own backups. I upload medium bitrate versions to a cloud music service, and I have a low bitrate copy on my laptop and phone. So that's pretty well covered.

A similar scheme covers my photos and videos. They are much less replaceable than music, but fortunately much smaller, so there are a few extra copies kicking about.

Then, I have a few tiny things that I want to keep in sync across various devices. For example, today's todo list, my "blue skies ideas" list, and my password store. I've looked at syncthing, which is an awesome project, and I'm sure I'm going to find a good use for it someday.

But for these things, git is really the obvious solution. Most of them are already git repos, including my password-store, the only missing piece is a git client on my phone. So I was searching for recommendations for Android git clients, and these words jumped out at me:

create a debian image, mount it in your android device and chroot to it

My flabber was well and truly gasted.


It's very straightforward. From some debian instance on which you have root, run:

debootstrap --foreign --arch=armhf jessie jessie

Tar up the resulting tree in jessie, copy it to android, unpack it (ah, but where?), chroot, and then run:

debootstrap --second-stage Which?

Here are some things I've used: ssh, rsync, dash, bash, the rc shell (which I happen to maintain). All the usual userland tools, mv, chmod, etc. These (of course) are the proper full GNU versions, so you don't keep being bitten by the little discrepancies in, for instance, the busybox versions.

Package management with apt-get and dpkg. And perl, git, nano, vim, update-alternatives (so I never have to see nano again), less, man.

I started installing the pass package, but that pulls in gtk, pango and a whole bunch of other things I'm not going to use. So I downloaded password-store and installed it myself.

The ps command (you need to mount /proc in the chroot of course), top, strace, lsof. You can even strace android processes, it all Just Works. (OK, lsof gets upset because it's inside a chroot and it can't see all the mount points that /proc/mounts says exist. But still.)

I thought it might be fun to run mosh. It installed fine, but then bombed out with a weird error. I went on a bit of a wild goose chase, and concluded (it was late at night) that I needed a fix from the development version. So I cloned the mosh repo on github, installed a whole heap of build tools, compilers, libraries, and built mosh. On my phone!

In fact, the problem was simpler than that, and easily solved by using the stty command to declare my terminal size. And then I had to open up some ports in android's firewall... with iptables of course.

I could go on, but you're getting the idea. In summary, this is not something pretending to be a GNU/Linux system. It's the real deal.

Of course, there are some missing pieces, of which the most serious is the lack of daemons. I've installed etckeeper, but there will be no daily autocommit.

Ping doesn't work, because even a root shell is not allowed to use raw sockets. You can create a user, but it's not able to do much... I'll look at this some more when I have time, but I'm just running everything as root at the moment. Android's systems for users, permissions, and capabilities are entirely unfamiliar to me, although I'm trying to learn.


I made and remade my chroot several times before I was happy with it. Hopefully these notes will make things quicker for you.

First of all, Debian wants a “real” filesystem, which is to say, anything except FAT. Of the existing partitions, an obvious choice would be /data, which on my phone is ext4. Unfortunately, the major major drawback of my phone is that its /data is tiddly, just 1GB, and perennially full. (I did try the chroot on /data, before realising the fatal flaw. One curiosity is that /data is mounted with nodev, so populating /dev fails till you remount without nodev. You might think it would be better to bind mount the real /dev into the chroot anyway, and you might well be right. But I've been running with the /dev made by debootstrap with no problems.)

So it's time to repartition my 32GB SD card. Android apparently doesn't support GPT which is only a minor nuisance. I do hate all that primary / extended / logical nonsense though, it's so 1980s.

Much, much more seriously, it complains bitterly if it finds an SD card without a FAT partition. This is infuriating. The kernel supports ext3 just fine (and ext4 too, at least for partitions on fixed internal storage, although apparently not for the SD card, which makes no sense to me). So, if I insert a card that happens to have an ext3 partition on it, why not just mount it? Or if there's some scenario I'm not aware of that might not work quite right, notify a dialogue that explains and offers to mount the partition anyway. What actually happens is a notification that tells you the SD card is “damaged”, and offers to format it. Arrggh!

(I have reason to believe that the latest versions of Android will countenance SD cards with real file systems, although I need to research this further.)

My next try was a 50MB FAT partition, and the remainder ext3. This largely worked, but it didn't leave anywhere for android to move those apps which are prepared to live on SD card, absolutely vital to squeeze some extra apps onto my old phone.

The final iteration was a 4GB FAT partition, and the rest ext3. Of course, I don't need 28GB for the chroot itself: it starts off well under 1G, and even after installing loads of stuff I'm still under 2G. But I realised that I'd be able to put my music collection on the ext3 partition, which would save the tedious step of renaming everything to comply with FAT restrictions (mainly the prohibition on : in file names). Of course, I can now rsync-over-ssh the music from my laptop, which seems to go quicker than via USB.

Another annoyance is that the ext3 partition on the SD card isn't automatically mounted. I've spent some time in the past trying to find a boot time hook I can use, but with no luck. So I have to do this from the android shell every time my phone reboots, using a helper script cunningly located under the mount point:

root@android:/ # cat /data/ext3/m #!/system/bin/sh mount -t ext3 /dev/block/mmcblk1p5 /data/ext3 What?

Far and away the nicest way to communicate with the chroot is to plug into a laptop or desktop and use adb shell from the Android SDK. At that point, it's scarcely different from sshing to a remote server.

Of course, the whole point of the phone is that it's portable. On the move, I'm using Jack Palevich's Terminal Emulator for Android and Klaus Weidner's Hacker's Keyboard. The keyboard has all the keys you need — Esc, Tab, Ctrl etc — so it's ideal for non-trivial tasks (such as vim!). But the tiny keys are very fiddly on my phone, especially in portrait, so I sometimes stick to my usual keyboard.

I've got a trivial helper script to start my favourite shell under ssh-agent:

root@android:/ # cat /data/ext3/ch #!/system/bin/sh exec chroot /data/ext3/jessie /usr/bin/ssh-agent /usr/bin/rc -l Whither?

So I have a fantastic solution to my document and password management problems. And a great ssh client. And a whole new architecture to build my projects on, most of which aren't the sort of thing that it makes much sense to run on a phone, but building in different places is always good for portability.

I'd heard that Android uses a "modified Linux" kernel, so I wasn't really expecting any of this stuff to work properly, let alone tools like strace and lsof. Apparently, though, the changes were folded back into the mainline kernel at the 3.3 release. My (3 year old) phone runs 3.4.5, so presumably this a fairly vanilla kernel.

This is awesome. Google has its faults, but their commitment to free software easily earns them the “least evil” prize among the current Internet quintumvirate. (That's Google, Apple, Facebook, Amazon, and Microsoft, for anyone who's been asleep the last few years.)

Realising that, yes, that computer in my pocket is a pukka Linux box has endeared me even further to Android. I'd love to write some apps for it... except I've already got more than enough projects to fill my “copious spare time”!

Update November 2016

A couple of new things I've discovered since writing this article.

First, the debootstrap command is available on Fedora, from the system repository! So you don't need a Debian box to build the initial system image. (Having debootstrap around is also handy for making a Debian container on my desktop.)

Secondly, I think the reason Android won't automatically mount an ext2/3/4 partition is that it has no idea how to map UIDs and GIDs on the filesystem. Any obvious trivial solution, such as “only UID 0 can read or write”, would make it inaccessible to Android processes. Remember, you're not supposed to root your Android device! I've just ordered my next phone, which is supported by CyanogenMod, so it's likely that I'll end up with an OS that is rather more enlightened about users having full control of their devices. Having said that, I don't believe it has an SD slot, so the issue of real FS support won't arise.

Categories: Offsite Blogs

The team: Updates for November 3, 2016

Planet Haskell - Thu, 11/03/2016 - 9:00am

The following changes have been made since September:

Categories: Offsite Blogs

Michael Snoyman: Designing APIs for Extensibility

Planet Haskell - Wed, 11/02/2016 - 6:00pm

This is an old bit of content that I wrote, and I'm relocating to this blog for posterity. I've actually been using this technique in practice to a large extent over the past few libraries with settings I've written, and overall like it. But it's definitely opinionated, your mileage may vary.

Every time you make a breaking change in your API, it means that- potentially- one or more of your users will need to change his/her code to adapt. Even if this update is trivial, it adds friction to the code maintenance process. On the other hand, we don't want to be constrained by bad design choices early on in a project, and sometimes a breaking API change is the best option.

The point of this document, however, is to give you a third option: design your APIs from the outset to be extensible. There are common techniques employed in the Haskell world to make APIs that are resilient to changing feature-sets, and by employing them early on in your design process, you can hopefully avoid the painful choices between a better API and happy users.

Almost all techniques start with implementation hiding. Guidelines here are simple: don't expose anything non-public. For example, if you write a number of helper functions, you may not want to start off by exposing them, since you're then telling users that these are good, stable functions to be relied upon. Instead, use explicit export lists on your modules and only include functions that are intended for public consumption.

More important- and more tricky- than functions are data constructors. In many cases, you want to avoid exposing the internals of your data types to users, to allow you to expand on them in the future. A common use case for this is some kind of a data type providing configuration information. Consider that you're going to communicate with some web services, so you write up the following API:

module MyAPI ( Settings (..) , makeAPICall ) where data Settings = Settings { apiKey :: Text , hostName :: Text } makeAPICall :: Settings -> Foo -> IO Bar

The way your users will access this will be something like:

makeAPICall Settings { apiKey = myAPIKey , hostName = "" } myFoo

Now suppose a user points out that, in some cases, the standard port 80 is not used for the API call. So you add a new field port :: Int to your Settings constructor. This will break your user's code, since the port field will not be set.

Instead, a more robust way of specifying your API will look like:

module MyAPI ( Settings , mkSettings , setHostName , makeAPICall ) where data Settings = Settings { apiKey :: Text , hostName :: Text } -- | Create a @Settings@ value. Uses default value for host name. mkSettings :: Text -- ^ API Key -> Settings mkSettings key = Settings { apiKey = key , hostName = "" } setHostName :: Text -> Settings -> Settings setHostName hn s = s { hostName = hn } makeAPICall :: Settings -> Foo -> IO Bar

Now your user code will instead look like:

makeAPICall (mkSettings myAPIKey) myFoo

This has the following benefits:

  • The user is not bothered to fill in default values (in this case, the hostname).
  • Extending this API to allow for more fields in the future is trivial: add a new set* function. Internally, you'll add a field to Settings and set a default value in mkSettings.

One thing to note: please do not expose the field accessors directly. If you want to provide getter functions in addition to setters, write them explicitly, e.g.:

getHostName :: Settings -> Text getHostName = hostName

The reason for this is that by exposing field accessors, users will be able to write code such as:

(mkSettings myAPIKey) { hostName = "" }

This ties your hand for future internal improvements, since you are now required to keep a field of name hostName with type Text. By just using set and get functions, you can change your internal representation significantly and still provide a compatibility layer.

For those of you familiar with other languages: this is in fact quite similar to the approach taken in Java or C#. Just because Java does it doesn't mean it's wrong.

Note that this advice is different to, and intended to supersede, the settings type approach. Projects like Warp which previously used that settings type approach are currently migrating to this more extensible approach.

Also, while settings have been used here as a motivating example, the same advice applies to other contexts.

Internal modules

One downside of implementation hiding is that it can make it difficult for users to do things you didn't intend for them to do with your API. You can always add more functionality on demand, but the delay can be a major nuissance for users. A compromise solution in the Haskell community is to provide a .Internal module for your project which exports not-quite-public components. For example, in wai, the Response constructors are exposed in a Network.Wai.Internal module. Normally, users are supposed to use smart constructors like responseFile, but occasionally they'll want more fine-grained control.

Categories: Offsite Blogs

Douglas M. Auclair (geophf): October 2016 1HaskellADay Problems and Solutions

Planet Haskell - Tue, 11/01/2016 - 8:44am
Categories: Offsite Blogs

Functional Jobs: Senior Software Engineer (Haskell) at Front Row Education (Full-time)

Planet Haskell - Mon, 10/31/2016 - 7:48pm

Senior Software Engineer to join fast-growing education startup transforming the way 5+ million K-12 students learn Math and English.

What you tell your friends you do

“You know how teachers in public schools are always overworked and overstressed with 30 kids per classroom and never ending state tests? I make their lives possible and help their students make it pretty far in life”

What you really will be doing

Architect, design and develop new web applications, tools and distributed systems for the Front Row ecosystem in Haskell, Flow, PostgreSQL, Ansible and many others. You will get to work on your deliverable end-to-end, from the UX to the deployment logic

Mentor and support more junior developers in the organization

Create, improve and refine workflows and processes for delivering quality software on time and without incurring debt

Work closely with Front Row educators, product managers, customer support representatives and account executives to help the business move fast and efficiently through relentless automation.

How you will do this

You’re part of an agile, multidisciplinary team. You bring your own unique skill set to the table and collaborate with others to accomplish your team’s goals.

You prioritize your work with the team and its product owner, weighing both the business and technical value of each task.

You experiment, test, try, fail and learn all the time

You don’t do things just because they were always done that way, you bring your experience and expertise with you and help the team make the best decisions

What have we worked on in the last quarter

We have rewritten our business logic to be decoupled from the Common Core math standards, supporting US state-specific standards and international math systems

Prototyped and tested a High School Math MVP product in classrooms

Changed assigning Math and English to a work queue metaphor across all products for conceptual product simplicity and consistency

Implemented a Selenium QA test suite 100% in Haskell

Released multiple open source libraries for generating automated unit test fixtures, integrating with AWS, parsing and visualizing Postgres logs and much more

Made numerous performance optimization passes on the system for supporting classrooms with weak Internet bandwidth


We’re an agile and lean small team of engineers, teachers and product people working on solving important problems in education. We hyper-focus on speeds, communication and prioritizing what matters to our millions of users.

  • You’re smart and can find a way to show us.
  • A track record of 5+ years of working in, or leading, teams that rapidly ship high quality web-based software that provides great value to users. Having done this at a startup a plus.
  • Awesome at a Functional Programming language: Haskell / Scala / Clojure / Erlang etc
  • Exceptional emotional intelligence and people skills
  • Organized and meticulous, but still able to focus on the big picture of the product
  • A ton of startup hustle: we're a fast-growing, VC-backed, Silicon Valley tech company that works hard to achieve the greatest impact we can.
  • Money, sweet
  • Medical, dental, vision
  • Incredible opportunity to grow, learn and build lifetime bonds with other passionate people who share your values
  • Food, catered lunch & dinner 4 days a week + snacks on snacks
  • Room for you to do things your way at our downtown San Francisco location right by the Powell Station BART, or you can work remotely from anywhere in the US, if that’s how you roll
  • Awesome monthly team events + smaller get-togethers (board game nights, trivia, etc)

Get information on how to apply for this position.

Categories: Offsite Blogs

Leon P Smith: Announcing Configurator-ng 0.0

Planet Haskell - Sun, 10/30/2016 - 10:24pm

I’m pleased to announce a preliminary release of configurator-ng, after spending time writing documentation for it at Hac Phi this past weekend. This release is for the slightly adventurous, but I think that many, especially those who currently use configurator, will find this worthwhile.

This is a massively breaking fork of Bryan O’Sullivan’s configurator package. The configuration file syntax is almost entirely backwards compatible, and mostly forwards compatible as well. However, the application interface used to read configuration files is drastically different. The focus so far has been on a more expressive interface, an interface that is safer in the face of concurrency, and improved error messages. The README offers an overview of the goals and motivations behind this fork, and how it is attempting to satisfy those goals.

I consider this an alpha release, but I am using it in some of my projects and it should be of reasonable quality. The new interfaces I’ve created in the Data.Configurator.Parser and Data.Configurator.FromValue modules should be pretty stable, but I am planning on major breaking changes to the file (re)loading and change notification interfaces inherited from configurator.

Documentation is currently a little sparse, but the README and the preliminary haddocks I hope will be enough to get started. Please don’t hesitate to contact me, via email, IRC (lpsmith on freenode), or GitHub if you have any questions, comments, ideas, needs or problems.

Categories: Offsite Blogs

Roman Cheplyaka: Electoral vote distributions are polynomials

Planet Haskell - Fri, 10/28/2016 - 2:00pm

In his article Electoral vote distributions are Monoids, Gabriel Gonzalez poses and answers the following question based on 538’s data:

what would be Hillary’s chance of winning if each state’s probability of winning was truly independent of one another?

To answer the question, Gabriel devises a divide-and-conquer algorithm. He computes probability distributions over vote counts in subsets of all states and then combines them. He also observes that vote distributions form a monoid.

Here I want to share an algebraic perspective on vote counting and show why distributions form a monoid.

Let \(p_i\) be the probability of Hillary’s victory in state \(i\), and \(n_i\) be the number of electoral college votes for that state, where \(i=1,\ldots,N\), and \(N\) is the total number of states (and districts; see Gabriel’s post for details).

Then a vote distribution is a collection of probabilities \(q_k\) that Hillary will get exactly \(k\) votes:

\[ \newcommand{\p}[1]{\mathrm{Pr}\{#1\}} \begin{equation} q_k = \p{\text{number of votes for H.Clinton} = k},\;k=1,\ldots,\sum_{i=1}^N n_i. \end{equation} \]

Consider the following polynomial:

\[Q(x)=\prod_{i=1}^N\left(p_i x^{n_i}+(1-p_i)\right).\]

This is a product of \(N\) brackets, one for each state. If we expanded it, we would get \(2^N\) terms. Every such term takes either \(p_i x^{n_i}\) or \(1-p_i\) from each bracket and multiplies them. Every such term corresponds to a particular election outcome: if Hillary won in a particular state, take \(p_i x^{n_i}\) from the corresponding bracket; otherwise, take \(1-p_i\).

For example, if an election outcome means that Hillary won in states \(1,4,\ldots\) and lost in states \(2,3,\ldots\), then the corresponding term is

\[ p_1 x^{n_1}(1-p_2)(1-p_3)p_4 x^{n_4}\ldots=p_1(1-p_2)(1-p_3)p_4\ldots x^{n_1+n_4+\ldots}. \]

Notice that \(p_1(1-p_2)(1-p_3)p_4\ldots\) is exactly the probability of the outcome (under the independence assumption) and \(n_1+n_4+\ldots\) is the number of votes for Hillary under that outcome.

Since the power of \(x\) in each term is the number of votes for Hillary, outcomes that result in the same number of votes, say \(k\), correspond to like terms. If we combine them, their probabilities (terms’ coefficients) will add up. To what? To \(q_k\), the total probability of Hillary getting \(k\) votes.


\[Q(x) = \sum_{k}q_kx^k.\]

Deriving the final vote distribution \(q_k\) from \(p_i\) and \(n_i\) is just expanding and reducing \(Q(x)\) from \(\prod_{i=1}^N\left(p_i x^{n_i}+(1-p_i)\right)\) to \(\sum_{k}q_kx^k\).

As Gabriel notes, doing this in the direct way would be inefficient. His divide-and-conquer approach directly translates to expanding \(Q(x)\): divide all brackets into two groups, recursively expand the groups, combine the results.

Under this formulation, it becomes obvious that vote distributions form a proper monoid: it is just a monoid of polynomials under multiplication.

Categories: Offsite Blogs

Functional Jobs: Software Engineer (Haskell/Clojure) at Capital Match (Full-time)

Planet Haskell - Fri, 10/28/2016 - 3:15am


CAPITAL MATCH is a leading marketplace lending and invoice financing platform in Singapore. Our in-house platform, mostly developed in Haskell, has in the last year seen more than USD 15 million business loans processed with a strong monthly growth (current rate of USD 1.5-2.5 million monthly). We are also eyeing expansion into new geographies and product categories. Very exciting times!

We have just secured another funding round to build a world-class technology as the key business differentiator. The key components include credit risk engine, seamless banking integration and end-to-end product automation from loan origination to debt collection.


We are looking to hire a software engineer with a minimum of 2-3 years coding experience.

The candidate should have been involved in a development of multiple web-based products from scratch. He should be interested in all aspects of the creation, growth and operations of a secure web-based platform: front-to-back features development, distributed deployment and automation in the cloud, build and test automation etc.

Background in fintech and especially lending / invoice financing space would be a great advantage.


Our platform is primarily developed in Haskell with an Om/ClojureScript frontend. We are expecting our candidate to have experience working with a functional programming language e.g. Haskell/Scala/OCaml/F#/Clojure/Lisp/Erlang.

Deployment and production is managed with Docker containers using standard cloud infrastructure so familiarity with Linux systems, command-line environment and cloud-based deployment is mandatory. Minimum exposure to and understanding of XP practices (TDD, CI, Emergent Design, Refactoring, Peer review and programming, Continuous improvement) is expected.

We are looking for candidates that are living in or are willing to relocate to Singapore.


We offer a combination of salary and equity depending on experience and skills of the candidate.

Most expats who relocate to Singapore do not have to pay their home country taxes and the local tax rate in Singapore is more or less 5% (effective on the proposed salary range).

Visa sponsorship will be provided.

Singapore is a great place to live, a vibrant city rich with diverse cultures, a very strong financial sector and a central location in Southeast Asia.

Get information on how to apply for this position.

Categories: Offsite Blogs

Gabriel Gonzalez: Electoral vote distributions are Monoids

Planet Haskell - Thu, 10/27/2016 - 7:50am

I'm a political junkie and I spend way too much time following the polling results on FiveThirtyEight's election forecast.

A couple of days ago I was surprised that FiveThirtyEight gave Trump a 13.7% chance of winning, which seemed too high to be consistent with the state-by-state breakdowns. After reading their methodology I learned that this was due to them not assuming that state outcomes were independent. In other words, if one swing state breaks for Trump this might increase the likelihood that other swing states also break for Trump.

However, I still wanted to do the exercise to ask: what would be Hillary's chance of winning if each state's probability of winning was truly independent of one another? Let's write a program to find out!

Raw data

A couple of days ago (2016-10-24) I collected the state-by-state data from FiveThirtyEight's website (by hand) and recorded:

  • the name of the state
  • the chance that Hillary Clinton would win the state
  • the number of electoral college votes for that state

I recorded this data as a list of 3-tuples:

probabilities :: [(String, Double, Int)]
probabilities =
[ ("Alabama" , 0.003, 9)
, ("Alaska" , 0.300, 3)
, ("Arizona" , 0.529, 11)
, ("Arkansas" , 0.012, 6)
, ("California" , 0.999, 55)
, ("Colorado" , 0.889, 9)
, ("Connecticut" , 0.977, 7)
, ("Delaware" , 0.948, 3)
, ("District of Columbia", 0.999, 3)
, ("Florida" , 0.731, 29)
, ("Georgia" , 0.259, 16)
, ("Hawaii" , 0.996, 4)
, ("Idaho" , 0.026, 4)
, ("Illinois" , 0.994, 20)
, ("Indiana" , 0.121, 11)
, ("Iowa" , 0.491, 6)
, ("Kansas" , 0.089, 6)
, ("Kentucky" , 0.042, 8)
, ("Louisiana" , 0.009, 8)
, ("Maine" , 0.852, 2)
, ("Maine - 1" , 0.944, 1)
, ("Maine - 2" , 0.517, 1)
, ("Maryland" , 0.999, 10)
, ("Massachussetts" , 0.998, 11)
, ("Michigan" , 0.929, 16)
, ("Minnesota" , 0.886, 10)
, ("Mississippi" , 0.034, 6)
, ("Missouri" , 0.168, 10)
, ("Montana" , 0.119, 3)
, ("Nebraska" , 0.040, 2)
, ("Nebraska - 1" , 0.154, 1)
, ("Nebraska - 2" , 0.513, 1)
, ("Nebraska - 3" , 0.014, 1)
, ("Nevada" , 0.703, 6)
, ("New Hampshire" , 0.868, 4)
, ("New Jersey" , 0.981, 14)
, ("New Mexico" , 0.941, 5)
, ("New York" , 0.998, 29)
, ("North Carolina" , 0.689, 15)
, ("North Dakota" , 0.070, 3)
, ("Oklahoma" , 0.002, 7)
, ("Ohio" , 0.563, 18)
, ("Oregon" , 0.957, 7)
, ("Pennsylvania" , 0.880, 20)
, ("Rhode Island" , 0.974, 4)
, ("South Carolina" , 0.086, 9)
, ("South Dakota" , 0.117, 3)
, ("Tennessee" , 0.025, 11)
, ("Texas" , 0.166, 38)
, ("Utah" , 0.067, 6)
, ("Vermont" , 0.984, 3)
, ("Virginia" , 0.943, 13)
, ("Washington" , 0.975, 12)
, ("West Virginia" , 0.006, 5)
, ("Wisconsin" , 0.896, 10)
, ("Wyoming" , 0.021, 3)

Note that some states (like Maine) apportion electoral votes in a weird way:

probabilities :: [(String, Double, Int)]
probabilities =
, ("Maine" , 0.852, 2)
, ("Maine - 1" , 0.944, 1)
, ("Maine - 2" , 0.517, 1)

Maine apportions two of its electoral votes based on a state-wide vote (i.e. "Maine" in the above list) and then two further electoral votes are apportioned based on two districts (i.e. "Maine - 1" and "Maine - 2". FiveThirtyEight computes the probabilities for each subset of electoral votes, so we just record them separately.

Combinatorial explosion

So how might we compute Hillary's chances of winnings assuming the independence of each state's outcome?

One naïve approach would be to loop through all possible electoral outcomes and compute the probability and electoral vote for each outcome. Unfortunately, that's not very efficient since the number of possible outcomes doubles with each additional entry in the list:

>>> 2 ^ length probabilities

... or approximately 7.2 * 10^16 outcomes. Even if I only spent a single CPU cycle to compute each outcome (which is unrealistic) on a 2.5 GHz processor that would take almost a year to compute them all. The election is only a couple of weeks away so I don't have that kind of time or computing power!


Fortunately, we can do much better than that! We can efficiently solve this using a simple "divide-and-conquer" approach where we subdivide the large problem into smaller problems until the solution is trivial.

The central data structure we'll use is a probability distribution which we'll represent as a Vector of Doubles:

import Data.Vector.Unboxed (Vector)

newtype Distribution = Distribution (Vector Double)
deriving (Show)

This Vector will always have 539 elements, one element per possible final electoral vote count that Hillary might get. Each element is a Double representing the probability of that corresponding electoral vote count. We will maintain an invariant that all the probabilities (i.e. elements of the Vector) must sum to 1.

For example, if the Distribution were:

[1, 0, 0, 0, 0 ... ]

... that would represent a 100% chance of Hillary getting 0 electoral votes and a 0% chance of any other outcome. Similarly, if the Distribution were:

[0, 0.5, 0, 0.5, 0, 0, 0 ... ]

... then that would represent a 50% chance of Hillary getting 1 electoral vote and a 50% chance of Hillary getting 3 electoral votes.

In order to simplify the problem we need to subdivide the problem into smaller problems. For example, if I want to compute the final electoral vote probability distribution for all 50 states perhaps we can break that down into two smaller problems:

  • Split the 50 states into two sub-groups of 25 states each
  • Compute an electoral vote probability distribution for each sub-group of 25 states
  • Combine probability distributions for each sub-group into the final distribution

In order to do that, I need to define a function that combines two smaller distributions into a larger distribution:

import qualified Data.Vector.Unboxed

combine :: Distribution -> Distribution -> Distribution
combine (Distribution xs) (Distribution ys) = Distribution zs
zs = Data.Vector.Unboxed.generate 539 totalProbability

totalProbability i =
(Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
probabilityOfEachOutcome j =
Data.Vector.Unboxed.unsafeIndex xs j
* Data.Vector.Unboxed.unsafeIndex ys (i - j)

The combine function takes two input distributions named xs and ys and generates a new distribution named zs. To compute the probability of getting i electoral votes in our composite distribution, we just add up all the different ways we can get i electoral votes from the two sub-distributions.

For example, to compute the probability of getting 4 electoral votes for the entire group, we add up the probabilities for the following 5 outcomes:

  • We get 0 votes from our 1st group and 4 votes from our 2nd group
  • We get 1 votes from our 1st group and 3 votes from our 2nd group
  • We get 2 votes from our 1st group and 2 votes from our 2nd group
  • We get 3 votes from our 1st group and 1 votes from our 2nd group
  • We get 4 votes from our 1st group and 0 votes from our 2nd group

The probabilityOfEachOutcome function computes the probability of each one of these outcomes and then the totalProbability function sums them all up to compute the total probability of getting i electoral votes.

We can also define an empty distribution representing the probability distribution of electoral votes given zero states:

empty :: Distribution
empty = Distribution (Data.Vector.Unboxed.generate 539 makeElement)
makeElement 0 = 1
makeElement _ = 0

This distribution says that given zero states you have a 100% chance of getting zero electoral college votes and 0% chance of any other outcome. This empty distribution will come in handy later on.

Divide and conquer

There's no limit to how many times we can subdivide the problem. In the extreme case we can sub-divide the problem down to individual states (or districts for weird states like Maine and Nebraska):

  • subdivide our problem into 56 sub-groups (one group per state or district)
  • compute the probability distribution for each sub-group, which is trivial
  • combine all the probability distributions to retrieve the final result

In fact, this extreme solution is surprisingly efficient!

All we're missing is a function that converts each entry in our original probabilities list into a Distribution:

toDistribution :: (String, Double, Int) -> Distribution
toDistribution (_, probability, votes) =
Distribution (Data.Vector.Unboxed.generate 539 makeElement)
makeElement 0 = 1 - probability
makeElement i | i == votes = probability
makeElement _ = 0

This says that if our probability distribution for a single state should have two possible outcomes:

  • Hillary clinton has probability x of winning n votes for this state
  • Hillary clinton has probability 1 - x of winning 0 votes for this state
  • Hillary clinton has 0% probability of any other outcome for this state

Let's test this out on a couple of states:

>>> toDistribution ("Alaska" , 0.300, 3)
Distribution [0.7,0.0,0.0,0.3,0.0,0.0,...
>>> toDistribution ("North Dakota", 0.070, 3)
Distribution [0.9299999999999999,0.0,0.0,7.0e-2,0.0...

This says that:

  • Alaska has a 30% chance of giving Clinton 3 votes and 70% chance of 0 votes
  • North Dakota has a 7% chance of giving Clinton 3 votes and a 93% chance of 0 votes

We can also verify that combine works correctly by combining the electoral vote distributions of both states. We expect the new distribution to be:

  • 2.1% chance of 6 votes (the probability of winning both states)
  • 65.1% chance of 0 votes (the probability of losing both states)
  • 32.8% chance of 3 votes (the probability of winning just one of the two states)

... and this is in fact what we get:

>>> let alaska = toDistribution ("Alaska" , 0.300, 3)
>>> let northDakota = toDistribution ("North Dakota", 0.070, 3)
>>> combine alaska northDakota
Distribution [0.6509999999999999,0.0,0.0,0.32799999999999996,0.0,0.0,2.1e-2,0.0,...Final result

To compute the total probability of winning, we just transform each element of the list to the corresponding distribution:

distributions :: [Distribution]
distributions = map toDistribution probabilities

... then we reduce the list to a single value repeatedly applying the combine function, falling back on the empty distribution if the entire list is empty:

import qualified Data.List

distribution :: Distribution
distribution = Data.List.foldl' combine empty distributions

... and if we want to get Clinton's chances of winning, we just add up the probabilities for all outcomes greater than or equal to 270 electoral college votes:

chanceOfClintonVictory :: Double
chanceOfClintonVictory =
Data.Vector.Unboxed.sum (Data.Vector.Unboxed.drop 270 xs)
Distribution xs = distribution

main :: IO ()
main = print chanceOfClintonVictory

If we compile and run this program we get the final result:

$ stack --resolver=lts-7.4 build vector
$ stack --resolver=lts-7.4 ghc -- -O2 result.hs
$ ./result

In other words, Clinton has a 99.3% chance of winning if each state's outcome is independent of every other outcome. This is significantly higher than the probability estimated by FiveThirtyEight at that time: 86.3%.

These results differ for the same reason I noted above: FiveThirtyEight assumes that state outcomes are not necessarily independent and that a Trump in one state could correlate with Trump wins in other states. This possibility of correlated victories favors the person who is behind in the race.

As a sanity check, we can also verify that the final probability distribution has probabilities that add up to approximately 1:

>>> let Distribution xs = distribution
>>> Data.Vector.Unboxed.sum xs

Exercise: Expand on this program to plot the probability distribution


Our program is also efficient, running in 30 milliseconds:

$ bench ./result
benchmarking ./result
time 30.33 ms (29.42 ms .. 31.16 ms)
0.998 R² (0.997 R² .. 1.000 R²)
mean 29.43 ms (29.13 ms .. 29.81 ms)
std dev 710.6 μs (506.7 μs .. 992.6 μs)

This is a significant improvement over a year's worth of running time.

We could even speed this up further using parallelism. Thanks to our divide and conquer approach we can subdivide this problem among up to 53 CPUs to accelerate the solution. However, after a certain point the overhead of splitting up the work might outweigh the benefits of parallelism.


People more familiar with Haskell will recognize that this solution fits cleanly into a standard Haskell interface known as the Monoid type class. In fact, many divide-and-conquer solutions tend to be Monoids of some sort.

The Monoid typeclass is defined as:

class Monoid m where
mappend :: m -> m -> m

mempty :: m

-- An infix operator that is a synonym for `mappend`
(<>) :: Monoid m => m -> m -> m
x <> y = mappend x y

... and the Monoid class has three rules that every implementation must obey, which are known as the "Monoid laws".

The first rule is that mappend (or the equivalent (<>) operator) must be associative:

x <> (y <> z) = (x <> y) <> z

The second and third rules are that mempty must be the "identity" of mappend, meaning that mempty does nothing when combined with other values:

mempty <> x = x

x <> mempty = x

A simple example of a Monoid is integers under addition, which we can implement like this:

instance Monoid Integer where
mappend = (+)
mempty = 0

... and this implementation satisfies the Monoid laws thanks to the laws of addition:

(x + y) + z = x + (y + z)

0 + x = x

x + 0 = x

However, Distributions are Monoids, too! Our combine and empty definitions both have the correct types to implement the mappend and mempty methods of the Monoid typeclass, respectively:

instance Monoid Distribution where
mappend = combine

mempty = empty

Both mappend and mempty for Distributions satisfy the Monoid laws:

  • mappend is associative (Proof omitted)
  • mempty is the identity of mappend

We can prove the identity law using the following rules for how Vectors behave:

-- These rules assume that all vectors involved have 539 elements

-- If you generate a vector by just indexing into another vector, you just get back
-- the other vector
Data.Vector.Unboxed.generate 539 (Data.Vector.Unboxed.unsafeIndex xs) = xs

-- If you index into a vector generated by a function, that's equivalent to calling
-- that function
Data.Vector.unsafeIndex (DataVector.generate 539 f) i = f i

Equipped with those rules, we can then prove that mappend xs mempty = xs

mapppend (Distribution xs) mempty

-- mappend = combine
= combine (Distribution xs) mempty

-- Definition of `mempty`
= combine (Distribution xs) (Distribution ys)
ys = Data.Vector.Unboxed.generate 539 makeElement
makeElement 0 = 1
makeElement _ = 0

-- Definition of `combine`
= Distribution zs
zs = Data.Vector.Unboxed.generate 539 totalProbability

totalProbability i =
(Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
probabilityOfEachOutcome j =
Data.Vector.Unboxed.unsafeIndex xs j
* Data.Vector.Unboxed.unsafeIndex ys (i - j)

ys = Data.Vector.Unboxed.generate 539 makeElement
makeElement 0 = 1
makeElement _ = 0

-- Data.Vector.unsafeIndex (DataVector.generate 539 f) i = f i
= Distribution zs
zs = Data.Vector.Unboxed.generate 539 totalProbability

totalProbability i =
(Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
probabilityOfEachOutcome j =
Data.Vector.Unboxed.unsafeIndex xs j
* makeElement (i - j)

makeElement 0 = 1
makeElement _ = 0

-- Case analysis on `j`
= Distribution zs
zs = Data.Vector.Unboxed.generate 539 totalProbability

totalProbability i =
(Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
probabilityOfEachOutcome j
| j == i =
Data.Vector.Unboxed.unsafeIndex xs j
* 1 -- makeElement (i - j) = makeElement 0 = 1
| otherwise =
Data.Vector.Unboxed.unsafeIndex xs j
* 0 -- makeElement (i - j) = 0

-- x * 1 = x
-- y * 0 = 0
= Distribution zs
zs = Data.Vector.Unboxed.generate 539 totalProbability

totalProbability i =
(Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
probabilityOfEachOutcome j
| j == i = Data.Vector.Unboxed.unsafeIndex xs j
| otherwise = 0

-- Informally: "Sum of a vector with one non-zero element is just that element"
= Distribution zs
zs = Data.Vector.Unboxed.generate 539 totalProbability

totalProbability i = Data.Vector.Unboxed.unsafeIndex xs i

-- Data.Vector.Unboxed.generate 539 (Data.Vector.Unboxed.unsafeIndex xs) = xs
= Distribution xs

Exercise: Prove the associativity law for combine


I hope people find this an interesting example of how you can apply mathematical design principles (in this case: Monoids) in service of simplifying and speeding up programming problems.

If you would like to test this program out yourself the complete program is provided below:

import Data.Vector.Unboxed (Vector)

import qualified Data.List
import qualified Data.Vector.Unboxed

probabilities :: [(String, Double, Int)]
probabilities =
[ ("Alabama" , 0.003, 9)
, ("Alaska" , 0.300, 3)
, ("Arizona" , 0.529, 11)
, ("Arkansas" , 0.012, 6)
, ("California" , 0.999, 55)
, ("Colorado" , 0.889, 9)
, ("Connecticut" , 0.977, 7)
, ("Delaware" , 0.948, 3)
, ("District of Columbia", 0.999, 3)
, ("Florida" , 0.731, 29)
, ("Georgia" , 0.259, 16)
, ("Hawaii" , 0.996, 4)
, ("Idaho" , 0.026, 4)
, ("Illinois" , 0.994, 20)
, ("Indiana" , 0.121, 11)
, ("Iowa" , 0.491, 6)
, ("Kansas" , 0.089, 6)
, ("Kentucky" , 0.042, 8)
, ("Louisiana" , 0.009, 8)
, ("Maine" , 0.852, 2)
, ("Maine - 1" , 0.944, 1)
, ("Maine - 2" , 0.517, 1)
, ("Maryland" , 0.999, 10)
, ("Massachussetts" , 0.998, 11)
, ("Michigan" , 0.929, 16)
, ("Minnesota" , 0.886, 10)
, ("Mississippi" , 0.034, 6)
, ("Missouri" , 0.168, 10)
, ("Montana" , 0.119, 3)
, ("Nebraska" , 0.040, 2)
, ("Nebraska - 1" , 0.154, 1)
, ("Nebraska - 2" , 0.513, 1)
, ("Nebraska - 3" , 0.014, 1)
, ("Nevada" , 0.703, 6)
, ("New Hampshire" , 0.868, 4)
, ("New Jersey" , 0.981, 14)
, ("New Mexico" , 0.941, 5)
, ("New York" , 0.998, 29)
, ("North Carolina" , 0.689, 15)
, ("North Dakota" , 0.070, 3)
, ("Oklahoma" , 0.002, 7)
, ("Ohio" , 0.563, 18)
, ("Oregon" , 0.957, 7)
, ("Pennsylvania" , 0.880, 20)
, ("Rhode Island" , 0.974, 4)
, ("South Carolina" , 0.086, 9)
, ("South Dakota" , 0.117, 3)
, ("Tennessee" , 0.025, 11)
, ("Texas" , 0.166, 38)
, ("Utah" , 0.067, 6)
, ("Vermont" , 0.984, 3)
, ("Virginia" , 0.943, 13)
, ("Washington" , 0.975, 12)
, ("West Virginia" , 0.006, 5)
, ("Wisconsin" , 0.896, 10)
, ("Wyoming" , 0.021, 3)

newtype Distribution = Distribution { getDistribution :: Vector Double }
deriving (Show)

combine :: Distribution -> Distribution -> Distribution
combine (Distribution xs) (Distribution ys) = Distribution zs
zs = Data.Vector.Unboxed.generate 539 totalProbability

totalProbability i =
(Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
probabilityOfEachOutcome j =
Data.Vector.Unboxed.unsafeIndex xs j
* Data.Vector.Unboxed.unsafeIndex ys (i - j)

empty :: Distribution
empty = Distribution (Data.Vector.Unboxed.generate 539 makeElement)
makeElement 0 = 1
makeElement _ = 0

instance Monoid Distribution where
mappend = combine

mempty = empty

toDistribution :: (String, Double, Int) -> Distribution
toDistribution (_, probability, votes) =
Distribution (Data.Vector.Unboxed.generate 539 makeElement)
makeElement 0 = 1 - probability
makeElement i | i == votes = probability
makeElement _ = 0

distributions :: [Distribution]
distributions = map toDistribution probabilities

distribution :: Distribution
distribution = mconcat distributions

chanceOfClintonVictory :: Double
chanceOfClintonVictory =
Data.Vector.Unboxed.sum (Data.Vector.Unboxed.drop 270 xs)
Distribution xs = distribution

main :: IO ()
main = print chanceOfClintonVictory
Categories: Offsite Blogs

Functional Jobs: Clojure Engineer at ROKT (Full-time)

Planet Haskell - Tue, 10/25/2016 - 11:58pm

ROKT is hiring thoughtful, talented functional programmers, at all levels, to expand our Clojure team in Sydney, Australia. (We're looking for people who already have the right to work in Australia, please.)

ROKT is a successful startup with a transaction marketing platform used by some of the world's largest ecommerce sites. Our Sydney-based engineering team supports a business that is growing rapidly around the world.

Our Clojure engineers are responsible for ROKT's "Data Platform", a web interface for our sales teams, our operations team, and our customers to extract and upload the data that drives our customers' businesses and our own. We write Clojure on the server-side, and a ClojureScript single-page application on the frontend.

We don't have a Hadoop-based neural net diligently organising our customer data into the world's most efficiently balanced red-black tree (good news: we won't ask you to write one in an interview) — instead, we try to spend our time carefully building the simplest thing that'll do what the business needs done. We're looking for programmers who can help us build simple, robust systems — and we think that means writing in a very functional style — whether that involves hooking some CV-enhancing buzzword technology on the side or not.

If you have professional Clojure experience, that's excellent, we'd like to hear about it. But we don't have a big matrix of exacting checkboxes to measure you against, so if your Clojure isn't fluent yet, we'll be happy to hear how you've been writing functional code in whatever language you're most comfortable in, whether it be Haskell or JavaScript, Common Lisp or Scala. We have the luxury of building out a solid team of thoughtful developers — no "get me a resource with exactly X years of experience in technology Y, stat!"

Get information on how to apply for this position.

Categories: Offsite Blogs

Joachim Breitner: Showcasing Applicative

Planet Haskell - Tue, 10/25/2016 - 10:00pm

My plan for this week’s lecture of the CIS 194 Haskell course at the University of Pennsylvania is to dwell a bit on the concept of Functor, Applicative and Monad, and to highlight the value of the Applicative abstraction.

I quite like the example that I came up with, so I want to share it here. In the interest of long-term archival and stand-alone presentation, I include all the material in this post.1


In case you want to follow along, start with these imports:

import Data.Char import Data.Maybe import Data.List import System.Environment import System.IO import System.Exit The parser

The starting point for this exercise is a fairly standard parser-combinator monad, which happens to be the result of the student’s homework from last week:

newtype Parser a = P (String -> Maybe (a, String)) runParser :: Parser t -> String -> Maybe (t, String) runParser (P p) = p parse :: Parser a -> String -> Maybe a parse p input = case runParser p input of Just (result, "") -> Just result _ -> Nothing -- handles both no result and leftover input noParserP :: Parser a noParserP = P (\_ -> Nothing) pureParserP :: a -> Parser a pureParserP x = P (\input -> Just (x,input)) instance Functor Parser where fmap f p = P $ \input -> do (x, rest) <- runParser p input return (f x, rest) instance Applicative Parser where pure = pureParserP p1 <*> p2 = P $ \input -> do (f, rest1) <- runParser p1 input (x, rest2) <- runParser p2 rest1 return (f x, rest2) instance Monad Parser where return = pure p1 >>= k = P $ \input -> do (x, rest1) <- runParser p1 input runParser (k x) rest1 anyCharP :: Parser Char anyCharP = P $ \input -> case input of (c:rest) -> Just (c, rest) [] -> Nothing charP :: Char -> Parser () charP c = do c' <- anyCharP if c == c' then return () else noParserP anyCharButP :: Char -> Parser Char anyCharButP c = do c' <- anyCharP if c /= c' then return c' else noParserP letterOrDigitP :: Parser Char letterOrDigitP = do c <- anyCharP if isAlphaNum c then return c else noParserP orElseP :: Parser a -> Parser a -> Parser a orElseP p1 p2 = P $ \input -> case runParser p1 input of Just r -> Just r Nothing -> runParser p2 input manyP :: Parser a -> Parser [a] manyP p = (pure (:) <*> p <*> manyP p) `orElseP` pure [] many1P :: Parser a -> Parser [a] many1P p = pure (:) <*> p <*> manyP p sepByP :: Parser a -> Parser () -> Parser [a] sepByP p1 p2 = (pure (:) <*> p1 <*> (manyP (p2 *> p1))) `orElseP` pure []

A parser using this library for, for example, CSV files could take this form:

parseCSVP :: Parser [[String]] parseCSVP = manyP parseLine where parseLine = parseCell `sepByP` charP ',' <* charP '\n' parseCell = do charP '"' content <- manyP (anyCharButP '"') charP '"' return content We want EBNF

Often when we write a parser for a file format, we might also want to have a formal specification of the format. A common form for such a specification is EBNF. This might look as follows, for a CSV file:

cell = '"', {not-quote}, '"'; line = (cell, {',', cell} | ''), newline; csv = {line};

It is straightforward to create a Haskell data type to represent an EBNF syntax description. Here is a simple EBNF library (data type and pretty-printer) for your convenience:

data RHS = Terminal String | NonTerminal String | Choice RHS RHS | Sequence RHS RHS | Optional RHS | Repetition RHS deriving (Show, Eq) ppRHS :: RHS -> String ppRHS = go 0 where go _ (Terminal s) = surround "'" "'" $ concatMap quote s go _ (NonTerminal s) = s go a (Choice x1 x2) = p a 1 $ go 1 x1 ++ " | " ++ go 1 x2 go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", " ++ go 2 x2 go _ (Optional x) = surround "[" "]" $ go 0 x go _ (Repetition x) = surround "{" "}" $ go 0 x surround c1 c2 x = c1 ++ x ++ c2 p a n | a > n = surround "(" ")" | otherwise = id quote '\'' = "\\'" quote '\\' = "\\\\" quote c = [c] type Production = (String, RHS) type BNF = [Production] ppBNF :: BNF -> String ppBNF = unlines . map (\(i,rhs) -> i ++ " = " ++ ppRHS rhs ++ ";") Code to produce EBNF

We had a good time writing combinators that create complex parsers from primitive pieces. Let us do the same for EBNF grammars. We could simply work on the RHS type directly, but we can do something more nifty: We create a data type that keeps track, via a phantom type parameter, of what Haskell type the given EBNF syntax is the specification:

newtype Grammar a = G RHS ppGrammar :: Grammar a -> String ppGrammar (G rhs) = ppRHS rhs

So a value of type Grammar t is a description of the textual representation of the Haskell type t.

Here is one simple example:

anyCharG :: Grammar Char anyCharG = G (NonTerminal "char")

Here is another one. This one does not describe any interesting Haskell type, but is useful when spelling out the special characters in the syntax described by the grammar:

charG :: Char -> Grammar () charG c = G (Terminal [c])

A combinator that creates new grammar from two existing grammars:

orElseG :: Grammar a -> Grammar a -> Grammar a orElseG (G rhs1) (G rhs2) = G (Choice rhs1 rhs2)

We want the convenience of our well-known type classes in order to combine these values some more:

instance Functor Grammar where fmap _ (G rhs) = G rhs instance Applicative Grammar where pure x = G (Terminal "") (G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)

Note how the Functor instance does not actually use the function. How should it? There are no values inside a Grammar!

We cannot define a Monad instance for Grammar: We would start with (G rhs1) >>= k = …, but there is simply no way of getting a value of type a that we can feed to k. So we will do without a Monad instance. This is interesting, and we will come back to that later.

Like with the parser, we can now begin to build on the primitive example to build more complicated combinators:

manyG :: Grammar a -> Grammar [a] manyG p = (pure (:) <*> p <*> manyG p) `orElseG` pure [] many1G :: Grammar a -> Grammar [a] many1G p = pure (:) <*> p <*> manyG p sepByG :: Grammar a -> Grammar () -> Grammar [a] sepByG p1 p2 = ((:) <$> p1 <*> (manyG (p2 *> p1))) `orElseG` pure []

Let us run a small example:

dottedWordsG :: Grammar [String] dottedWordsG = many1G (manyG anyCharG <* charG '.') *Main> putStrLn $ ppGrammar dottedWordsG '', ('', char, ('', char, ('', char, ('', char, ('', char, ('', …

Oh my, that is not good. Looks like the recursion in manyG does not work well, so we need to avoid that. But anyways we want to be explicit in the EBNF grammars about where something can be repeated, so let us just make many a primitive:

manyG :: Grammar a -> Grammar [a] manyG (G rhs) = G (Repetition rhs)

With this definition, we already get a simple grammar for dottedWordsG:

*Main> putStrLn $ ppGrammar dottedWordsG '', {char}, '.', {{char}, '.'}

This already looks like a proper EBNF grammar. One thing that is not nice about it is that there is an empty string ('') in a sequence (…,…). We do not want that.

Why is it there in the first place? Because our Applicative instance is not lawful! Remember that pure id <*> g == g should hold. One way to achieve that is to improve the Applicative instance to optimize this case away:

instance Applicative Grammar where pure x = G (Terminal "") G (Terminal "") <*> G rhs2 = G rhs2 G rhs1 <*> G (Terminal "") = G rhs1 (G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2) ``` Now we get what we want: *Main> putStrLn $ ppGrammar dottedWordsG {char}, '.', {{char}, '.'}

Remember our parser for CSV files above? Let me repeat it here, this time using only Applicative combinators, i.e. avoiding (>>=), (>>), return and do-notation:

parseCSVP :: Parser [[String]] parseCSVP = manyP parseLine where parseLine = parseCell `sepByP` charG ',' <* charP '\n' parseCell = charP '"' *> manyP (anyCharButP '"') <* charP '"'

And now we try to rewrite the code to produce Grammar instead of Parser. This is straightforward with the exception of anyCharButP. The parser code for that inherently monadic, and we just do not have a monad instance. So we work around the issue by making that a “primitive” grammar, i.e. introducing a non-terminal in the EBNF without a production rule – pretty much like we did for anyCharG:

primitiveG :: String -> Grammar a primitiveG s = G (NonTerminal s) parseCSVG :: Grammar [[String]] parseCSVG = manyG parseLine where parseLine = parseCell `sepByG` charG ',' <* charG '\n' parseCell = charG '"' *> manyG (primitiveG "not-quote") <* charG '"'

Of course the names parse… are not quite right any more, but let us just leave that for now.

Here is the result:

*Main> putStrLn $ ppGrammar parseCSVG {('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), ' '}

The line break is weird. We do not really want newlines in the grammar. So let us make that primitive as well, and replace charG '\n' with newlineG:

newlineG :: Grammar () newlineG = primitiveG "newline"

Now we get

*Main> putStrLn $ ppGrammar parseCSVG {('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), newline}

which is nice and correct, but still not quite the easily readable EBNF that we saw further up.

Code to produce EBNF, with productions

We currently let our grammars produce only the right-hand side of one EBNF production, but really, we want to produce a RHS that may refer to other productions. So let us change the type accordingly:

newtype Grammar a = G (BNF, RHS) runGrammer :: String -> Grammar a -> BNF runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)] ppGrammar :: String -> Grammar a -> String ppGrammar main g = ppBNF $ runGrammer main g

Now we have to adjust all our primitive combinators (but not the derived ones!):

charG :: Char -> Grammar () charG c = G ([], Terminal [c]) anyCharG :: Grammar Char anyCharG = G ([], NonTerminal "char") manyG :: Grammar a -> Grammar [a] manyG (G (prods, rhs)) = G (prods, Repetition rhs) mergeProds :: [Production] -> [Production] -> [Production] mergeProds prods1 prods2 = nub $ prods1 ++ prods2 orElseG :: Grammar a -> Grammar a -> Grammar a orElseG (G (prods1, rhs1)) (G (prods2, rhs2)) = G (mergeProds prods1 prods2, Choice rhs1 rhs2) instance Functor Grammar where fmap _ (G bnf) = G bnf instance Applicative Grammar where pure x = G ([], Terminal "") G (prods1, Terminal "") <*> G (prods2, rhs2) = G (mergeProds prods1 prods2, rhs2) G (prods1, rhs1) <*> G (prods2, Terminal "") = G (mergeProds prods1 prods2, rhs1) G (prods1, rhs1) <*> G (prods2, rhs2) = G (mergeProds prods1 prods2, Sequence rhs1 rhs2) primitiveG :: String -> Grammar a primitiveG s = G (NonTerminal s)

The use of nub when combining productions removes duplicates that might be used in different parts of the grammar. Not efficient, but good enough for now.

Did we gain anything? Not yet:

*Main> putStr $ ppGrammar "csv" (parseCSVG) csv = {('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), newline};

But we can now introduce a function that lets us tell the system where to give names to a piece of grammar:

nonTerminal :: String -> Grammar a -> Grammar a nonTerminal name (G (prods, rhs)) = G (prods ++ [(name, rhs)], NonTerminal name)

Ample use of this in parseCSVG yields the desired result:

parseCSVG :: Grammar [[String]] parseCSVG = manyG parseLine where parseLine = nonTerminal "line" $ parseCell `sepByG` charG ',' <* newline parseCell = nonTerminal "cell" $ charG '"' *> manyG (primitiveG "not-quote") <* charG '" *Main> putStr $ ppGrammar "csv" (parseCSVG) cell = '"', {not-quote}, '"'; line = (cell, {',', cell} | ''), newline; csv = {line};

This is great!

Unifying parsing and grammar-generating

Note how simliar parseCSVG and parseCSVP are! Would it not be great if we could implement that functionality only once, and get both a parser and a grammar description out of it? This way, the two would never be out of sync!

And surely this must be possible. The tool to reach for is of course to define a type class that abstracts over the parts where Parser and Grammer differ. So we have to identify all functions that are primitive in one of the two worlds, and turn them into type class methods. This includes char and orElse. It includes many, too: Although manyP is not primitive, manyG is. It also includes nonTerminal, which does not exist in the world of parsers (yet), but we need it for the grammars.

The primitiveG function is tricky. We use it in grammars when the code that we might use while parsing is not expressible as a grammar. So the solution is to let it take two arguments: A String, when used as a descriptive non-terminal in a grammar, and a Parser a, used in the parsing code.

Finally, the type classes that we except, Applicative (and thus Functor), are added as constraints on our type class:

class Applicative f => Descr f where char :: Char -> f () many :: f a -> f [a] orElse :: f a -> f a -> f a primitive :: String -> Parser a -> f a nonTerminal :: String -> f a -> f a

The instances are easily written:

instance Descr Parser where char = charP many = manyP orElse = orElseP primitive _ p = p nonTerminal _ p = p instance Descr Grammar where char = charG many = manyG orElse = orElseG primitive s _ = primitiveG s nonTerminal s g = nonTerminal s g

And we can now take the derived definitions, of which so far we had two copies, and define them once and for all:

many1 :: Descr f => f a -> f [a] many1 p = pure (:) <*> p <*> many p anyChar :: Descr f => f Char anyChar = primitive "char" anyCharP dottedWords :: Descr f => f [String] dottedWords = many1 (many anyChar <* char '.') sepBy :: Descr f => f a -> f () -> f [a] sepBy p1 p2 = ((:) <$> p1 <*> (many (p2 *> p1))) `orElse` pure [] newline :: Descr f => f () newline = primitive "newline" (charP '\n')

And thus we now have our CSV parser/grammar generator:

parseCSV :: Descr f => f [[String]] parseCSV = many parseLine where parseLine = nonTerminal "line" $ parseCell `sepBy` char ',' <* newline parseCell = nonTerminal "cell" $ char '"' *> many (primitive "not-quote" (anyCharButP '"')) <* char '"'

We can now use this definition both to parse and to generate grammars:

*Main> putStr $ ppGrammar2 "csv" (parseCSV) cell = '"', {not-quote}, '"'; line = (cell, {',', cell} | ''), newline; csv = {line}; *Main> parse parseCSV "\"ab\",\"cd\"\n\"\",\"de\"\n\n" Just [["ab","cd"],["","de"],[]] The INI file parser and grammar

As a final exercise, let us transform the INI file parser into a combined thing. Here is the parser (another artifact of last week’s homework) again using applicative style2:

parseINIP :: Parser INIFile parseINIP = many1P parseSection where parseSection = (,) <$ charP '[' <*> parseIdent <* charP ']' <* charP '\n' <*> (catMaybes <$> manyP parseLine) parseIdent = many1P letterOrDigitP parseLine = parseDecl `orElseP` parseComment `orElseP` parseEmpty parseDecl = Just <$> ( (,) <*> parseIdent <* manyP (charP ' ') <* charP '=' <* manyP (charP ' ') <*> many1P (anyCharButP '\n') <* charP '\n') parseComment = Nothing <$ charP '#' <* many1P (anyCharButP '\n') <* charP '\n' parseEmpty = Nothing <$ charP '\n'

Transforming that to a generic description is quite straightforward. We use primitive again to wrap letterOrDigitP:

descrINI :: Descr f => f INIFile descrINI = many1 parseSection where parseSection = (,) <* char '[' <*> parseIdent <* char ']' <* newline <*> (catMaybes <$> many parseLine) parseIdent = many1 (primitive "alphanum" letterOrDigitP) parseLine = parseDecl `orElse` parseComment `orElse` parseEmpty parseDecl = Just <$> ( (,) <*> parseIdent <* many (char ' ') <* char '=' <* many (char ' ') <*> many1 (primitive "non-newline" (anyCharButP '\n')) <* newline) parseComment = Nothing <$ char '#' <* many1 (primitive "non-newline" (anyCharButP '\n')) <* newline parseEmpty = Nothing <$ newline

This yields this not very helpful grammar (abbreviated here):

*Main> putStr $ ppGrammar2 "ini" descrINI ini = '[', alphanum, {alphanum}, ']', newline, {alphanum, {alphanum}, {' '}…

But with a few uses of nonTerminal, we get something really nice:

descrINI :: Descr f => f INIFile descrINI = many1 parseSection where parseSection = nonTerminal "section" $ (,) <$ char '[' <*> parseIdent <* char ']' <* newline <*> (catMaybes <$> many parseLine) parseIdent = nonTerminal "identifier" $ many1 (primitive "alphanum" letterOrDigitP) parseLine = nonTerminal "line" $ parseDecl `orElse` parseComment `orElse` parseEmpty parseDecl = nonTerminal "declaration" $ Just <$> ( (,) <$> parseIdent <* spaces <* char '=' <* spaces <*> remainder) parseComment = nonTerminal "comment" $ Nothing <$ char '#' <* remainder remainder = nonTerminal "line-remainder" $ many1 (primitive "non-newline" (anyCharButP '\n')) <* newline parseEmpty = Nothing <$ newline spaces = nonTerminal "spaces" $ many (char ' ') *Main> putStr $ ppGrammar "ini" descrINI identifier = alphanum, {alphanum}; spaces = {' '}; line-remainder = non-newline, {non-newline}, newline; declaration = identifier, spaces, '=', spaces, line-remainder; comment = '#', line-remainder; line = declaration | comment | newline; section = '[', identifier, ']', newline, {line}; ini = section, {section}; Recursion (variant 1)

What if we want to write a parser/grammar-generator that is able to generate the following grammar, which describes terms that are additions and multiplications of natural numbers:

const = digit, {digit}; spaces = {' ' | newline}; atom = const | '(', spaces, expr, spaces, ')', spaces; mult = atom, {spaces, '*', spaces, atom}, spaces; plus = mult, {spaces, '+', spaces, mult}, spaces; expr = plus;

The production of expr is recursive (via plus, mult, atom). We have seen above that simply defining a Grammar a recursively does not go well.

One solution is to add a new combinator for explicit recursion, which replaces nonTerminal in the method:

class Applicative f => Descr f where … recNonTerminal :: String -> (f a -> f a) -> f a instance Descr Parser where … recNonTerminal _ p = let r = p r in r instance Descr Grammar where … recNonTerminal = recNonTerminalG recNonTerminalG :: String -> (Grammar a -> Grammar a) -> Grammar a recNonTerminalG name f = let G (prods, rhs) = f (G ([], NonTerminal name)) in G (prods ++ [(name, rhs)], NonTerminal name) nonTerminal :: Descr f => String -> f a -> f a nonTerminal name p = recNonTerminal name (const p) runGrammer :: String -> Grammar a -> BNF runGrammer main (G (prods, NonTerminal nt)) | main == nt = prods runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]

The change in runGrammer avoids adding a pointless expr = expr production to the output.

This lets us define a parser/grammar-generator for the arithmetic expressions given above:

data Expr = Plus Expr Expr | Mult Expr Expr | Const Integer deriving Show mkPlus :: Expr -> [Expr] -> Expr mkPlus = foldl Plus mkMult :: Expr -> [Expr] -> Expr mkMult = foldl Mult parseExpr :: Descr f => f Expr parseExpr = recNonTerminal "expr" $ \ exp -> ePlus exp ePlus :: Descr f => f Expr -> f Expr ePlus exp = nonTerminal "plus" $ mkPlus <$> eMult exp <*> many (spaces *> char '+' *> spaces *> eMult exp) <* spaces eMult :: Descr f => f Expr -> f Expr eMult exp = nonTerminal "mult" $ mkPlus <$> eAtom exp <*> many (spaces *> char '*' *> spaces *> eAtom exp) <* spaces eAtom :: Descr f => f Expr -> f Expr eAtom exp = nonTerminal "atom" $ aConst `orElse` eParens exp aConst :: Descr f => f Expr aConst = nonTerminal "const" $ Const . read <$> many1 digit eParens :: Descr f => f a -> f a eParens inner = id <$ char '(' <* spaces <*> inner <* spaces <* char ')' <* spaces

And indeed, this works:

*Main> putStr $ ppGrammar "expr" parseExpr const = digit, {digit}; spaces = {' ' | newline}; atom = const | '(', spaces, expr, spaces, ')', spaces; mult = atom, {spaces, '*', spaces, atom}, spaces; plus = mult, {spaces, '+', spaces, mult}, spaces; expr = plus; Recursion (variant 2)

Interestingly, there is another solution to this problem, which avoids introducing recNonTerminal and explicitly passing around the recursive call (i.e. the exp in the example). To implement that we have to adjust our Grammar type as follows:

newtype Grammar a = G ([String] -> (BNF, RHS))

The idea is that the list of strings is those non-terminals that we are currently defining. So in nonTerminal, we check if the non-terminal to be introduced is currently in the process of being defined, and then simply ignore the body. This way, the recursion is stopped automatically:

nonTerminalG :: String -> (Grammar a) -> Grammar a nonTerminalG name (G g) = G $ \seen -> if name `elem` seen then ([], NonTerminal name) else let (prods, rhs) = g (name : seen) in (prods ++ [(name, rhs)], NonTerminal name)

After adjusting the other primitives of Grammar (including the Functor and Applicative instances, wich now again have nonTerminal) to type-check again, we observe that this parser/grammar generator for expressions, with genuine recursion, works now:

parseExp :: Descr f => f Expr parseExp = nonTerminal "expr" $ ePlus ePlus :: Descr f => f Expr ePlus = nonTerminal "plus" $ mkPlus <$> eMult <*> many (spaces *> char '+' *> spaces *> eMult) <* spaces eMult :: Descr f => f Expr eMult = nonTerminal "mult" $ mkPlus <$> eAtom <*> many (spaces *> char '*' *> spaces *> eAtom) <* spaces eAtom :: Descr f => f Expr eAtom = nonTerminal "atom" $ aConst `orElse` eParens parseExp

Note that the recursion is only going to work if there is at least one call to nonTerminal somewhere around the recursive calls. We still cannot implement many as naively as above.


If you want to play more with this: The homework is to define a parser/grammar-generator for EBNF itself, as specified in this variant:

identifier = letter, {letter | digit | '-'}; spaces = {' ' | newline}; quoted-char = non-quote-or-backslash | '\\', '\\' | '\\', '\''; terminal = '\'', {quoted-char}, '\'', spaces; non-terminal = identifier, spaces; option = '[', spaces, rhs, spaces, ']', spaces; repetition = '{', spaces, rhs, spaces, '}', spaces; group = '(', spaces, rhs, spaces, ')', spaces; atom = terminal | non-terminal | option | repetition | group; sequence = atom, {spaces, ',', spaces, atom}, spaces; choice = sequence, {spaces, '|', spaces, sequence}, spaces; rhs = choice; production = identifier, spaces, '=', spaces, rhs, ';', spaces; bnf = production, {production};

This grammar is set up so that the precedence of , and | is correctly implemented: a , b | c will parse as (a, b) | c.

In this syntax for BNF, terminal characters are quoted, i.e. inside '…', a ' is replaced by \' and a \ is replaced by \\ – this is done by the function quote in ppRHS.

If you do this, you should able to round-trip with the pretty-printer, i.e. parse back what it wrote:

*Main> let bnf1 = runGrammer "expr" parseExpr *Main> let bnf2 = runGrammer "expr" parseBNF *Main> let f = Data.Maybe.fromJust . parse parseBNF. ppBNF *Main> f bnf1 == bnf1 True *Main> f bnf2 == bnf2 True

The last line is quite meta: We are using parseBNF as a parser on the pretty-printed grammar produced from interpreting parseBNF as a grammar.


We have again seen an example of the excellent support for abstraction in Haskell: Being able to define so very different things such as a parser and a grammar description with the same code is great. Type classes helped us here.

Note that it was crucial that our combined parser/grammars are only able to use the methods of Applicative, and not Monad. Applicative is less powerful, so by giving less power to the user of our Descr interface, the other side, i.e. the implementation, can be more powerful.

The reason why Applicative is ok, but Monad is not, is that in Applicative, the results do not affect the shape of the computation, whereas in Monad, the whole point of the bind operator (>>=) is that the result of the computation is used to decide the next computation. And while this is perfectly fine for a parser, it just makes no sense for a grammar generator, where there simply are no values around!

We have also seen that a phantom type, namely the parameter of Grammar, can be useful, as it lets the type system make sure we do not write nonsense. For example, the type of orElseG ensures that both grammars that are combined here indeed describe something of the same type.

  1. It seems to be the week of applicative-appraising blog posts: Brent has posted a nice piece about enumerations using Applicative yesterday.

  2. I like how in this alignment of <*> and <* the > point out where the arguments are that are being passed to the function on the left.

Categories: Offsite Blogs

Brent Yorgey: Adventures in enumerating balanced brackets

Planet Haskell - Mon, 10/24/2016 - 10:42pm

Since I’ve been coaching my school’s ACM ICPC programming team, I’ve been spending a bit of time solving programming contest problems, partly to stay sharp and be able to coach them better, but also just for fun.

I recently solved a problem (using Haskell) that ended up being tougher than I thought, but I learned a lot along the way. Rather than just presenting a solution, I’d like to take you through my thought process, crazy detours and all.

Of course, I should preface this with a big spoiler alert: if you want to try solving the problem yourself, you should stop reading now!

> {-# LANGUAGE GADTs #-} > {-# LANGUAGE DeriveFunctor #-} > > module Brackets where > > import Data.List (sort, genericLength) > import Data.MemoTrie (memo, memo2) > import Prelude hiding ((++)) The problem

There’s a lot of extra verbiage at the official problem description, but what it boils down to is this:

Find the th element of the lexicographically ordered sequence of all balanced bracketings of length .

There is a longer description at the problem page, but hopefully a few examples will suffice. A balanced bracketing is a string consisting solely of parentheses, in which opening and closing parens can be matched up in a one-to-one, properly nested way. For example, there are five balanced bracketings of length :

((())), (()()), (())(), ()(()), ()()()

By lexicographically ordered we just mean that the bracketings should be in “dictionary order” where ( comes before ), that is, bracketing comes before bracketing if and only if in the first position where they differ, has ( and has ). As you can verify, the list of length- bracketings above is, in fact, lexicographically ordered.

A first try

Oh, this is easy, I thought, especially if we consider the well-known isomorphism between balanced bracketings and binary trees. In particular, the empty string corresponds to a leaf, and (L)R (where L and R are themselves balanced bracketings) corresponds to a node with subtrees L and R. So the five balanced bracketings of length correspond to the five binary trees with three nodes:

We can easily generate all the binary trees of a given size with a simple recursive algorithm. If , generate a Leaf; otherwise, decide how many nodes to put on the left and how many on the right, and for each such distribution recursively generate all possible trees on the left and right.

> data Tree where > Leaf :: Tree > Node :: Tree -> Tree -> Tree > deriving (Show, Eq, Ord) > > allTrees :: Int -> [Tree] > allTrees 0 = [Leaf] > allTrees n = > [ Node l r > | k <- [0 .. n-1] > , l <- allTrees ((n-1) - k) > , r <- allTrees k > ]

We generate the trees in “left-biased” order, where we first choose to put all nodes on the left, then on the left and on the right, and so on. Since a subtree on the left will result in another opening paren, but a subtree on the right will result in a closing paren followed by an open paren, it makes intuitive sense that this corresponds to generating bracketings in sorted order. You can see that the size- trees above, generated in left-biased order, indeed have their bracketings sorted.

Writing allTrees is easy enough, but it’s definitely not going to cut it: the problem states that we could have up to . The number of trees with nodes has 598 digits (!!), so we can’t possibly generate the entire list and then index into it. Instead we need a function that can more efficiently generate the tree with a given index, without having to generate all the other trees before it.

So I immediately launched into writing such a function, but it’s tricky to get right. It involves computing Catalan numbers, and cumulative sums of products of Catalan numbers, and divMod, and… I never did get that function working properly.

The first epiphany

But I never should have written that function in the first place! What I should have done first was to do some simple tests just to confirm my intuition that left-biased tree order corresponds to sorted bracketing order. Because if I had, I would have found this:

> brackets :: Tree -> String > brackets Leaf = "" > brackets (Node l r) = mconcat ["(", brackets l, ")", brackets r] > > sorted :: Ord a => [a] -> Bool > sorted xs = xs == sort xs ghci> sorted (map brackets (allTrees 3)) True ghci> sorted (map brackets (allTrees 4)) False

As you can see, my intuition actually led me astray! is a small enough case that left-biased order just happens to be the same as sorted bracketing order, but for this breaks down. Let’s see what goes wrong:

In the top row are the size- trees in “left-biased” order, i.e. the order generated by allTrees. You can see it is nice and symmetric: reflecting the list across a vertical line leaves it unchanged. On the bottom row are the same trees, but sorted lexicographically by their bracketings. You can see that the lists are almost the same except the red tree is in a different place. The issue is the length of the left spine: the red tree has a left spine of three nodes, which means its bracketing will begin with (((, so it should come before any trees with a left spine of length 2, even if they have all their nodes in the left subtree (whereas the red tree has one of its nodes in the right subtree).

My next idea was to try to somehow enumerate trees in order by the length of their left spine. But since I hadn’t even gotten indexing into the original left-biased order to work, it seemed hopeless to get this to work by implementing it directly. I needed some bigger guns.

Building enumerations

At this point I had the good idea to introduce some abstraction. I defined a type of enumerations (a la FEAT or data/enumerate):

> data Enumeration a = Enumeration > { fromNat :: Integer -> a > , size :: Integer > } > deriving Functor > > enumerate :: Enumeration a -> [a] > enumerate (Enumeration f n) = map f [0..n-1]

An Enumeration consists of a size along with a function Integer -> a, which we think of as being defined on [0 .. size-1]. That is, an Enumeration is isomorphic to a finite list of a given length, where instead of explicitly storing the elements, we have a function which can compute the element at a given index on demand. If the enumeration has some nice combinatorial structure, then we expect that this on-demand indexing can be done much more efficiently than simply listing all the elements. The enumerate function simply turns an Enumeration into the corresponding finite list, by mapping the indexing function over all possible indices.

Note that Enumeration has a natural Functor instance, which GHC can automatically derive for us. Namely, if e is an Enumeration, then fmap f e is the Enumeration which first computes the element of e for a given index, and then applies f to it before returning.

Now, let’s define some combinators for building Enumerations. We expect them to have all the nice algebraic flavor of finite lists, aka free monoids.

First, we can create empty or singleton enumerations, or convert any finite list into an enumeration:

> empty :: Enumeration a > empty = Enumeration (const undefined) 0 > > singleton :: a -> Enumeration a > singleton a = Enumeration (\_ -> a) 1 > > list :: [a] -> Enumeration a > list as = Enumeration (\n -> as !! fromIntegral n) (genericLength as) ghci> enumerate (empty :: Enumeration Int) [] ghci> enumerate (singleton 3) [3] ghci> enumerate (list [4,6,7]) [4,6,7]

We can form the concatenation of two enumerations. The indexing function compares the given index against the size of the first enumeration, and then indexes into the first or second enumeration appropriately. For convenience we can also define union, which is just an iterated version of (++).

> (++) :: Enumeration a -> Enumeration a -> Enumeration a > e1 ++ e2 = Enumeration > (\n -> if n < size e1 then fromNat e1 n else fromNat e2 (n - size e1)) > (size e1 + size e2) > > union :: [Enumeration a] -> Enumeration a > union = foldr (++) empty ghci> enumerate (list [3, 5, 6] ++ empty ++ singleton 8) [3,5,6,8]

Finally, we can form a Cartesian product: e1 >< e2 is the enumeration of all possible pairs of elements from e1 and e2, ordered so that all the pairs formed from the first element of e1 come first, followed by all the pairs with the second element of e1, and so on. The indexing function divides the given index by the size of e2, and uses the quotient to index into e1, and the remainder to index into e2.

> (><) :: Enumeration a -> Enumeration b -> Enumeration (a,b) > e1 >< e2 = Enumeration > (\n -> let (l,r) = n `divMod` size e2 in (fromNat e1 l, fromNat e2 r)) > (size e1 * size e2) ghci> enumerate (list [1,2,3] >< list [10,20]) [(1,10),(1,20),(2,10),(2,20),(3,10),(3,20)] ghci> let big = list [0..999] >< list [0..999] >< list [0..999] >< list [0..999] ghci> fromNat big 2973428654 (((2,973),428),654)

Notice in particular how the fourfold product of list [0..999] has elements, but indexing into it with fromNat is basically instantaneous.

Since Enumerations are isomorphic to finite lists, we expect them to have Applicative and Monad instances, too. First, the Applicative instance is fairly straightforward:

> instance Applicative Enumeration where > pure = singleton > f <*> x = uncurry ($) <$> (f >< x) ghci> enumerate $ (*) <$> list [1,2,3] <*> list [10, 100] [10,100,20,200,30,300]

pure creates a singleton enumeration, and applying an enumeration of functions to an enumeration of arguments works by taking a Cartesian product and then applying each pair.

The Monad instance works by substitution: in e >>= k, the continuation k is applied to each element of the enumeration e, and the resulting enumerations are unioned together in order.

> instance Monad Enumeration where > return = pure > e >>= f = union (map f (enumerate e)) ghci> enumerate $ list [1,2,3] >>= \i -> list (replicate i i) [1,2,2,3,3,3]

Having to actually enumerate the elements of e is a bit unsatisfying, but there is really no way around it: we otherwise have no way to know how big the resulting enumerations are going to be.

Now, that function I tried (and failed) to write before that generates the tree at a particular index in left-biased order? Using these enumeration combinators, it’s a piece of cake. Basically, since we built up combinators that mirror those available for lists, it’s just as easy to write this indexing version as it is to write the original allTrees function (which I’ve copied below for comparison):

allTrees :: Int -> [Tree] allTrees 0 = [Leaf] allTrees n = [ Node l r | k <- [0 .. n-1] , l <- allTrees ((n-1) - k) , r <- allTrees k ] > enumTrees :: Int -> Enumeration Tree > enumTrees 0 = singleton Leaf > enumTrees n = union > [ Node <$> enumTrees (n-k-1) <*> enumTrees k > | k <- [0 .. n-1] > ]

(enumTrees and allTrees look a bit different, but actually allTrees can be rewritten in a very similar style:

allTrees :: Int -> [Tree] allTrees 0 = [Leaf] allTrees n = concat [ Node <$> allTrees ((n-1) - k) <*> r <- allTrees k | k <- [0 .. n-1] ]

Doing as much as possible using the Applicative interface gives us added “parallelism”, which in this case means the ability to index directly into a product with divMod, rather than scanning through the results of calling a function on enumerate until we have accumulated the right size. See the paper on the GHC ApplicativeDo extension.)

Let’s try it out:

ghci> enumerate (enumTrees 3) [Node (Node (Node Leaf Leaf) Leaf) Leaf,Node (Node Leaf (Node Leaf Leaf)) Leaf,Node (Node Leaf Leaf) (Node Leaf Leaf),Node Leaf (Node (Node Leaf Leaf) Leaf),Node Leaf (Node Leaf (Node Leaf Leaf))] ghci> enumerate (enumTrees 3) == allTrees 3 True ghci> enumerate (enumTrees 7) == allTrees 7 True ghci> brackets $ fromNat (enumTrees 7) 43 "((((()())))())"

It seems to work! Though actually, if we try larger values of , enumTrees just seems to hang. The problem is that it ends up making many redundant recursive calls. Well… nothing a bit of memoization can’t fix! (Here I’m using Conal Elliott’s nice MemoTrie package.)

> enumTreesMemo :: Int -> Enumeration Tree > enumTreesMemo = memo enumTreesMemo' > where > enumTreesMemo' 0 = singleton Leaf > enumTreesMemo' n = union > [ Node <$> enumTreesMemo (n-k-1) <*> enumTreesMemo k > | k <- [0 .. n-1] > ] ghci> size (enumTreesMemo 10) 16796 ghci> size (enumTreesMemo 100) 896519947090131496687170070074100632420837521538745909320 ghci> size (enumTreesMemo 1000) 2046105521468021692642519982997827217179245642339057975844538099572176010191891863964968026156453752449015750569428595097318163634370154637380666882886375203359653243390929717431080443509007504772912973142253209352126946839844796747697638537600100637918819326569730982083021538057087711176285777909275869648636874856805956580057673173655666887003493944650164153396910927037406301799052584663611016897272893305532116292143271037140718751625839812072682464343153792956281748582435751481498598087586998603921577523657477775758899987954012641033870640665444651660246024318184109046864244732001962029120 ghci> brackets $ fromNat (enumTreesMemo 1000) 8234587623904872309875907638475639485792863458726398487590287348957628934765 "((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((()(((()((((()))())(()()()))()(())(())((()((()))(((())()(((((()(((()()))(((()((((()()(())()())(((()))))(((()()()(()()))))(((()((()))(((()())())))())(()()(())(())()(()())))()))((()()))()))()))()(((()))(()))))))())()()()))((())((()))((((())(())))((())))))()))()(())))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))"

That’s better!

A second try

At this point, I thought that I needed to enumerate trees in order by the length of their left spine. Given a tree with a left spine of length , we enumerate all the ways to partition the remaining elements among the right children of the spine nodes, preferring to first put elements as far to the left as possible. As you’ll see, this turns out to be wrong, but it’s fun to see how easy it is to write this using the enumeration framework.

First, we need an enumeration of the partitions of a given into exactly parts, in lexicographic order.

> kPartitions :: Int -> Int -> Enumeration [Int]

There is exactly one way to partition into zero parts.

> kPartitions 0 0 = singleton []

We can’t partition anything other than into zero parts.

> kPartitions _ 0 = empty

Otherwise, pick a number from down to to go in the first spot, and then recursively enumerate partitions of into exactly parts.

> kPartitions n k = do > i <- list [n, n-1 .. 0] > (i:) <$> kPartitions (n-i) (k-1)

Let’s try it:

ghci> let p43 = enumerate $ kPartitions 4 3 ghci> p43 [[4,0,0],[3,1,0],[3,0,1],[2,2,0],[2,1,1],[2,0,2],[1,3,0],[1,2,1],[1,1,2],[1,0,3],[0,4,0],[0,3,1],[0,2,2],[0,1,3],[0,0,4]] ghci> all ((==3) . length) p43 True ghci> all ((==4) . sum) p43 True ghci> sorted (reverse p43) True

Now we can use kPartitions to build our enumeration of trees:

> spinyTrees :: Int -> Enumeration Tree > spinyTrees = memo spinyTrees' > where > spinyTrees' 0 = singleton Leaf > spinyTrees' n = do > > -- Pick the length of the left spine > spineLen <- list [n, n-1 .. 1] > > -- Partition the remaining elements among the spine nodes > bushSizes <- kPartitions (n - spineLen) spineLen > bushes <- traverse spinyTrees bushSizes > return $ buildSpine (reverse bushes) > > buildSpine :: [Tree] -> Tree > buildSpine [] = Leaf > buildSpine (b:bs) = Node (buildSpine bs) b

This appears to give us something reasonable:

ghci> size (spinyTrees 7) == size (enumTreesMemo 7) True

But it’s pretty slow—which is to be expected with all those monadic operations required. And there’s more:

ghci> sorted . map brackets . enumerate $ spinyTrees 3 True ghci> sorted . map brackets . enumerate $ spinyTrees 4 True ghci> sorted . map brackets . enumerate $ spinyTrees 5 False

Foiled again! All we did was stave off failure a bit, until . I won’t draw all the trees of size for you, but the failure mode is pretty similar: picking subtrees for the spine based just on how many elements they have doesn’t work, because there are cases where we want to first shift some elements to a later subtree, keeping the left spine of a subtree, before moving the elements back and having a shorter left spine.

The solution: just forget about trees, already

It finally occurred to me that there was nothing in the problem statement that said anything about trees. That was just something my overexcited combinatorial brain imposed on it: obviously, since there is a bijection between balanced bracketings and binary trees, we should think about binary trees, right? …well, there is also a bijection between balanced bracketings and permutations avoiding (231), and lattice paths that stay above the main diagonal, and hundreds of other things, so… not necessarily.

In this case, I think trees just end up making things harder. Let’s think instead about enumerating balanced bracket sequences directly. To do it recursively, we need to know how to enumerate possible endings to the start of any balanced bracket sequence. That is, we need to enumerate sequences containing opening brackets and extra closing brackets (so closing brackets in total), which can be appended to a sequence of brackets with more opening brackets than closing brackets.

Given this idea, the code is fairly straightforward:

> enumBrackets :: Int -> Enumeration String > enumBrackets n = enumBracketsTail n 0 > > enumBracketsTail :: Int -> Int -> Enumeration String > enumBracketsTail = memo2 enumBracketsTail' > where

To enumerate a sequence with no opening brackets, just generate c closing brackets.

> enumBracketsTail' 0 c = singleton (replicate c ')')

To enumerate balanced sequences with opening brackets and an exactly matching number of closing brackets, start by generating an opening bracket and then continue by generating sequences with opening brackets and one extra closing bracket to match the opening bracket we started with.

> enumBracketsTail' n 0 = ('(':) <$> enumBracketsTail (n-1) 1

In general, a sequence with opening and extra closing brackets is either an opening bracket followed by an (n-1, c+1)-sequence, or a closing bracket followed by an (n, c-1)-sequence.

> enumBracketsTail' n c = > (('(':) <$> enumBracketsTail (n-1) (c+1)) > ++ > ((')':) <$> enumBracketsTail n (c-1))

This is quite fast, and as a quick check, it does indeed seem to give us the same size enumerations as the other tree enumerations:

ghci> fromNat (enumBrackets 40) 16221270422764920820 "((((((((()((())()(()()()())(()))((()()()()(()((()())))((()())))))))()))()())()))" ghci> size (enumBrackets 100) == size (enumTreesMemo 100) True

But, are they sorted? It would seem so!

ghci> all sorted (map (enumerate . enumBrackets) [1..10]) True

At this point, you might notice that this can be easily de-abstracted into a fairly simple dynamic programming solution, using a 2D array to keep track of the size of the enumeration for each (n,c) pair. I’ll leave the details to interested readers.

Categories: Offsite Blogs

Douglas M. Auclair (geophf): September 2016 1HaskellADay 1Liners Problems and Solutions

Planet Haskell - Sat, 10/22/2016 - 8:00am
  • September 15th, 2016:
    Given [1..n], create an infinite list of lists [[1.. n], [n+1 ... n+n], [n+n+1 ... 3n], ...]
    counting :: [Integer] -> [[Integer]]
    • joomy @cattheory
      counting = (map . (+) . fromIntegral . length) >>= iterate
  • September 30th, 2016: The reverse of August's one-liner:
    f :: (Maybe a, b) -> Maybe (a,b)
    define f. Snaps for elegance.
Categories: Offsite Blogs

Edwin Brady: State Machines All The Way Down

Planet Haskell - Fri, 10/21/2016 - 5:48pm

A new draft paper, State Machines All The Way Down, which describes an architecture for dependently typed functional programs. Abstract:

A useful pattern in dependently typed programming is to define a state transition system, for example the states and operations in a network protocol, as a parameterised monad. We index each operation by its input and output states, thus guaranteeing that operations satisfy pre- and post-conditions, by typechecking. However, what if we want to write a program using several systems at once? What if we want to define a high level state transition system, such as a network application protocol, in terms of lower level states, such as network sockets and mutable variables? In this paper, I present an architecture for dependently typed applications based on a hierarchy of state transition systems, implemented as a library called states. Using states, I show: how to implement a state transition system as a dependent type, with type level guarantees on its operations; how to account for operations which could fail; how to combine state transition systems into a larger system; and, how to implement larger systems as a hierarchy of state transition systems. As an example, I implement a simple high level network application protocol.

Comments welcome! You can get the draft here.

Categories: Offsite Blogs

Roman Cheplyaka: Mean-variance ceiling

Planet Haskell - Thu, 10/20/2016 - 2:00pm

Today I was playing with the count data from a small RNA-Seq experiment performed in Arabidopsis thaliana.

At some point, I decided to look at the mean-variance relationship for the fragment counts. As I said, the dataset is small; there are only 3 replicates per condition from which to estimate the variance. Moreover, each sample is from a different batch. I wasn’t expecting to see much.

But there was a pattern in the mean-variance plot that was impossible to miss.

<figure> <figcaption>Mean-variance plot of counts per million, log-log scale</figcaption> </figure>

It is a nice straight line that many points lie on, but none dare to cross. A ceiling.

The ceiling looked mysterious at first, but then I found a simple explanation. The sample variance of \(n\) numbers \(a_1,\ldots,a_n\) can be written as

\[\sigma^2=\frac{n}{n-1}\left(\frac1n\sum_{i=1}^n a_i^2-\mu^2\right),\]

where \(\mu\) is the sample mean. Thus,

\[\frac{\sigma^2}{\mu^2}=\frac{\sum a_i^2}{(n-1)\mu^2}-\frac{n}{n-1}.\]

For non-negative numbers, \(n^2\mu^2=(\sum a_i)^2\geq \sum a_i^2\), and


This means that on a log-log plot, all points \((\mu,\sigma^2)\) lie on or below the line \(y=2x+\log n\).

Moreover, the points that lie exactly on the line correspond to the samples where all \(a_i\) but one are zero. In other words, those are gene-condition combinations where the gene’s transcripts were registered in a single replicate for that condition.

Categories: Offsite Blogs

Roman Cheplyaka: The rule of 17 in volleyball

Planet Haskell - Wed, 10/19/2016 - 2:00pm

Scott Adams, the author of Dilbert, writes in his book “How to Fail at Almost Everything and Still Win Big”:

Recently I noticed that the high-school volleyball games I attended in my role as stepdad were almost always won by the team that reached seventeen first, even though the winning score is twenty-five and you have to win by two.

It’s common for the lead to change often during a volleyball match, and the team that first reaches seventeen might fall behind a few more times before winning, which makes the pattern extra strange.

Good observation, Scott! But why could it be so?

Scott offers two possible explanations. One is psychological: the leading team has a higher morale while the losing team feels defeated. The other is that perhaps the coach of the losing team sees this as an opportunity to let his bench players on the court.

While these reasons sound plausible to me, there is a simpler logical explanation. It would hold even if the players and coaches were robots.

Imagine that you enter a gym where a game is being played. You see the current score: 15:17. If you know nothing else about the teams except their current score, which one do you think is more likely to win the set?

There are two reasons to think it is the leading team:

  1. The score by itself doesn’t offer much evidence that the leading team is stronger or in a better shape. However, if one of the teams is stronger, it is more likely to be the leading team.
  2. Even without assuming anything about how good the teams are, the leading team at this moment is up for an easier task: it needs only 8 points to win, whereas the team behind needs 10 points.

To quantify the reliability of Scott Adams’s “rule of 17”, I wrote a simple simulation in R: <- function(prob, threshold) { score <- c(0,0) leader <- NA serving <- 1 while (all(score < 25) || abs(diff(score)) < 2) { winner <- if (as.logical(rbinom(1,1,prob[[serving]]))) serving else 3 - serving score[[winner]] <- score[[winner]] + 1 serving <- winner if ( && any(score == threshold)) { leader <- which.max(score) } } return(c(leader, which.max(score))) }

Here prob is a 2-dimensional vector \((p_1,p_2)\), where \(p_i\) is the probability of team \(i\) to win their serve against the opposing team. The function simulates a single set and returns two numbers: which team first scored threshold (e.g. 17) points and which team eventually won. If the two numbers are equal, the rule worked in this game.

Then I simulated a game 1000 times for each of many combinations of \(p_1\) and \(p_2\) and calculated the fraction of the games where the rule worked. Here’s the result:

<figure> </figure>

When \(p_1=p_2\), the reliability of the rule is independent of the values of \(p_1\) and \(p_2\) (within the tested limits of \(0.3\) and \(0.7\)) and equals approximately \(81\%\). This is entirely due to reason 2: all else being equal, the leading team has a head start.

When teams are unequal, reason 1 kicks in, and for large inequalities, the reliability of the rule approaches \(1\). For instance, when \(p_1=0.3\) and \(p_2=0.7\), the rule works about \(99\%\) of the time.

Is there anything magical about the number 17? No, we would expect the rule to work for any threshold at least to some extent. The reliability would grow from somewhere around \(50\%\) for the threshold of \(1\) to almost \(100\%\) for the threshold of \(25\).

And indeed, this is what we observe (for \(p_1=p_2\)):

<figure> </figure>

This reminds me of men’s gold medal match at the 2012 London Olympics, where Russia played against Brazil. Russia loses the first two sets. A game lasts until one of the teams wins 3 sets in total, so Russia cannot afford to lose a single set now. In the third set, Brazil continues to lead, reaching 17 (and then 18) points while Russia has 15. Several minutes later, Brazil leads 22:19.

And then, against all odds, the Russian team wins that set 29:27, then the two following sets, and gets the gold.

<figure> <figcaption>Dmitriy Muserskiy is about to score the gold medal point</figcaption> </figure>
Categories: Offsite Blogs

Philip Wadler: Papers We Love Remote Meetup: John Reynolds, Definitional Interpreters for Higher-Order Languages

Planet Haskell - Tue, 10/18/2016 - 8:34am

I will reprise my June presentation to Papers We Love London at Papers We Love Remote Meetup 2, today at 7pm UK time, with the subject John Reynolds, Definitional Interpreters for Higher-Order Languages. Learn the origins of denotational semantics and continuations. Additional citations here. See you there!

Categories: Offsite Blogs

Ken T Takusagawa: [uitadwod] Stackage

Planet Haskell - Sun, 10/16/2016 - 11:33pm

Stackage for Haskell packages has the curious behavior that packages can disappear from it even though they were perfectly fine.  The cause of such a disappearance of say a package B is as follows: package B was originally pulled in as a dependency of another package A, and the maintainer of package A quit, so package A and all its dependencies, including package B, are candidates to be removed from Stackage.  Package B survives only if it has a direct maintainer in Stackage or is a dependency of another maintained package.

Inspired by the many packages that got dropped when lambdabot got removed from Stackage nightly, e.g., brainfuck.

Although the stated goal of Stackage is a curated collection of Haskell packages, each with an explicit maintainer willing to fix bugs and compilation problems (e.g., with new versions of GHC), I have found that a side feature is more useful: the identification of a large mutually compatible collection of packages without version dependency problems.  Such a side feature -- such a collection -- could be computed automatically without having to have a direct or indirect maintainer for each package in the collection.  I wish such a larger collection existed.

Start with, say, Stackage Nightly and expand it to include every package in Hackage that compiles cleanly and is compatible with Stackage Nightly and with every other package in the expanded collection.  There may be tricky cases of mutually incompatible packages in a potential expanded set which will need to be resolved, e.g., the newest version of A requires an old version of B, and the newest version of B requires an old version of A.  Perhaps resolve such conflicts in favor of the choice which causes the expanded set to be as large as possible.

Tangentially, how can one safely build a package (to test whether it compiles cleanly) if one is not sure whether a package's build script is evil?  Probably some kind of operating system container or sandbox.  Identify packages which use simple, presumably safe, build mechanisms, probably pure Haskell, versus packages which do something unusual, e.g., call a Makefile, which ought to be scrutinized before building.  (Inspired by a build script of software, I think maxima computer algebra, which creepily attempted to send email back to the author every time it was compiled.)

Can compiling a carefully crafted source file with GHC allow the author of the source file to perform arbitrary user-level actions within the operating system?

Categories: Offsite Blogs

Tweag I/O: A new ecosystem for Haskell: the JVM

Planet Haskell - Sun, 10/16/2016 - 6:00pm
Mathieu Boespflug, Alp Mestanogullari   |   17 October 2016

By now, Haskell has first class support for seamlessly embedding foreign code into source files and casually call anything in C (via inline-c) or R (via inline-r), let alone that whole programs can also be compiled down to in-browser JavaScript, thanks to GHCJS. Today the interoperability story for Haskell is getting better still: we’re announcing the addition of a new set of languages into the mix. With jvm, you can call any method known to the JVM from Haskell. With inline-java, you can moreover call these methods in Java syntax, embedded in your source files. Not that this was particularly our intention - promise! inline-java and friends just fell out naturally from our work on sparkle

To give you a taste of what it’s like to program this way, here’s an obligatory “Hello World!” in Haskell, but with a twist: we call the Swing GUI toolkit to display our message to the world in a graphical dialog box.

A Swing GUI application in Haskell

{-# LANGUAGE DataKinds #-} {-# LANGUAGE QuasiQuotes #-} {-# LANGUAGE ScopedTypeVariables #-} module Main where import Data.Int import Language.Java import Language.Java.Inline main :: IO Int32 main = withJVM [] $ do message <- reflect "Hello World!" [java| { javax.swing.JOptionPane.showMessageDialog(null, $message); return 0; } |]

In short, it’s now possible to write Java programs that call into Haskell with near trivial overhead, as we demonstrated previously with sparkle, or indeed Haskell programs that call into any of the hundreds of thousands of JVM packages available publicly and boundless custom enterprise developments.

How it works

The key enabler to talking to Java and all the other JVM languages from Haskell is that Haskell speaks C and the JVM speaks C. That is, both Haskell and the JVM make it possible to call C functions and have C functions call them. In both cases, this is done as part of their respective foreign function interfaces (FFI). So we have a lingua franca for both languages: to move from Haskell to Java or vice versa, go through C first. Each language has their own custom calling convention anyways, so some small amount of glue code to mediate between the two is inevitable.

In fact, in the case of the JVM, bytecode is compiled “just-in-time” or perhaps even not at all. Fortunately, that’s not something we have to worry about: the JVM’s standard interface from C, called the Java Native Interface (JNI), encapsulates all the nitty-gritty detail of invoking methods behind a simple interface. As a first step, we wrote near complete bindings to all of the JNI, using inline-c under the hood for better safety.

Calling into the JVM, the hard way

We could just expose the raw JNI API in Haskell and call it a day. Using raw JNI calls to invoke say a static Java method called foo, which takes an int and a boolean and returns some object, goes something like this:

import Foreign.JNI callFoo = do klass <- findClass "some/Java/Class" -- JNI uses '/' instead of '.'... method <- getStaticMethodID klass "foo" "(IZ)Ljava/lang/Object;" callStaticObjectMethod klass method [JInt 0, JBoolean 1]

Because the JVM allows overloaded method names, when grabbing a handle to invoke a method, you’ll need to specify a type signature to disambiguate which method you really want to call. But the JNI was purposefully designed independently of Java’s syntax, to the point where even class names are written differently. The JNI syntax for type signatures is optimized for speed of parsing and compactness, not legibility. So constructing these type signatures by hand to invoke JVM methods via raw JNI calls is rather error prone. That’s why we wrote the jvm package, a toolkit for invoking JVM methods more conveniently and robustly.

Using Haskell types for safer JVM calls

There are two downsides to the raw JNI calls we saw above:

  • performance: getting class and method handles is expensive. Ideally, we’d only ever lookup classes and methods by name at most once throughout the lifetime of the program, assuming that loaded classes exist for all time and are never redefined.
  • stringly typing: we pass signatures explicitly, but these are literally strings, typos and all. If you mistype the signature, no compiler will call that out. Ideally ill-formed signatures would be caught at compile-time, rather than at runtime when it’s far too late and your program will simply crash.

The performance issue is easily dispensed with. The trick is to write wrappers that tell Haskell that findClass and getStaticMethodID are really pure, in the sense that calling either of them multiple times and in any context always yields equivalent results. So we could in principle ascribe pure types to them. The argument goes something like the following. Compare the following snippet with the one above:

callFoo = do let pureStuff@(klass, method) = unsafePerformIO $ do (,) <$> findClass "some/Java/Class" <*> getStaticMethodID klass "foo" "(IZ)Ljava/lang/Object;" callStaticObjectMethod klass method [JInt 0, JBoolean 1]

The expression for pureStuff is a closed expression (no free variables occur). And because its type is not IO, the compiler is free to float it to top-level, effectively turning it into a CAF, which are always evaluated at most once thanks to laziness:

(klass, method) = unsafePerformIO $ do (,) <$> findClass "some/Java/Class" <*> getStaticMethodID klass "foo" "(IZ)Ljava/lang/Object;" callFoo = do callStaticObjectMethod klass method [JInt 0, JBoolean 1]

As for the stringly typing problem, we’ll need some tools first. First, we need to reflect in Haskell enough type information. To that end, we’ll index the type of Java objects by their Java type:

newtype J (a :: JType) = J (Ptr (J a))

Java types can either be primitives (int,boolean, etc) or reference types (classes, arrays, interfaces, generics etc). So our definition of JType goes something like this:

data JType = Prim Symbol | Class Symbol | Array JType | ... genSingletons ['JType]

Thus equipped, we can write types like,

  • the type of Swing option panes, J ('Class "javax.swing.JOptionPane")
  • the type of boxed Java integers, J ('Class "java.lang.Integer"),
  • the type of primitive integer arrays, J ('Array ('Prim "int")),
  • etc.

What’s more, thanks to the family of singleton types and instances created by genSingletons above, we can reflect on the type of any Java object at runtime to get a representation of the type at the value level. This is helpful to auto compute JNI type signatures from the types alone. No more stringly typing will all those typos in tow: JNI type signatures are now correct by construction.

In particular, we can define a family of variants of callStaticObjectMethod:

module Language.Java where callStatic1 :: (SingI ty1, SingI tyr) => Sing (klass :: Symbol) -> JNI.String -> J ty1 -> IO (J tyr) callStatic2 :: (SingI ty1, SingI ty2, SingI tyr) => Sing (klass :: Symbol) -> JNI.String -> J ty1 -> J ty2 -> IO (J tyr) callStatic3 :: (SingI ty1, SingI ty2, SingI ty3, SingI tyr) => Sing (klass :: Symbol) -> JNI.String -> J ty1 -> J ty2 -> J ty3 -> IO (J tyr) ...

The types of these functions are expressive enough to infer a type signature for the called method. Thanks to the type reflection provided by the singletons package, we can reify types as values and produce JNI type signatures from that. Of course, a fixed number of callStatic* functions, one per arity, is rather limiting (what about arbitrary arities?), so in reality the Language.Java module provides a single such function, to whom arguments are passed packed into a homogeneous list:

callStatic :: SingI tyr => Sing (klass :: Symbol) -> JNI.String -> [JValue] -> IO (J tyr)

where JValue is defined as

data JValue = forall a. SingI a => JObject (J a) | JBoolean Word8 | JInt Int32 | ...

In this way, values of primitive type can be passed to Java without penalty: no need to box them into tiny objects first. It turns out we can extend the same idea to obtain unboxed return values, but the technical details get a bit more intricate, so we’ll have to defer that to the module’s documentation.

Calling a non-static method is achieved in much the same way:

call :: (SingI ty, SingI tyr) => J ty -> JNI.String -> [JValue] -> IO (J tyr) JVM calls the Java way

call and callStatic are surprisingly versatile facilities for calling arbitrary JVM methods with an arbitrary number of boxed or unboxed arguments and return values, but sometimes one might still get the types wrong. For example, there’s nothing stopping us from attempting to call a java.lang.Integer constructor with a boolean typed argument. No such constructor exists, so we’ll get a method lookup exception at runtime. After all, we don’t know in Haskell what methods really do exist, and what their signatures are. But if we call the java.lang.Integer constructor using Java syntax, we can hope to get the Java compiler to perform full scope checking and type checking, thus ruling out common errors such as calling non-existent methods are supplying arguments of the wrong type.

To achieve that, we use GHC’s quasiquotation extension. This extension allows us to embed syntax from arbitrary foreign languages in Haskell source files, in between special brackets. Better yet, we are free to extend the foreign syntax to express antiquotation variables, i.e. variables that refer to the enclosing context in Haskell. Take for example our earlier “hello world” code snippet, simplified:

do message <- reflect "Hello World!" [java| javax.swing.JOptionPane.showMessageDialog(null, $message) |]

Using reflect, also provided by inline-java, we create a J "java.lang.String" from a Haskell String. We can then refer to this Java object, bound to a Haskell variable, from inside the Java code snippet. The $ sigil is there to disambiguate between variables bound in the Haskell context (aka antiquotation) and in the Java context.

You might have noticed a difference with inline-c: in inline-java we don’t need to annotate quasiquotations with the return type nor each antiquote with their types, which can get quite verbose. Instead, we just about manage to infer which foreign types are intended based on the types of the Haskell variables. To pull this off required a journey in compiler hacking, ghc-heap-view and a novel use of static pointers. A journey best told next time…

The road ahead

There are plenty of solutions out there for lightweight interop across languages. You can start by swapping JSON messages between separate processes and take it from there. But for a truly universal solution fit for all situations, our experience is that keeping any overheads low or perhaps even nonexistent is the key enabler to seamlessly mixing multiple languages and blithely crossing language boundaries without guilt. In this post we presented a suite of packages for high-speed Java/Haskell interop, which together ensure:

  • box-free foreign calls: because we infer precise JVM types from Haskell types, arguments passed to JVM methods are boxed only if they need to be. Small values of primitive type can be passed to/from the JVM with no allocation at all on the heap.
  • marshalling-free argument passing: Java objects can be manipulated as easily from Haskell as from Java. This means that you can stick to representing all your data as Java objects if you find yourself calling into Java very frequently, hence avoiding any marshalling costs when transferring control to/from the JVM.
  • type safe Java calls: when calls are made in Java syntax, this syntax is supplied to an embedded instance of javac at compile-time for scope checking and type checking. That way we have a static guarantee that the types on the Haskell side match up with the types on the Java side, without having to resort to FFI stub generators and preprocessors.

We were fortunate enough to be able to stand on excellent libraries to get here. Take parsing of Java syntax: that came straight from Niklas Broberg and Vincent Hanquez’s venerable language-java library.

What we haven’t addressed yet with inline-java is the perennial issue when interoperating two garbage collected languages of automatic memory management. Since we have two heaps (the GHC heap and the JVM heap), with two garbage collectors, neither of which able to traverse objects in the other heap, we are forced to pin in memory objects shared across the language boundary. In the case of JVM objects, the JNI does this for us implicitly, provided object references are kept thread-local. It would be nice if we could make these object references safe across threads and get both garbage collectors to agree to dispose of them safely when dead. You can get a fair amount of mileage the way things are: we managed to run topic analysis on all of Wikipedia concurrently on 16 machines and hours of machine time without tinkering with object lifetimes and GC’s.

So plenty more to do still! Make sure to check out the project’s GitHub repository to follow progress and contribute.

Copyright 2015-2016 Tweag I/O.
Categories: Offsite Blogs

Dan Piponi (sigfpe): Expectation-Maximization with Less Arbitrariness

Planet Haskell - Sun, 10/16/2016 - 5:04pm

There are many introductions to the Expectation-Maximisation algorithm. Unfortunately every one I could find uses arbitrary seeming tricks that seem to be plucked out of a hat by magic. They can all be justified in retrospect, but I find it more useful to learn from reusable techniques that you can apply to further problems. Examples of tricks I've seen used are:

  1. Using Jensen's inequality. It's easy to find inequalities that apply in any situation. But there are often many ways to apply them. Why apply it to this way of writing this expression and not that one which is equal?
  2. Substituting in the middle of an expression. Again, you can use just about anywhere. Why choose this at this time? Similarly I found derivations that insert a into an expression.
  3. Majorisation-Minimisation. This is a great technique, but involves choosing a function that majorises another. There are so many ways to do this, it's hard to imagine any general purpose method that tells you how to narrow down the choice.
My goal is to fill in the details of one key step in the derivation of the EM algorithm in a way that makes it inevitable rather than arbitrary. There's nothing original here, I'm merely expanding on a stackexchange answer.

Generalities about EM

The EM algorithm seeks to construct a maximum likelihood estimator (MLE) with a twist: there are some variables in the system that we can't observe.

First assume no hidden variables. We assume there is a vector of parameters that defines some model. We make some observations . We have a probability density that depends on . The likelihood of given the observations is . The maximum likelhood estimator for is the choice of that maximises for the we have observed.

Now suppose there are also some variables that we didn't get to observe. We assume a density . We now have

where we sum over all possible values of . The MLE approach says we now need to maximise One of the things that is a challenge here is that the components of might be mixed up among the terms in the sum. If, instead, each term only referred to its own unique block of , then the maximisation would be easier as we could maximise each term independently of the others. Here's how we might move in that direction. Consider instead the log-likelihood Now imagine that by magic we could commute the logarithm with the sum. We'd need to maximise One reason this would be to our advantage is that often takes the form where is a simple function to optimise. In addition, may break up as a sum of terms, each with its own block of 's. Moving the logarithm inside the sum would give us something we could easily maximise term by term. What's more, the for each is often a standard probability distribution whose likelihood we already know how to maximise. But, of course, we can't just move that logarithm in.

Maximisation by proxy

Sometimes a function is too hard to optimise directly. But if we have a guess for an optimum, we can replace our function with a proxy function that approximates it in the neighbourhood of our guess and optimise that instead. That will give us a new guess and we can continue from there. This is the basis of gradient descent. Suppose is a differentiable function in a neighbourhood of . Then around we have

We can try optimising with respect to within a neighbourhood of . If we pick a small circular neighbourhood then the optimal value will be in the direction of steepest descent. (Note that picking a circular neighbourhood is itself a somewhat arbitrary step, but that's another story.) For gradient descent we're choosing because it matches both the value and derivatives of at . We could go further and optimise a proxy that shares second derivatives too, and that leads to methods based on Newton-Raphson iteration.

We want our logarithm of a sum to be a sum of logarithms. But instead we'll settle for a proxy function that is a sum of logarithms. We'll make the derivatives of the proxy match those of the original function precisely so we're not making an arbitrary choice.


The are constants we'll determine. We want to match the derivatives on either side of the at : On the other hand we have

To achieve equality we want to make these expressions match. We choose

Our desired proxy function is:

So the procedure is to take an estimated and obtain a new estimate by optimising this proxy function with respect to . This is the standard EM algorithm.

It turns out that this proxy has some other useful properties. For example, because of the concavity of the logarithm, the proxy is always smaller than the original likelihood. This means that when we optimise it we never optimise ``too far'' and that progress optimising the proxy is always progress optimising the original likelihood. But I don't need to say anything about this as it's all part of the standard literature.


As a side effect we have a general purpose optimisation algorithm that has nothing to do with statistics. If your goal is to compute

you can iterate, at each step computing where is the previous iteration. If the take a convenient form then this may turn out to be much easier.


This was originally written as a PDF using LaTeX. It'll be available here for a while. Some fidelity was lost when converting it to HTML.

Categories: Offsite Blogs