News aggregator

Brandon Simmons: Announcing: unagi-bloomfilter

Planet Haskell - Thu, 08/25/2016 - 8:47am

I just released a new Haskell library called unagi-bloomfilter that is up now on hackage. You can install it with:

$ cabal install unagi-bloomfilter

The library uses the bloom-1 variant from “Fast Bloom Filters and Their Generalization” by Yan Qiao, et al. I’ll try to write more about it when I have the time. Also, last night I gave a talk at the New York Haskell User Group on things I learned while working on the project.

It was quite rough, but I was happy to hear from folks that found some interesting things to take away from it.

Thanks to Gershom for inviting me to speak, for my company Signal Vine for sponsoring my trip out, and to Yan Qiao for generously answering my silly questions and helping me understand the paper.

P.S. We’re hiring Haskell developers

Signal Vine is an awesome group of people, with interesting technology and problems to solve, and we’re looking to grow the small development team. If you have some experience with Haskell (you don’t have to be a guru) and are interested, please reach out to Jason or me at:
Categories: Offsite Blogs

Michael Snoyman: Restarting this blog

Planet Haskell - Tue, 08/23/2016 - 6:00pm

Just a minor note: I'm planning on starting up this blog again, with some personal thoughts - likely still mostly around programming and Haskell - that don't fit in the other blogs that I contribute to (Yesod Web Framework and FP Complete).

I don't have a clear list of topics I'm going to be covering, but I'll likely be sharing some thoughts on running engineering teams and startups effectively. If you have something you'd like me to cover, please Tweet it to me.

Categories: Offsite Blogs

Roman Cheplyaka: Extract the first n sequences from a FASTA file

Planet Haskell - Tue, 08/23/2016 - 2:00pm

A FASTA file consists of a series of biological sequences (DNA, RNA, or protein). It looks like this:
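>seq1 made-up example sequence one
ACGTACGTGGTCAATTGACCTTAAGG
>seq2 made-up example sequence two
TTGACCTTAAGGCCAGTCAG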


There probably exist dozens of Python scripts to extract the first n sequences from a FASTA file. Here I will show an awk one-liner that performs this task, and explain how it works.

Here it is (assuming the number of sequences is stored in the environment variable NSEQS):

awk "/^>/ {n++} n>$NSEQS {exit} {print}"

This one-liner can read from standard input (e.g. as part of a pipe), or you can append one or more file names to the end of the command, e.g.

awk "/^>/ {n++} n>$NSEQS {exit} {print}" file.fasta

An awk script consists of one or more statements of the form pattern { actions }. The input is read line-by-line, and if the current line matches the pattern, the corresponding actions are executed.

Our script consists of 3 statements:

  1. /^>/ {n++} increments the counter each time a new sequence is started. /.../ denotes a regular expression pattern, and ^> is a regular expression that matches the > sign at the beginning of a line.

    An uninitialized variable in awk has the value 0, which is exactly what we want here. If we needed some other initial value (say, 1), we could have added a BEGIN pattern like this: BEGIN {n=1}.
  2. n>$NSEQS {exit} aborts processing once the counter reaches the desired number of sequences.
  3. {print} is an action without a pattern (and thus matching every line), which prints every line of the input until the script is aborted by exit.

A shorter and more cryptic way to write the same is

awk "/^>/ {n++} n>$NSEQS {exit} 1"

Here I replaced the action-without-pattern by a pattern-without-action. The pattern 1 (meaning “true”) matches every line, and when the action is omitted, it is assumed to be {print}.

Categories: Offsite Blogs

mightybyte: Measuring Software Fragility

Planet Haskell - Mon, 08/22/2016 - 9:41am

While writing this comment on reddit I came up with an interesting question that I think might be a useful way of thinking about programming languages. What percentage of single non-whitespace characters in your source code could be changed to a different character such that the change would pass your CI build system but would result in a runtime bug? Let's call this the software fragility number because I think that metric gives a potentially useful measure of how bug prone your software is.

At the end of the day software is a mountain of bytes and you're trying to get them into a particular configuration. Whether you're writing a new app from scratch, fixing bugs, or adding new features, the number of bytes of source code you have (similar to LOC, SLOC, or maybe the compressed number of bytes) is a rough indication of the complexity of your project. If we model programmer actions as random byte mutations over all of a project's source and we're trying to predict the project's defect rate, this software fragility number is exactly the thing we need to know.

Now I'm sure many people will be quick to point out that this random mutation model is not accurate. Of course that's true. But I would argue that in this way it's similar to the efficient markets hypothesis in finance. Real world markets are obviously not efficient (Google didn't become $26 billion less valuable because the UK voted for Brexit). But the efficient markets model is still really useful--and good luck finding a better one that everybody will agree on.

What this model lacks in real world fidelity, it makes up for in practicality. We can actually build an automated system to calculate a reasonable approximation of the fragility number. All that has to be done is take a project, randomly mutate a character, run the project's whole CI build, and see if the result fails the build. Repeat this for every non-whitespace character in the project and count how many characters pass the build. Since the character was generated at random, I think it's reasonable to assume that any mutation that passes the build is almost definitely a bug.

Performing this process for every character in a large project would obviously require a lot of CPU time. We could make this more tractable by picking characters at random to mutate. Repeat this until you have done it for a large enough number of characters and then see what percentage of them made it through the build. Alternatively, instead of choosing random characters you could choose whole modules at random to get more uniform coverage over different parts of the language's grammar. There are probably a number of different algorithms that could be tried for picking random subsets of characters to test. Similar to numerical approximation algorithms such as Newton's method, any of these algorithms could track the convergence of the estimate and stop when the value gets to a sufficient level of stability.
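To make the sampling procedure concrete, here is a minimal Haskell sketch (my illustration, not a real tool: ciBuildPasses is a hypothetical stand-in for running your CI system, and the character classes are crudely simplified):

import Control.Monad (replicateM)
import Data.Char (isAlpha, isDigit, isSpace)
import System.Random (randomRIO)

-- Hypothetical stand-in: run the project's full CI build against the
-- mutated source and report whether it passes.
ciBuildPasses :: String -> IO Bool
ciBuildPasses _mutatedSrc = error "wire this up to your build system"

-- Mutate within the same character class, as proposed below.
sameClassReplacement :: Char -> IO Char
sameClassReplacement c
  | isAlpha c = randomRIO ('a', 'z')
  | isDigit c = randomRIO ('0', '9')
  | otherwise = randomRIO ('!', '/')  -- crude stand-in for "symbols"

-- Fraction of sampled single-character mutations that still pass the
-- build; any such mutation is presumed to be a runtime bug.
estimateFragility :: String -> Int -> IO Double
estimateFragility src trials = do
    let positions = [ i | (i, c) <- zip [0 ..] src, not (isSpace c) ]
    passes <- replicateM trials $ do
        idx <- randomRIO (0, length positions - 1)
        let p = positions !! idx
        c' <- sameClassReplacement (src !! p)
        let mutated = take p src ++ [c'] ++ drop (p + 1) src
        ciBuildPasses mutated
    return (fromIntegral (length (filter id passes)) / fromIntegral trials)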

Now let's investigate actual fragility numbers for some simple bits of example code to see how this notion behaves. First let's look at some JavaScript examples.

It's worth noting that comment characters should not be allowed to be chosen for mutation since they obviously don't affect the correctness of the program. So the comments you see here have not been included in the calculations. Fragile characters are highlighted in orange.

// Fragility 12 / 48 = 0.25
function f(n) {
    if ( n < 2 )
        return 1;
    else
        return n * f(n-1);
}

// Fragility 14 / 56 = 0.25
function g(n) {
    var p = 1;
    for (var i = 2; i <= n; i++ ) {
        p *= i;
    }
    return p;
}

First I should say that I didn't write an actual program to calculate these. I just eyeballed it and thought about what things would fail. I easily could have made mistakes here. In some cases it may even be subjective, so I'm open to corrections or different views.

Since JavaScript is not statically typed, every character of every identifier is fragile--mutating them will not cause a build error because there isn't much of a build. JavaScript won't complain, you'll just start getting undefined values. If you've done a significant amount of JavaScript development, you've almost definitely encountered bugs from mistyped identifier names like this. I think it's mildly interesting that the recursive and iterative formulations of this function both have the same fragility. I expected them to be different. But maybe that's just luck.

Numerical constants as well as comparison and arithmetic operators will also cause runtime bugs. These, however, are more debatable because if you use the random procedure I outlined above, you'll probably get a build failure because the character would have probably changed to something syntactically incorrect. In my experience, it seems like when you mistype an alpha character, it's likely that the wrong character will also be an alpha character. The same seems to be true for the classes of numeric characters as well as symbols. The method I'm proposing is that the random mutation should preserve the character class. Alpha characters should remain alpha, numeric should remain numeric, and symbols should remain symbols. In fact, my original intuition goes even further than that by only replacing comparison operators with other comparison operators--you want to maximize the chance that the new mutated character will cause a successful build so the metric will give you a worst-case estimate of fragility. There's certainly room for research into what patterns tend to come up in the real world and other algorithms that might describe that better.

Now let's go to the other end of the programming language spectrum and see what the fragility number might look like for Haskell.

-- Fragility 7 / 38 = 0.18
f :: Int -> Int
f n | n < 2 = 1
    | otherwise = n * f (n-1)

Haskell's much more substantial compile time checks mean that mutations to identifier names can't cause bugs in this example. The fragile characters here are clearly essential parts of the algorithm we're implementing. Maybe we could relate this idea to information theory and think of it as a measure of how much information is contained in the algorithm.

One interesting thing to note here is the effect of the length of identifier names on the fragility number. In JavaScript, long identifier names will increase the fragility because all identifier characters can be mutated and will cause a bug. But in Haskell, since identifier characters are not fragile, longer names will lower the fragility score. Choosing to use single character identifier names everywhere makes these Haskell fragility numbers the worst case and makes JavaScript fragility numbers the best case.

Another point is that since I've used single letter identifier names it is possible for a random identifier mutation in Haskell to not cause a build failure but still cause a bug. Take for instance a function that takes two Int parameters x and y. If y was mutated to x, the program would still compile, but it would cause a bug. My set of highlighted fragile characters above does not take this into account because it's trivially avoidable by using longer identifier names. Maybe this is an argument against one letter identifier names, something that Haskell gets criticism for.
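Here is a minimal illustration of that scenario (a hypothetical function, not from the snippets above):

-- Mutating the final 'y' to 'x' still typechecks, but the function
-- then computes x - x = 0: a bug the compiler cannot catch.
minus :: Int -> Int -> Int
minus x y = x - y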

Here's the snippet of Haskell code I was talking about in the above reddit comment that got me thinking about all this in the first place:

-- Fragility 31 / 277 = 0.11
data MetadataInfo = MetadataInfo
    { title       :: Text
    , description :: Text
    }

pageMetadataWidget :: MonadWidget t m => Dynamic t MetadataInfo -> m ()
pageMetadataWidget i = do
    el "title" $ dynText $ title <$> i
    elDynAttr "meta" (mkDescAttrs . description <$> i) blank
  where
    mkDescAttrs desc = "name"    =: "description" <>
                       "content" =: desc

In this snippet, the fragility number is probably close to 31 characters--the number of characters in string literals. This is out of a total of 277 non-whitespace characters, so the software fragility number for this bit of code is 11%. This is half the fragility of the JS code we saw above! And as I've pointed out, larger real world JS examples are likely to have even higher fragility. I'm not sure how much we can conclude about the actual ratios of these fragility numbers, but at the very least it matches my experience that JS programs are significantly more buggy than Haskell programs.

The TDD people are probably thinking that my JS examples aren't very realistic because none of them have tests, and that tests would catch most of the identifier name mutations, bringing the fragility down closer to Haskell territory. It is true that tests will probably catch some of these things. But you have to write code to make that happen! It doesn't happen by default. Also, you need to take into account the fact that the tests themselves will have some fragility. Tests require time and effort to maintain. This is an area where this notion of the fragility number becomes less accurate. I suspect that because the metric only considers single character mutations, it will underestimate the fragility of tests, since mutating single characters in tests will automatically cause a build failure.

There seems to be a slightly paradoxical relationship between the fragility number and DRY. Imagine our above JS factorial functions had a test that completely reimplemented factorial and then tried a bunch of random values QuickCheck-style. This would yield a fragility number of zero! Any single character change in the code would cause a test failure. And any single character change in the tests would also cause a test failure. Single character changes can no longer be classified as fragile because we've violated DRY. You might say that the test suite shouldn't reimplement the algorithm--you should just test specific cases like f(5) == 120. But in an information theory sense this is still violating DRY.
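To make that concrete, here is a sketch of such a test in Haskell (my illustration; the reference implementation is the deliberate DRY violation being discussed):

import Test.QuickCheck (Property, choose, forAll, quickCheck)

-- The implementation under test (the factorial from above).
f :: Integer -> Integer
f n | n < 2 = 1
    | otherwise = n * f (n-1)

-- A complete reimplementation used as the oracle--the DRY violation
-- that drives the fragility number to zero.
fSpec :: Integer -> Integer
fSpec n = product [1 .. n]

prop_factorial :: Property
prop_factorial = forAll (choose (0, 20)) $ \n -> f n == fSpec n

main :: IO ()
main = quickCheck prop_factorial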

Does this mean that the fragility number is not very useful? Maybe. I don't know. But I don't think it means that we should just throw away the idea. Maybe we should just keep in mind that this particular formulation doesn't have much to tell us about the fragility of more complex coordinated multi-character changes. I could see the usefulness of this metric going either way. It could simplify down to something not very profound. Or it could be that measurements of the fragility of real world software projects end up revealing some interesting insights that are not immediately obvious even from my analysis here.

Whatever the usefulness of this fragility metric, I think the concept gets us thinking about software defects in a different way than we might be used to. If it turns out that my single character mutation model isn't very useful, perhaps the extension to multi-character changes could be useful. Hopefully this will inspire more people to think about these issues and play with the ideas in a way that will help us progress towards more reliable software and tools to build it with.

EDIT: Unsurprisingly, I'm not the first person to have thought of this. It looks like it's commonly known as mutation testing. That Wikipedia article makes it sound like mutation testing is commonly thought of as a way to assess your project's test suite. I'm particularly interested in what it might tell us about programming languages...i.e. how much "testing" we get out of the box because of our choice of programming language and implementation.

Categories: Offsite Blogs

mightybyte: Why version bounds cannot be inferred retroactively (using dates)

Planet Haskell - Mon, 08/22/2016 - 9:35am

In past debates about Haskell's Package Versioning Policy (PVP), some have suggested that package developers don't need to put upper bounds on their version constraints because those bounds can be inferred by looking at what versions were available on the date the package was uploaded. This strategy cannot work in practice, and here's why.

Imagine someone creates a small new package called foo. It's a simple package, say something along the lines of the formattable package that I recently released. One of the dependencies for foo is errors, a popular package supplying frequently used error handling infrastructure. The developer happens to already have errors-1.4.7 installed on their system, so this new package gets built against that version. The author uploads it to Hackage on August 16, 2015 with no upper bounds on its dependencies. Let's for simplicity imagine that errors is the only dependency, so the .cabal file looks like this:

name: foo
build-depends: errors

If we come back through at some point in the future and try to infer upper bounds by date, we'll see that on August 16, the most recent version of errors was 2.0.0. Here's an abbreviated illustration of the picture we can see from release dates:

If we look only at release dates, and assume that packages were building against the most recent version, we will try to build foo with errors-2.0.0. But that is incorrect! Building foo with errors-2.0.0 will fail because errors had a major breaking change in that version. Bottom line: dates are irrelevant--all that matters is what dependency versions the author happened to be building against! You cannot assume that package authors will always be building against the most recent versions of their dependencies. This is especially true if our developer was using the Haskell Platform or LTS Haskell because those package collections lag the bleeding edge even more. So this scenario is not at all unlikely.

It is also possible for packages to be maintaining multiple major versions simultaneously. Consider large projects like the linux kernel. Developers routinely do maintenance releases on 4.1 and 4.0 even though 4.2 is the latest version. This means that version numbers are not always monotonically increasing as a function of time.

I should also mention another point on the meaning of version bounds. When a package specifies version bounds like this...

name: foo
build-depends: errors >= 1.4 && < 1.5

...it is not saying "my package will not work with errors-1.5 and above". It is actually saying, "I warrant that my package does work with those versions of errors (provided errors complies with the PVP)". So the idea that "< 1.5" is a "preemptive upper bound" is wrong. The package author is not preempting anything. Bounds are simply information. The upper and lower bounds are important things that developers need to tell you about their packages to improve the overall health of the ecosystem. Build tools are free to do whatever they want with that information. Indeed, cabal-install has a flag --allow-newer that lets you ignore those upper bounds and step outside the version ranges that the package authors have verified to work.
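For example (using the hypothetical foo package from above), to let foo build against an errors release newer than its declared bounds:

cabal install --allow-newer=errors foo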

In summary, the important point here is that you cannot use dates to infer version bounds. You cannot assume that package authors will always be building against the most recent versions of their dependencies. The only reliable thing to do is for the package maintainer to tell you explicitly what versions the package is expected to work with. And that means lower and upper bounds.

Update: Here is a situation that illustrates this point perfectly: cryptonite issue #96. cryptonite-0.19 was released on August 12, 2016. But cryptonite-0.15.1 was released on August 22, 2016. Any library published after August 22, 2016 that depends on cryptonite-0.15.1 would not be able to build if the solver used dates instead of explicit version bounds.

Categories: Offsite Blogs

Brent Yorgey: Academic integrity: context and concrete steps

Planet Haskell - Sun, 08/21/2016 - 5:06pm

Continuing from my previous post, I wanted to write a bit about why I have been thinking about academic integrity, and what, concretely, I plan to do about it.

So, why have I been thinking about this? For one thing, my department had its fair share of academic integrity violations last year. On the one hand, it is right for students to be held accountable for their actions. On the other, in the face of a spate of violations, it is also right for us to reevaluate what we are doing and why, what sort of environmental factors may be pushing students to violate academic integrity, and how we can create a better environment. Environment does not excuse behavior, but it can shape behavior in profound ways.

Another reason for thinking about academic integrity is that starting this fall, I will be a member of the committee that hears and makes a determination in formal academic integrity cases at my institution. It seems no one wants to be on this committee, and to a certain extent I can understand why. But I chose it, for several reasons. For one, I think it is important to have someone on the committee from the natural sciences (I will be the only one), who understands issues of plagiarism in the context of technical subjects. I also care a lot about ensuring that academic integrity violations are handled carefully and thoughtfully, so that students actually learn something from the experience, and more importantly, so that they come through with their sense of belonging intact. When a student (or anyone, really) does something that violates the standards of a community and is subject to consequences, it is all too easy for them to feel as though they are now a lesser member or even excluded from the community. It takes much more intentional communication to make clear to them that although they may have violated a community standard—which necessarily comes with a consequence—they are still a valued member. (Thanks to Leslie Zorwick for explaining about the power of belonging, and for relating recent research showing that communicating belonging can make a big difference for students on academic probation—which seems similar to students accused or convicted of academic integrity violations. I would cite it but I think it is not actually published yet.)

Thinking about all of this is well and good, but what will I do about it? How do I go about communicating all of this to my students, and creating the sort of environment I want? Here are the concrete things I plan to do starting this fall:

  • In all my courses where it makes sense, I plan to require students to have at least one citation (perhaps three, if I am bold) on every assignment turned in—whether they cite web pages, help from TAs or classmates, and so on. The point is to get them thinking regularly about the resources and help that they make use of on every single assignment, to foster a spirit of thankfulness. I hope it will also make it psychologically harder for students to plagiarize and lie about it. Finally, I hope it will lead to better outcomes in cases where a student makes inappropriate use of an online resource—i.e. when they “consult” a resource, perhaps even deceiving themselves into thinking that they are really doing the work, but end up essentially copying the resource. If they don’t cite the resource in such a case, I have a messy academic integrity violation case on my hands; if they do, there is no violation, even though the student didn’t engage with the assignment as I would have hoped, and I can have a simple conversation with them about my expectations and their learning (and perhaps lower their grade).

  • I will make sure to communicate to my students how easy it is for me to detect plagiarism, and how dire the consequences can be. A bit of healthy fear never hurt!

  • But beyond that, I want to make sure my students also understand that I care much more about them, as human beings, than I do about their grade or whether they turn in an assignment. I suspect that a lot of academic integrity violations happen at 2am, the night before a deadline, when the student hasn’t even started the assignment and they are riddled with anxiety and running on little sleep—but they feel as though they have to turn something in and this urge overrides whatever convictions they might have about plagiarism. To the extent their decision is based on anxiety about grades, there’s not much I can do about it. However, if their decision stems from a feeling of shame at not turning something in and disappointing their professor, I can make a difference: in that moment, I want my students to remember that their value in my eyes as human beings is not tied to their academic performance; that I will be much more impressed by their honesty than by whether they turn something in.

  • As a new member of the academic integrity committee, I plan to spend most of my time listening and learning from the continuing members of the committee; but I do hope to make sure our communication with both accused and convicted students emphasizes that they are still valued members of our community.

Other concrete suggestions, questions, experiences to relate, etc. are all most welcome!

Categories: Offsite Blogs

Toby Goodwin: Debian chroot on Android

Planet Haskell - Sun, 08/21/2016 - 2:09am

Sometimes, a simple idea — so simple it can be distilled down to 4 words — can be truly astounding.


For quite a while, I've been considering the best way to ensure the resilience, security, and accessibility of various pieces of personal data. There are several different categories, and no solution will be optimal for all of them. My music collection, for example, is large, non-secret, and largely replaceable (although the thought of dragging that enormous box of CDs out of the garage and reripping them all is pretty daunting!) The music lives on a server in my home, with my own backups. I upload medium bitrate versions to a cloud music service, and I have a low bitrate copy on my laptop and phone. So that's pretty well covered.

A similar scheme covers my photos and videos. They are much less replaceable than music, but fortunately much smaller, so there are a few extra copies kicking about.

Then, I have a few tiny things that I want to keep in sync across various devices. For example, today's todo list, my "blue skies ideas" list, and my password store. I've looked at syncthing, which is an awesome project, and I'm sure I'm going to find a good use for it someday.

But for these things, git is really the obvious solution. Most of them are already git repos, including my password-store; the only missing piece is a git client on my phone. So I was searching for recommendations for Android git clients, and these words jumped out at me:

create a debian image, mount it in your android device and chroot to it

My flabber was well and truly gasted.


It's very straightforward. From some debian instance on which you have root, run:

debootstrap --foreign --arch=armhf jessie jessie

Tar up the resulting tree in jessie, copy it to android, unpack it (ah, but where?), chroot, and then run:

debootstrap --second-stage

Which?

Here are some things I've used: ssh, rsync, dash, bash, the rc shell (which I happen to maintain). All the usual userland tools, mv, chmod, etc. These (of course) are the proper full GNU versions, so you don't keep being bitten by the little discrepancies in, for instance, the busybox versions.

Package management with apt-get and dpkg. And perl, git, nano, vim, update-alternatives (so I never have to see nano again), less, man.

I started installing the pass package, but that pulls in gtk, pango and a whole bunch of other things I'm not going to use. So I downloaded password-store and installed it myself.

The ps command (you need to mount /proc in the chroot of course), top, strace, lsof. You can even strace android processes, it all Just Works. (OK, lsof gets upset because it's inside a chroot and it can't see all the mount points that /proc/mounts says exist. But still.)

I thought it might be fun to run mosh. It installed fine, but then bombed out with a weird error. I went on a bit of a wild goose chase, and concluded (it was late at night) that I needed a fix from the development version. So I cloned the mosh repo on github, installed a whole heap of build tools, compilers, libraries, and built mosh. On my phone!

In fact, the problem was simpler than that, and easily solved by using the stty command to declare my terminal size. And then I had to open up some ports in android's firewall... with iptables of course.

I could go on, but you're getting the idea. In summary, this is not something pretending to be a GNU/Linux system. It's the real deal.

Of course, there are some missing pieces, of which the most serious is the lack of daemons. I've installed etckeeper, but there will be no daily autocommit.

Ping doesn't work, because even a root shell is not allowed to use raw sockets. You can create a user, but it's not able to do much... I'll look at this some more when I have time, but I'm just running everything as root at the moment. Android's systems for users, permissions, and capabilities are entirely unfamiliar to me, although I'm trying to learn.


I made and remade my chroot several times before I was happy with it. Hopefully these notes will make things quicker for you.

First of all, Debian wants a “real” filesystem, which is to say, anything except FAT. Of the existing partitions, an obvious choice would be /data, which on my phone is ext4. Unfortunately, the major major drawback of my phone is that its /data is tiddly, just 1GB, and perennially full. (I did try the chroot on /data, before realising the fatal flaw. One curiosity is that /data is mounted with nodev, so populating /dev fails till you remount without nodev. You might think it would be better to bind mount the real /dev into the chroot anyway, and you might well be right. But I've been running with the /dev made by debootstrap with no problems.)

So it's time to repartition my 32GB SD card. Android apparently doesn't support GPT, which is only a minor nuisance. I do hate all that primary / extended / logical nonsense though, it's so 1980s.

Much, much more seriously, it complains bitterly if it finds an SD card without a FAT partition. This is infuriating. The kernel supports ext3 just fine (and ext4 too, at least for partitions on fixed internal storage, although apparently not for the SD card, which makes no sense to me). So, if I insert a card that happens to have an ext3 partition on it, why not just mount it? Or if there's some scenario I'm not aware of that might not work quite right, notify a dialogue that explains and offers to mount the partition anyway. What actually happens is a notification that tells you the SD card is “damaged”, and offers to format it. Arrggh!

(I have reason to believe that the latest versions of Android will countenance SD cards with real file systems, although I need to research this further.)

My next try was a 50MB FAT partition, and the remainder ext3. This largely worked, but it didn't leave anywhere for android to move those apps which are prepared to live on SD card, absolutely vital to squeeze some extra apps onto my old phone.

The final iteration was a 4GB FAT partition, and the rest ext3. Of course, I don't need 28GB for the chroot itself: it starts off well under 1G, and even after installing loads of stuff I'm still under 2G. But I realised that I'd be able to put my music collection on the ext3 partition, which would save the tedious step of renaming everything to comply with FAT restrictions (mainly the prohibition on : in file names). Of course, I can now rsync-over-ssh the music from my laptop, which seems to go quicker than via USB.

Another annoyance is that the ext3 partition on the SD card isn't automatically mounted. I've spent some time in the past trying to find a boot time hook I can use, but with no luck. So I have to do this from the android shell every time my phone reboots, using a helper script cunningly located under the mount point:

root@android:/ # cat /data/ext3/m
#!/system/bin/sh
mount -t ext3 /dev/block/mmcblk1p5 /data/ext3

What?

Far and away the nicest way to communicate with the chroot is to plug into a laptop or desktop and use adb shell from the Android SDK. At that point, it's scarcely different from sshing to a remote server.

Of course, the whole point of the phone is that it's portable. On the move, I'm using Jack Palevich's Terminal Emulator for Android and Klaus Weidner's Hacker's Keyboard. The keyboard has all the keys you need — Esc, Tab, Ctrl etc — so it's ideal for non-trivial tasks (such as vim!). But the tiny keys are very fiddly on my phone, especially in portrait, so I sometimes stick to my usual keyboard.

I've got a trivial helper script to start my favourite shell under ssh-agent:

root@android:/ # cat /data/ext3/ch
#!/system/bin/sh
exec chroot /data/ext3/jessie /usr/bin/ssh-agent /usr/bin/rc -l

Whither?

So I have a fantastic solution to my document and password management problems. And a great ssh client. And a whole new architecture to build my projects on, most of which aren't the sort of thing that it makes much sense to run on a phone, but building in different places is always good for portability.

I'd heard that Android uses a "modified Linux" kernel, so I wasn't really expecting any of this stuff to work properly, let alone tools like strace and lsof. Apparently, though, the changes were folded back into the mainline kernel at the 3.3 release. My (3 year old) phone runs 3.4.5, so presumably this is a fairly vanilla kernel.

This is awesome. Google has its faults, but their commitment to free software easily earns them the “least evil” prize among the current Internet quintumvirate. (That's Google, Apple, Facebook, Amazon, and Microsoft, for anyone who's been asleep the last few years.)

Realising that, yes, that computer in my pocket is a pukka Linux box has endeared me even further to Android. I'd love to write some apps for it... except I've already got more than enough projects to fill my “copious spare time”!

Categories: Offsite Blogs

Douglas M. Auclair (geophf): 1Liners for July 2016

Planet Haskell - Sat, 08/20/2016 - 4:12pm

  • July 14th, 2016: So you have x :: [a] in the IO monad, and the function f :: a -> b. What is the expression that gets you IO [b]? One possible answer appears below.
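One possible answer (mine, not from the original post): map f over the list, and fmap that over the IO action.

-- a sketch; 'ioMap f x' is the one-liner 'fmap (map f) x'
ioMap :: (a -> b) -> IO [a] -> IO [b]
ioMap f x = fmap (map f) x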
Categories: Offsite Blogs

Philip Wadler: Eric Joyce: Why the Brexit vote pushed me to support Scottish independence

Planet Haskell - Fri, 08/19/2016 - 8:09am
Former Labour MP Eric Joyce explains his change of heart.
At the referendum, still an MP, I gave independence very serious thought right up to the close of the vote. I finally came down on the side of No because I thought big EU states with a potential secession issue, like Spain and France, would prevent an independent Scotland joining the EU. This is obviously no longer the case. And I was, like the great majority of the economists and other experts whose opinion I valued, convinced that being outside the EU would be bonkers – it would badly harm our economy and hurt Scots in all sorts of unforeseen ways too.
The Brexit vote reversed that overnight: all of the arguments we in the unionist camp had used were made invalid at worst, questionable at best. This doesn’t mean they were necessarily all wrong. But it does mean that open-minded, rational No voters should at the very least seriously re-consider things in the light of the staggering new context. They should have an open ear to the experts saying that with independence, jobs in Scotland’s financial and legal service sectors will expand as English and international firms look to keep a foothold in the EU. And to the reasonable prospect that an eventual £50+ oil price might realistically open the way to a final, generational upswing in employment, and to security for Scotland’s extractive industries and their supply chain. And to the idea that preserving Scotland’s social democracy in the face of the Little Englander mentality of right-wing English Tories might be worth the fight.
Categories: Offsite Blogs

PowerShell is open sourced and is available on Linux

Lambda the Ultimate - Fri, 08/19/2016 - 3:23am

Long HN thread ensues. Many of the comments discuss the benefits/costs of basing pipes on typed objects rather than text streams. As someone who should be inclined in favor of the typed object approach, I have to say that I think the text-only folks have the upper hand at the moment. The primary reason is that text as a lingua franca between programs ensures interoperability (and insurance against future changes to underlying object models) and self-documenting code. Clearly the Achilles' heel is parsing/unparsing.

As happens often, one is reminded of the discussions of DSLs and pipelines in Jon Bentley's Programming Pearls...

Categories: Offsite Discussion

Roman Cheplyaka: Docker configuration on Fedora

Planet Haskell - Thu, 08/18/2016 - 2:00pm

If you need to change the docker daemon options on Fedora, take a look at these files:

# ls /etc/sysconfig/docker*
/etc/sysconfig/docker
/etc/sysconfig/docker-network
/etc/sysconfig/docker-storage
/etc/sysconfig/docker-storage-setup

In my case, I needed to change the container base size, so I put the following in /etc/sysconfig/docker-storage:

DOCKER_STORAGE_OPTIONS="--storage-opt dm.basesize=20G"

These files are then sourced in /etc/systemd/system/, and the variables (such as DOCKER_STORAGE_OPTIONS) are passed to the docker daemon.
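For illustration, a sketch of the mechanism (not the literal unit file shipped by Fedora; exact paths and lines vary by version): the docker.service unit loads these files with EnvironmentFile and expands the variables on the daemon's command line.

# sketch of the relevant lines in a docker.service systemd unit
[Service]
EnvironmentFile=-/etc/sysconfig/docker-storage
ExecStart=/usr/bin/docker daemon $DOCKER_STORAGE_OPTIONS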

Categories: Offsite Blogs

Brent Yorgey: Academic integrity and other virtues

Planet Haskell - Thu, 08/18/2016 - 1:41pm

I have been thinking a lot recently about academic integrity. What does it mean? Why do we care—what is it we fundamentally want students to do and to be? And whatever it is, how do we go about helping them become like that?

As a general principle, I think we ought to focus not just on prohibiting certain negative behaviors, but rather on encouraging positive behaviors (which are in a suitable sense “dual” to the negative behaviors we want to prohibit). Mere prohibitions leave a behavioral vacuum—“OK, don’t do this, so what should I do?”—and incentivize looking for loopholes, seeing how close one can toe the line without breaking the letter of the law. On the other hand, a positive principle actively guides behavior, and in actively striving towards the ideal of the positive principle, one (ideally) ends up far away from the prohibited negative behavior.

In the case of academic integrity, then, it is not enough to say “don’t plagiarize”. In fact, if one focuses on the prohibition itself, this is a particularly difficult one to live by, because academic life is not lived in a vacuum: ideas and accomplishments never spring forth ex nihilo, owing nothing to the ideas and accomplishments of others. In reality, one is constantly copying in big and small ways, explicitly and implicitly, consciously and unconsciously. In fact, this is how learning works! We just happen to think that some forms of copying are acceptable and some are not. Now, there are good reasons for distinguishing acceptable and unacceptable copying; the point is that this is often more difficult and ambiguous for students than we care to admit.

So what is the “dual” of plagiarism? What are the positive virtues which we should instill in our students? One can, of course, say “integrity”, but I don’t think this goes far enough: to have integrity is to adhere to a particular set of moral principles, but which ones? Integrity means being truthful, but truthful about what? It seems this is just another way of saying “don’t plagiarize”, i.e. don’t lie about the source of an idea. I have come up with two other virtues, however, which I think really get at the heart of the issue: thankfulness and generosity. (And in the spirit of academic thankfulness, I should say that Vic Norman first got me thinking along these lines with his paper How Will You Practice Virtue Without Skill?: Preparing Students to be Virtuous Computer Programmers, published in the 2014-2015 Journal of the ACMS; I was also influenced by a discussion of Vic’s paper with several others at the ACMS luncheon at SIGCSE 2016.)

Academic thankfulness has to do with recognizing one’s profound debt to the academic context: to all those thinkers and doers who have come before, and to all those who help you along your journey as a learner, whether professors, other students, or random strangers on the Internet. A thankful student is naturally driven to cite anything and everything, to give credit where credit is due, even to give credit where credit is not technically necessary but can serve as a token of thanks. A thankful student recognizes the hard work and unique contributions of others, rather than seeing others as mere means to their own ends. A thankful student never plagiarizes, since taking something from someone else and claiming it for one’s own is the height of ingratitude.

Academic generosity is about freely sharing one’s own ideas, sacrificing one’s time and energy to help others, and allowing others to share in credit and recognition. Being academically generous is harder than being thankful, because it opens you up to the potential ingratitude of others, but in some sense it is the more important of the two virtues: if no one were generous, no one would have anything to be thankful for. A generous student is naturally driven to cite anything and everything, to give credit and recognition to others, whether earned or not. A generous student recognizes others as worthy collaborators rather than as means to an end. A generous student never plagiarizes, since they know how it would feel to have their own generosity taken advantage of.

There’s more to say—about the circumstances that have led me to think about this, and about how one might actually go about instilling these virtues in students, but I think I will leave that for another post.

Categories: Offsite Blogs

Don Stewart (dons): Haskell devops/dev tools role at Standard Chartered (London)

Planet Haskell - Wed, 08/17/2016 - 2:16am

The Modelling Infrastructure (MI) team at Standard Chartered has an open position for a typed functional programming developer, based in London. MI are a devops-like team responsible for the continuous delivery, testing, tooling and general developer efficiency of the Haskell-based analytics package used by the bank. They work closely with other Haskell dev teams in the bank, providing developer tools, testing and automation on top of our git ecosystem.

The role involves improving the ecosystem for developers and further automation of our build, testing and release infrastructure. You will work with devs in London, as part of the global MI team (located in London and Singapore). Development is primarily in Haskell. Knowledge of the Shake build system and Bake continuous integration system would be helpful. Strong git skills would be an advantage. Having a keen eye for analytics, data analysis and data-driven approaches to optimizing tooling and workflows is desirable.

This is a permanent, associate director-equivalent position in London.

Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we definitely want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability. Demonstrated ability to write Haskell-based tooling around git systems would be super useful.

The role requires physical presence in London, either in our Basinghall or Old Street sites. Remote work is not an option. No financial background is required. Contracting-based positions are also possible if desired.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.

If this sounds exciting to you, please send your resume to me – donald.stewart <at>

Tagged: jobs
Categories: Offsite Blogs

Philip Wadler: What I learned as a hired consultant to autodidact physicists

Planet Haskell - Mon, 08/15/2016 - 8:09am
Many programming languages, especially domain-specific ones, are designed by amateurs. How do we prevent obvious irregularities and disasters in languages before they become widespread (aspects of JavaScript and R come to mind)?

Folk in the human-computer interaction community have a notion of 'expert evaluation'. I wonder if we could develop something similar for programming languages?

Jakub Zalewski passed me the article 'What I learned as a hired consultant to autodidact physicists', which treads related ground, but for physics rather than computing.
Categories: Offsite Blogs

Ken T Takusagawa: [kctmprub] Selecting function arguments by type

Planet Haskell - Sun, 08/14/2016 - 7:33pm

Some programming languages permit a function to refer to the arguments passed to it by number instead of by name, for example, Perl's @_ array.

We propose a similar mechanism of referring to function arguments by type.  This can only be done if there is only one argument of the given type in the list of parameters.  We introduce a special keyword, ARGUMENT_OF_TYPE, which, when used with a type, yields the desired argument.  Below, we use a syntax inspired by Haskell.

replicate_strings :: Int -> String -> String;
replicate_strings UNNAMED UNNAMED = concat $ intersperse " " $ replicate (ARGUMENT_OF_TYPE::Int) (ARGUMENT_OF_TYPE::String);

Somewhat more ambitious, possibly more confusing, would be to have type inference figure out which argument is needed.

replicate_strings :: Int -> String -> String;
replicate_strings UNNAMED UNNAMED = concat $ intersperse " " $ replicate ARGUMENT_OF_TYPE ARGUMENT_OF_TYPE;

The keyword UNNAMED marks the function arguments subject to this kind of inference.  They are awkward, but without them, it may be difficult or awkward to handle a function that returns a function, i.e., a higher-order function.  Perhaps it has a polymorphic return type which might or might not be a function.

More ambitiously, if the arguments are of distinct types, then there could be some mechanism by which the order of the arguments at the call site does not matter.  Related ideas have come up previously, for multiple-element tuples and pairs.

Another idea: instead of UNNAMED being a keyword, let the parameters be user-chosen identifiers that do not have to be different for each parameter, but if they have the same name, they must be different types.  Add a (possibly optional) OVERLOADED_PARAMETER annotation to make things less confusing to others reading the code:

replicate_strings2 :: Int -> String -> Int -> String -> (String, String);
replicate_strings2 (OVERLOADED_PARAMETER x) (OVERLOADED_PARAMETER x) (OVERLOADED_PARAMETER y) (OVERLOADED_PARAMETER y) = (concat $ intersperse " " $ replicate (x::Int) (x::String), concat $ intersperse " " $ replicate (y::Int) (y::String))

Categories: Offsite Blogs

Neil Mitchell: The Four Flaws of Haskell

Planet Haskell - Sun, 08/14/2016 - 3:02pm

Summary: Last year I made a list of four flaws with Haskell. Most have improved significantly over the last year.

No language/compiler ecosystem is without its flaws. A while ago I made a list of the flaws I thought might harm people using Haskell in an industrial setting. These are not flaws that impact beginners, or flaws that stop people from switching to Haskell, but those that might harm a big project. Naturally, everyone would come up with a different list, but here is mine.

Package Management: Installing a single consistent set of packages used across a large code base used to be difficult. Upgrading packages within that set was even worse. On Windows, anything that required a new network package was likely to end in tears. The MinGHC project solved the network issue. Stackage solved the consistent set of packages, and Stack made it even easier. I now consider Haskell package management a strength for large projects, not a risk.

IDE: The lack of an IDE certainly harms Haskell. There are a number of possibilities, but whenever I've tried them they have come up lacking. The fact that every Haskell programmer has an entrenched editor choice doesn't make it an easier task to solve. Fortunately, with Ghcid there is at least something near the minimum acceptable standard for everyone. At the same time various IDE projects have appeared, notably the Haskell IDE Engine and Intero. With Ghcid the lack of an IDE stops being a risk, and with the progress on other fronts I hope the Haskell IDE story continues to improve.

Space leaks: As Haskell programs get bigger, the chance of hitting a space leak increases, becoming an inevitability. While I am a big fan of laziness, space leaks are the big downside. Realising space leaks were on my flaws list, I started investigating methods for detecting space leaks, coming up with a simple detection method that works well. I've continued applying this method to other libraries and tools. I'll be giving a talk on space leaks at Haskell eXchange 2016. With these techniques space leaks don't go away, but they can be detected with ease and solved relatively simply - no longer a risk to Haskell projects.
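For a flavour of the approach (a rough sketch of one technique in this spirit; the linked posts have the real details): build with RTS options enabled, then run with a deliberately small stack so that a leaking chain of thunks overflows it immediately and points at the leak.

ghc -rtsopts --make Main.hs
./Main +RTS -K100K -RTS   # a stack overflow here is a space leak suspect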

Array/String libraries: When working with strings/arrays, the libraries that tend to get used are vector, bytestring, text and utf8-string. While each are individually nice projects, they don't work seamlessly together. The utf8-string provides UTF8 semantics for bytestring, which provides pinned byte arrays. The text package provides UTF16 encoded unpinned Char arrays. The vector package provides mutable and immutable vectors which can be either pinned or unpinned. I think the ideal situation would be a type that was either pinned or unpinned based on size, where the string was just a UTF8 encoded array with a newtype wrapping. Fortunately the foundation library provides exactly that. I'm not brave enough to claim a 0.0.1 package released yesterday has derisked Haskell projects, but things are travelling in the right direction.

It has certainly been possible to use Haskell for large projects for some time, but there were some risks. With the improvements over the last year the remaining risks have decreased markedly. In contrast, the risks of using an untyped or impure language remain significant, making Haskell a great choice for new projects.

Categories: Offsite Blogs

Christoph Breitkopf: IntervalMap Retrospective

Planet Haskell - Fri, 08/12/2016 - 1:56pm
IntervalMap, my package for containers with intervals as keys, has gone through several changes since its inception in late 2011. As is often the case, I had a concrete problem to solve and the initial code did that and only that. Going public on Hackage made some changes necessary. Others cropped up later in the desire to get closer to the interface and performance of the other base containers. Some were even requested by users. Here's a rough history:

Version  Date     Changes
0.1      2011-12  Initial version
0.2      2011-12  Name changes; Hackage release
0.3      2012-08  Split into lazy and strict
0.4      2014-08  Interval type class
0.4.1    2015-12  IntervalSet added
0.5      2016-01  Make interval search return submaps/subsets where appropriate

Looking back at this, what astonishes me most is the late addition of sets, which were only added after a user requested it. Why didn't this come up right at the start? “interval-containers” would have been a better and more general package name.

The really big change - supporting user-defined intervals via a type class - was caused by my initial distrust of nonstandard language extensions. The result of this is one more level in the module structure for the now-preferred generic version, e.g.:

import qualified Data.IntervalMap.Generic.Strict as IVM

What is still unclear is where to put the Interval type class. There are many useful interval data structures and using Data.Interval for this particular purpose would not be acceptable. But having it under IntervalMap does not seem right either. One option would be a name change:

Data.IntervalMap
Data.IntervalMap.Lazy
Data.IntervalMap.Strict
Data.IntervalSet
Data.OrderedInterval

Another is to use a general module:

Data.IntervalContainers
Data.IntervalContainers.Interval
Data.IntervalMap
Data.IntervalMap.Lazy
Data.IntervalMap.Strict
Data.IntervalSet

Or even put everything under that module:

Data.IntervalContainers
Data.IntervalContainers.Interval
Data.IntervalContainers.IntervalMap
Data.IntervalContainers.IntervalMap.Lazy
Data.IntervalContainers.IntervalMap.Strict
Data.IntervalContainers.IntervalSet

None of these seems clearly preferable, so the best option is probably to leave the package name and module structure as is.

Data Structure

Preferring simplicity, I based the implementation on red-black trees. While this is in general competitive with the size balanced binary trees used by the base containers, there is one caveat: with plain red-black trees, it is not possible to implement a hedge-union algorithm or even simple optimizations for the very common “union of huge set and tiny set” operation, unless one used an additional redundant size field in the node.

This has become somewhat more pressing since version 0.5 where the search operations return subsets/submaps.
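To make that concrete, here is a sketch (my illustration, not the package's actual node type) of a red-black node carrying such a redundant size field:

data Color = Red | Black

-- Hypothetical node with a redundant size field; 'size' then runs in
-- O(1), which is what union-of-huge-and-tiny optimizations rely on.
data RB k v = Leaf
            | Node !Color !Int k v !(RB k v) !(RB k v)

size :: RB k v -> Int
size Leaf = 0
size (Node _ n _ _ _ _) = n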


My initial fixed-type implementation of intervals allowed mixing intervals with different openness, and the type class has inherited this “feature”. I strongly suspect that no one actually uses this - that is, in all instances of Interval, leftClosed and rightClosed ignore their argument and return a constant value.

If I'm wrong, and someone does use this, please let me know!

That design has a negative impact on performance, though. It turns up in the ordering of intervals and their endpoints. For example, a lower bound is actually lower than the identical lower bound of a second interval, if the first interval is closed and the second open.

Unless the compiler does whole-program optimization, it will not be able to eliminate the Interval dictionaries, and can do very little optimization. That's the reason Interval has many members that could also have been defined as module-level functions: above, below, contains, ... These functions are heavily used by the map and set implementations, and it seems that GHC at least is much better at optimizing these functions when they are defined as members (this is just an observation of mine - I don't know in detail why this should be the case). The disadvantage is that it's possible to break a lot of invariants if you actually define these members in an instance. That would not be possible if they were defined as regular functions. At least I'm in good company, because the same applies to Eq, Ord, and Monad to name just a few.
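For reference, a simplified sketch of the shape of such a class (my simplification; the package's real Data.IntervalMap.Generic.Interval class has more members and defaults):

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

class Ord e => Interval i e | i -> e where
    lowerBound  :: i -> e
    upperBound  :: i -> e
    leftClosed  :: i -> Bool   -- does the interval include its lower bound?
    rightClosed :: i -> Bool   -- does the interval include its upper bound?

    -- a member that could have been a module-level function, kept in
    -- the class so that instances can be optimized:
    contains :: i -> e -> Bool
    contains iv p =
        (if leftClosed iv  then lowerBound iv <= p else lowerBound iv < p) &&
        (if rightClosed iv then p <= upperBound iv else p < upperBound iv)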

Data.Map tries to work around this by declaring many functions to be inlinable, effectively getting a bit closer to whole-program optimization. This does not seem feasible for IntervalMap, because the functions are used at deeper nesting levels, and in particular when constructing nodes.

As far as GHC is concerned, good performance is very dependent on inlining, especially when type classes and their dictionaries are involved. Any improvements in this would certainly benefit IntervalMap. One idea: it might be useful to be able to provide strictness information in type classes. In the case of Interval for example, lowerBound and upperBound always evaluate their argument (disregarding really pathological instances), but there is no way to indicate this to the compiler. Only the instance declarations allow inferring strictness information, and that's not available when compiling library code.


Whether a big refactoring or rewrite should be done really depends on the users of the package: you. Please let me know about missing features or performance issues, either via mail, a github issue, or a comment here.

Categories: Offsite Blogs

Kevin Scaldeferri: A Rails 2.3 Performance Gotcha

Planet Haskell - Fri, 08/12/2016 - 9:01am

Recently we upgraded to Rails 2.3 at work. Upon doing so, we saw our application take an aggregate performance hit of about 50%. Spending a little quality time with ruby-prof, we found one thing that stood out as peculiar: an awful lot of time was being spent creating and tending to Ruby Thread objects. Tracing up the call stack, these were coming from inside the memcached client.

Buried in the Rails 2.3 release notes is the innocuous looking statement:

The bundled memcached client has been updated to version

What this fails to tell you is that this upgrade added a timeout option to the client which, as the RDoc will happily inform you, “is a major performance penalty in Ruby 1.8”. Moreover, the default value for the timeout is 0.5s, rather than nil.

Our application makes heavy use of caching, so we were getting massacred by this. To add insult to injury, our most heavily optimized pages were hit the hardest, since they used memcached the most. Adding :timeout => nil to our cache layer immediately restored the previous performance.

Lessons from this story: if you’re a Rails developer using memcached, go add that option to your code; if you’re a library author adding an option with significant performance costs, default to the old behavior; if you’re on the Ruby core team, please add an interface to select so that people don’t do silly things like use Threads to implement timeouts on socket reads.


Categories: Offsite Blogs

Kevin Scaldeferri: Changes

Planet Haskell - Fri, 08/12/2016 - 9:01am

After nearly 6 1/2 years, on Monday I gave notice at Yahoo. After next week, I’ll be moving on to new things (to be described in a later post). The short version is that I was not finding happiness or the career opportunities I wanted at Yahoo any more. The long version... well, you’ll have to come buy me a beer or two if you want that.


Categories: Offsite Blogs