News aggregator

Neil Mitchell: Upcoming talk: Writing build systems with Shake, 16 Aug 2016, London

Planet Haskell - Wed, 08/03/2016 - 5:22pm
Summary: I'm giving a talk on Shake.

I'm delighted to announce that I'll be giving a talk/hack session on Shake as part of the relatively new "Haskell Hacking London" meetup.

Title: Writing build systems with Shake

Date: Tuesday, August 16, 2016. 6:30 PM

Location: Pusher Office, 28 Scrutton Street, London

Abstract: Shake is a general purpose library for expressing build systems - forms of computation, with caching, dependencies and more besides. Like all the best stuff in Haskell, Shake is generic, with details such as "files" written on top of the generic library. Of course, the real world doesn't just have "files", but specifically has "C files that need to be compiled with gcc". In this hacking session we'll look at how to write Shake rules, what existing functions people have already layered on top of Shake for compiling with specific compilers, and consider which rules are missing. Hopefully by the end we'll have a rule that people can use out-of-the-box for compiling C++ and Haskell.

To put it another way, it's all about layering up. Haskell is a programming language. Shake is a Haskell library for dependencies, minimal recomputation, parallelism etc. Shake also provides as a layer on top (but inside the same library) to write rules about files, and ways to run command line tools. Shake doesn't yet provide a layer that compiles C files, but it does provide the tools with which you can write your own. The aim of this talk/hack session is to figure out what the next layer should be, and write it. It is definitely an attempt to move into the SCons territory of build systems, which knows how to build C etc. out of the box.
Categories: Offsite Blogs

JP Moresmau: Scratching an itch: generating Cabal data-files field automatically

Planet Haskell - Wed, 08/03/2016 - 12:26pm
Maybe I didn't look hard enough, but I'm not aware of a tool to generate the contents of the Cabal data-files field automatically when you have loads of folders to include. Cabal has very simple wildcard matching for files references in this field, by design (to avoid including too much data in a source distribution). So it only supports wildcards to replace the file name inside a directory for a given extension, and doesn't support sub directories.

For the reload project - first release on Hackage! - I had to include loads of files, all the Polymer web components the UI depends on, which are all on different sub directories, with a bunch of different extensions. So I wrote a little tool to generate the field automatically, and put it on Hackage too.

You pass it a directory name, possibly some subdirectories and extensions to ignore, and it generates all the required entries. Saved me loads of time, and scratched my own itch!
Categories: Offsite Blogs

mightybyte: How to Get a Haskell Job

Planet Haskell - Wed, 08/03/2016 - 8:02am

Over and over again I have seen people ask how to get a full time job programming in Haskell. So I thought I would write a blog post with tips that have worked for me as well as others I know who write Haskell professionally. For the impatient, here's the tl;dr in order from easiest to hardest:

  1. IRC
  2. Local meetups
  3. Regional gatherings/hackathons
  4. Open source contributions
  5. Work where Haskell people work

First, you need to at least start learning Haskell on your own time. You had already started learning how to program before you got your first programming job. The same is true of Haskell programming. You have to show some initiative. I understand that for people with families this can be hard. But you at least need to start. After that, far and away the most important thing is to interact with other Haskell developers so you can learn from them. That point is so important it bears repeating: interacting with experienced Haskell programmers is by far the most important thing to do. Doing this at a job would be the best, but there are other things you can do.

1. IRC. Join the #haskell channel on Freenode. Lurk for awhile and follow some of the conversations. Try to participate in discussions when topics come up that interest you. Don't be afraid to ask what might seem to be stupid questions. In my experience the people in #haskell are massively patient and willing to help anyone who is genuinely trying to learn.

2. Local meetups. Check to see if there is a Haskell meetup in a city near you. I had trouble finding a local meetup when I was first learning Haskell, but there are a lot more of them now. Don't just go to listen to the talks. Talk to people, make friends. See if there's any way you can collaborate with some of the people there.

3. Larger regional Haskell events. Find larger weekend gatherings of Haskell developers and go to them. Here are a few upcoming events that I know of off the top of my head:

The first event like this that I went to was Hac Phi a few years back. Going there majorly upped my game because I got to be around brilliant people, pair program with some of them, and ultimately ended up starting the Snap Web Framework with someone I met there. You might not have a local meetup that you can go to, but you can definitely travel to go to one of these bigger weekend events. I lived a few hours away from Hac Phi, but I know a number of people who travel further to come. If you're really interested in improving your Haskell, it is well worth the time and money. I cannot emphasize this enough.

4. Start contributing to an open source Haskell project. Find a project that interests you and dive in. Don't ask permission, just decide that you're going to learn enough to contribute to this thing no matter what. Join their project-specific IRC channel if they have one and ask questions. Find out how you can contribute. Submit pull requests. This is by far the best way to get feedback on the code that you're writing. I have actually seen multiple people (including some who didn't strike me as unusually talented at first) start Haskell and work their way up to a full-time Haskell job this way. It takes time and dedication, but it works.

5. Try to get a non-haskell job at a place where lots of Haskell people are known to work. Standard Chartered uses is Haskell but is big enough to have non-Haskell jobs that you might be able to fit. S&P Capital IQ doesn't use Haskell but has a significant number of Haskell people who are coding in Scala.

Categories: Offsite Blogs

Ketil Malde: CAS-based generic data store

Planet Haskell - Wed, 08/03/2016 - 4:00am

Bioinformatics projects routinely generate terabytes of sequencing data, and the inevitable analysis that follows can easily increase this by an order of magnitude or more. Not everything is worth keeping, but in order to ensure reproducibility and to be able to reuse data in new projects, it is important to store what needs to be kept in a structured way.

I have previously described and implemented a generic data store, called medusa. Following the eXtreme Programming principle of always starting with the simplest implementation that could possibly work, the system was designed around a storage based on files and directories. This has worked reasonably well, and makes data discoverable and accessible both directly in the file system, and through web-based services providing browsing, metadata search (with free text and keyword based indexes), BLAST search, and so forth.

Here, I explore the concept of content adressable storage (CAS), which derives unique names for data objects from their content.

The CAS principle

The essence of any storage system is being able to store objects with some kind of key (or label, or ID), and being able to retrive them based on the same key. What distinguishes a content adressable storage from other storage systems is that the key is generated from the entire data object, typically using a cryptographic hash function like MD5 or SHA1.

This means that a given object will always be stored under the same key, and that modifications to an object will also change its key, essentially creating a new object.

A layered model

Using CAS more clearly separates the storage model from the semantics of data sets. This gives us a layered architecture for the complete system, and services are implemented on top of these layers as independent and modular programs.

The object store

The object store is conceptually simple. It provides a simple interface that consists of the following primitive operations:

a data object into the store
the keys that refer to data objects
a data object using its key

The storage itself is completely oblivious to the actual contents of data objects, and it has no concept of hierarchy or other relationships between objects.

Metadata semantics

When we organize data, we do of course want to include relationships between objects, and also between data objects and external entities and concepts. This is the province of metadata. Metadata semantics are provided by special metadata objects which live in the object store like any other objects. Each metadata object defines and describes a specific data set. As in former incarnations of the system, metadata is structured as XML documents, and provides information about (and the identity of) the data objects constituting the data set. It also describes the relationship between data sets, for instance allowing new versions to obsolete older ones.

The metadata objects are primarily free-form text objects, allowing users to include whatever information they deem relevant and important. The purpose of using XML is to make specific parts of the information computationally accessible, unambiguous, and standardized. For instance, structured references (i.e. specific XML elements) to data objects with their key allows automatic retrieval of the complete dataset. In addition to referencing objects in the object store, similar structures allow unambigous references to external entities, for instance species, citation of scientific works, and uniform formatting of dates and geographic locations.

A command line interface to the metadata is provided through the `mdz` command, this allows a variety of operations on data sets, including listing, importing, exporting, and synchronizing with other repositories. In addition, the system implements a web-based front end to the data, as well as metatdata indexing via xapian.

Data objects and services

As shown in the previous sections, the system can be conceptually divided in three levels: the object store, the metadata level, and the data semantic level. A service typically accesses data on one or more of these levels. For instance, a (hypothetical) service to ensure distributed redundancy may only need to access the object store, oblivious to the contents of the objects. Other services, like the (existing) functionality to import data sets, or transfer data sets between different servers, need to understand the metadata format. And even more specific services may also need to understand the format of data objects - e.g. the BLAST service scans metadata to find FASTA-formatted sequence data, and integrate them into its own database. The important principles that services adhere to are: 1) a service can ignore anything that is irrelevant to it, and 2) can reconstruct its entire state from the contents of the object store.

Discussion CAS Advantages

Perhaps the primary advantage of using the hash value as the ID for data objects, is that it allows the system to be entirely distributed. The crucial advantage is that keys (by definition) are unique to the data. With user-selected keys, the user must somehow ensure the uniqueness of the key, and this requires a central authority or at the very least an agreed-upon naming scheme. In contrast, names for objects in CAS depend only on the contents, and the system can be implemented with no central oversight.

That keys depend on contents further means that data are immutable - storing a modified data object results in a different key. Immutability is central to reproducibility (you won't get the same results if you run your analysis with different data), and previously this was maintained by keeping a separate registry of metadata checksums, and also including checksums for data objects in the metadata. This made it possible to verify correctness (as long as the registry was available and correct), with CAS, this becomes even easier since the checksum is the same as the name you use to retrieve the data object.

Another benefit is deduplication of data objects. Objects with the same contents will always be stored under the same key, so this is automatic. This also makes it easier to track files across renames (analyses tend to produce output files with generic names like "contigs.fasta", it is often useful to give these files a more descriptive name), with CAS it becomes trivial to check if any file exists in the storage.

Decoupling the data from a fixed filesystem layout introduces another level of abstraction, and this makes it easier to change the underlying storage model. In later years, key-value storage models have replaced relational databases in many applications, in particular where high scalability is more important than structured data. Consequently, we have seen a plethora of so-called "NoSQL" databases emerge, including CouchDB, Cassandra, and many others, which could be plugged in as an alternative back-end storage. Storage "in the cloud", like Amazon's S3 or Google's Cloud Storage are also good alternatives.

The added opacity makes it less likely (but still technically possible) for users with sufficient privileges to perform "illegal" operations on data (for instance, modification or removal).


The implicit assumption for CAS is that different data objects hash to different hash values. In an absolute sense, this is trivially false (since there only exist 2160 possible hash values, and an infinity of possible data objects). But it is true in a probabilistic sense, and we can calculate the probability of collisions from the birthday paradox. For practical purposes, any collision is extremely unlikely, and like the revision control system git (which also is CAS-based), collisions are checked for by the system, and can be dealt with manually if they should occur.

Abstracting out the storage layer can be an advantage, but it also makes the system more opaque. And although the ability of humans to select misleading or confusing names can hardly be underestimated, even a poorly chosen name is usually more informative than the hexadecimal key representing a hash value.

Mixed blessings

Previous versions used a fixed directory structure, where each data set included a metadata file, and an arbitrary set of data files. Using a content adressable object store is more flexible, and there is nothing preventing the implementation of a parallel metadata scheme sharing the same data store, and even referring to the same data objects. One could also create metadata objects that refer to other metadata objects. As always, fewer restrictions also means more opportunities for confusion and increased complexity.

Perhaps the most drastic change is how datasets can have their status changed - e.g. be marked as obsolete or invalid. Previously, metadata was versioned, meaning there could exist a (linear) sequence of metadata for the same dataset. This was enforced by convention only, and also required a central synchronization of metadata updates to avoid name and version collisions. Since the object store only allows the addition of new objects, and in particular, not modification, status updates can only be achieved by adding new objects. Metadata objects can refer to other datasets, and specify a context, for instance, a data set containing analysis results can specify being based on a data set containing input data for the analysis. Status changes are now implemented using this mechanism, and datasets can refer to other data sets as "invalidated" or "obsoleted".

Current Status and Availability

The system is currently working on my internal systems, it is based on standard components (mostly shell scripts), and although one might expect some rough edges, it should be fairly easy to deploy.

Do let me know if you are interested.

Categories: Offsite Blogs

mightybyte: Measuring Software Fragility

Planet Haskell - Tue, 08/02/2016 - 3:43pm
<style> .hl { background-color: orange; } </style>

While writing this comment on reddit I came up with an interesting question that I think might be a useful way of thinking about programming languages. What percentage of single non-whitespace characters in your source code could be changed to a different character such that the change would pass your CI build system but would result in a runtime bug? Let's call this the software fragility number because I think that metric gives a potentially useful measure of how bug prone your software is.

At the end of the day software is a mountain of bytes and you're trying to get them into a particular configuration. Whether you're writing a new app from scratch, fixing bugs, or adding new features, the number of bytes of source code you have (similar to LOC, SLOC, or maybe the compressed number of bytes) is rough indication of the complexity of your project. If we model programmer actions as random byte mutations over all of a project's source and we're trying to predict the project's defect rate this software fragility number is exactly the thing we need to know.

Now I'm sure many people will be quick to point out that this random mutation model is not accurate. Of course that's true. But I would argue that in this way it's similar to the efficient markets hypothesis in finance. Real world markets are obviously not efficient (Google didn't become $26 billion less valuable because the UK voted for brexit). But the efficient markets model is still really useful--and good luck finding a better one that everybody will agree on.

What this model lacks in real world fidelity, it makes up for in practicality. We can actually build an automated system to calculate a reasonable approximation of the fragility number. All that has to be done is take a project, randomly mutate a character, run the project's whole CI build, and see if the result fails the build. Repeat this for every non-whitespace character in the project and count how many characters pass the build. Since the character was generated at random, I think it's reasonable to assume that any mutation that passes the build is almost definitely a bug.

Performing this process for every character in a large project would obviously require a lot of CPU time. We could make this more tractable by picking characters at random to mutate. Repeat this until you have done it for a large enough number of characters and then see what percentage of them made it through the build. Alternatively, instead of choosing random characters you could choose whole modules at random to get more uniform coverage over different parts of the language's grammar. There are probably a number of different algorithms that could be tried for picking random subsets of characters to test. Similar to numerical approximation algorithms such as Newton's method, any of these algorithms could track the convergence of the estimate and stop when the value gets to a sufficient level of stability.

Now let's investigate actual fragility numbers for some simple bits of example code to see how this notion behaves. First let's look at some JavaScript examples.

It's worth noting that comment characters should not be allowed to be chosen for mutation since they obviously don't affect the correctness of the program. So the comments you see here have not been included in the calculations. Fragile characters are highlighted in orange.

// Fragility 12 / 48 = 0.25 function f(n) { if ( n < 2 ) return 1; else return n * f(n-1); } // Fragility 14 / 56 = 0.25 function g(n) { var p = 1; for (var i = 2; i <= n; i++ ) { p *= i; } return p; }

First I should say that I didn't write an actual program to calculate these. I just eyeballed it and thought about what things would fail. I easily could have made mistakes here. In some cases it may even be subjective, so I'm open to corrections or different views.

Since JavaScript is not statically typed, every character of every identifier is fragile--mutating them will not cause a build error because there isn't much of a build. JavaScript won't complain, you'll just start getting undefined values. If you've done a signifciant amount of JavaScript development, you've almost definitely encountered bugs from mistyped identifier names like this. I think it's mildly interesting that the recursive and iterative formulations if this function both have the same fragility. I expected them to be different. But maybe that's just luck.

Numerical constants as well as comparison and arithmetic operators will also cause runtime bugs. These, however, are more debatable because if you use the random procedure I outlined above, you'll probably get a build failure because the character would have probably changed to something syntactically incorrect. In my experience, it semes like when you mistype an alpha character, it's likely that the wrong character will also be an alpha character. The same seems to be true for the classes of numeric characters as well as symbols. The method I'm proposing is that the random mutation should preserve the character class. Alpha characters should remain alpha, numeric should remain numeric, and symbols should remain symbols. In fact, my original intuition goes even further than that by only replacing comparison operators with other comparison operators--you want to maximize the chance that new mutated character will cause a successful build so the metric will give you a worst-case estimate of fragility. There's certainly room for research into what patterns tend come up in the real world and other algorithms that might describe that better.

Now let's go to the other end of the programming language spectrum and see what the fragility number might look like for Haskell.

// Fragility 7 / 38 = 0.18 f :: Int -> Int f n | n < 2 = 1 | otherwise = n * f (n-1)

Haskell's much more substantial compile time checks mean that mutations to identifier names can't cause bugs in this example. The fragile characters here are clearly essential parts of the algorithm we're implementing. Maybe we could relate this idea to information theory and think of it as an idea of how much information is contained in the algorithm.

One interesting thing to note here is the effect of the length of identifier names on the fragility number. In JavaScript, long identifier names will increase the fragility because all identifier characters can be mutated and will cause a bug. But in Haskell, since identifier characters are not fragile, longer names will lower the fragility score. Choosing to use single character identifier names everywhere makes these Haskell fragility numbers the worst case and makes JavaScript fragility numbers the best case.

Another point is that since I've used single letter identifier names it is possible for a random identifier mutation in Haskell to not cause a build failure but still cause a bug. Take for instance a function that takes two Int parameters x and y. If y was mutated to x, the program would still compile, but it would cause a bug. My set of highlighted fragile characters above does not take this into account because it's trivially avoidable by using longer identifier names. Maybe this is an argument against one letter identifier names, something that Haskell gets criticism for.

Here's the snippet of Haskell code I was talking about in the above reddit comment that got me thinking about all this in the first place:

// Fragility 31 / 277 = 0.11 data MetadataInfo = MetadataInfo { title :: Text , description :: Text } pageMetadataWidget :: MonadWidget t m => Dynamic t MetadataInfo -> m () pageMetadataWidget i = do el "title" $ dynText $ title <$> i elDynAttr "meta" (mkDescAttrs . description <$> i) blank where mkDescAttrs desc = "name" =: "description" "content" =: desc

In this snippet, the fragility number is probably close to 31 characters--the number of characters in string literals. This is out of a total of 277 non-whitespace characters, so the software fragility number for this bit of code is 11%. This half the fragility of the JS code we saw above! And as I've pointed out, larger real world JS examples are likely to have even higher fragility. I'm not sure how much we can conclude about the actual ratios of these fragility numbers, but at the very least it matches my experience that JS programs are significantly more buggy than Haskell programs.

The TDD people are probably thinking that my JS examples aren't very realistic because none of them have tests, and that tests would catch most of the identifier name mutations, bringing the fragility down closer to Haskell territory. It is true that tests will probably catch some of these things. But you have to write code to make that happen! It doesn't happen by default. Also, you need to take into account the fact that the tests themselves will have some fragility. Tests require time and effort to maintain. This is an area where this notion of the fragility number becomes less accurate. I suspect that since the metric only considers single character mutations it will underestimate the fragility of tests since mutating single characters in tests will automatically cause a build failure.

There seems to be a slightly paradoxical relationship between the fragility number and DRY. Imagine our above JS factorial functions had a test that completely reimplemented factorial and then tried a bunch of random values Quickcheck-style. This would yield a fragility number of zero! Any single character change in the code would cause a test failure. And any single character change in the tests would also cause a test failure. Single character changes can no longer classified fragile because we've violated DRY. You might say that the test suite shouldn't reimplement algorithm--you should just specific cases like f(5) == 120. But in an information theory sense this is still violating DRY.

Does this mean that the fragility number is not very useful? Maybe. I don't know. But I don't think it means that we should just throw away the idea. Maybe we should just keep in mind that this particular formulation doesn't have much to tell us about the fragility more complex coordinated multi-character changes. I could see the usefulness of this metric going either way. It could simplify down to something not very profound. Or it could be that measurements of the fragility of real world software projects end up revealing some interesting insights that are not immediately obvious even from my analysis here.

Whatever the usefulness of this fragility metric, I think the concept gets is thinking about software defects in a different way than we might be used to. If it turns out that my single character mutation model isn't very useful, perhaps the extension to multi-character changes could be useful. Hopefully this will inspire more people to think about these issues and play with the ideas in a way that will help us progress towards more reliable software and tools to build it with.

EDIT: Unsurprisingly, I'm not the first person to have come up with this idea. It looks like it's commonly known as mutation testing. That Wikipedia article makes it sound like mutation testing is commonly thought of as a way to assess your project's test suite. I'm particularly interested in what it might tell us about programming languages...i.e. how much "testing" we get out of the box because of our choice of programming language and implementation.

Categories: Offsite Blogs

Philip Wadler: Michael Moore: Five Reasons Why Trump Will Win

Planet Haskell - Tue, 08/02/2016 - 7:55am
In the Huffington Post, Michael Moore give the most incisive (and hilarious) analysis I've seen. We have to understand why this is happening if we are to have a hope of preventing it.
1. Midwest Math, or Welcome to Our Rust Belt Brexit. I believe Trump is going to focus much of his attention on the four blue states in the rustbelt of the upper Great Lakes - Michigan, Ohio, Pennsylvania and Wisconsin. Four traditionally Democratic states - but each of them have elected a Republican governor since 2010 (only Pennsylvania has now finally elected a Democrat). In the Michigan primary in March, more Michiganders came out to vote for the Republicans (1.32 million) that the Democrats (1.19 million). Trump is ahead of Hillary in the latest polls in Pennsylvania and tied with her in Ohio. Tied? How can the race be this close after everything Trump has said and done? Well maybe it’s because he’s said (correctly) that the Clintons’ support of NAFTA helped to destroy the industrial states of the Upper Midwest. ...

And this is where the math comes in. In 2012, Mitt Romney lost by 64 electoral votes. Add up the electoral votes cast by Michigan, Ohio, Pennsylvania and Wisconsin. It’s 64. All Trump needs to do to win is to carry, as he’s expected to do, the swath of traditional red states from Idaho to Georgia (states that’ll never vote for Hillary Clinton), and then he just needs these four rust belt states. He doesn’t need Florida. He doesn’t need Colorado or Virginia. Just Michigan, Ohio, Pennsylvania and Wisconsin. And that will put him over the top. This is how it will happen in November.4. The Depressed Sanders Vote. Stop fretting about Bernie’s supporters not voting for Clinton - we’re voting for Clinton! The polls already show that more Sanders voters will vote for Hillary this year than the number of Hillary primary voters in ‘08 who then voted for Obama. This is not the problem. The fire alarm that should be going off is that while the average Bernie backer will drag him/herself to the polls that day to somewhat reluctantly vote for Hillary, it will be what’s called a “depressed vote” - meaning the voter doesn’t bring five people to vote with her. He doesn’t volunteer 10 hours in the month leading up to the election. She never talks in an excited voice when asked why she’s voting for Hillary. A depressed voter. Because, when you’re young, you have zero tolerance for phonies and BS. Returning to the Clinton/Bush era for them is like suddenly having to pay for music, or using MySpace or carrying around one of those big-ass portable phones. They’re not going to vote for Trump; some will vote third party, but many will just stay home. Hillary Clinton is going to have to do something to give them a reason to support her — and picking a moderate, bland-o, middle of the road old white guy as her running mate is not the kind of edgy move that tells millenials that their vote is important to Hillary. Having two women on the ticket - that was an exciting idea. But then Hillary got scared and has decided to play it safe. This is just one example of how she is killing the youth vote.
Categories: Offsite Blogs

Don Stewart (dons): Four roles in Strats at Standard Chartered (London and Singapore)

Planet Haskell - Tue, 08/02/2016 - 4:12am

The Strats team at Standard Chartered has another four new open positions for typed functional programming developers, based in London or Singapore. Strats are a specialized software engineering and quantitative analysis team who build a broad range of software for financial markets users at Standard Chartered.

You will work on the trading floor, directly with traders, sales and risk managers, building software to automate their work and improve their efficiency. The new roles are to build low latency XVA pricing services and for trade hedge identification and management. Other roles and projects are also possible.

In general you will use Haskell for almost all tasks: data analysis, market data publishing, database access, web services, desktop GUIs, large parallel tasks, quantitative models, solvers, everything. This is a fast paced role – code you write today will be deployed within hours to hundreds of users and has to work.

These are permanent, associate director and director positions, in London and Singapore as part of the Strats global team. Demonstrated experience in typed FP (Haskell, OCaml, F# etc) is required. We have around 3 million lines of Haskell, and our own Haskell compiler. In this context we look for skill and taste in typed functional programming to capture and abstract over complex, messy systems. You would join a growing team of around 20 experienced Haskell developers that is expanding due to increased business need for Haskell developers.

Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we definitely want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability. A PhD or Masters Degree in Computer Science is an advantage (but not a requirement). A bachelor’s degree in computer science, math or a related discipline is a strong advantage.

The role requires physical presence on the trading floor in Singapore or London. Remote work is not an option. You will have some project and client management skills — you will talk to users, understand their problems and then implement and deliver what they really need. No financial background is required. These positions have attractive remuneration for the right candidates. Relocation support will also be provided. Contracting-based positions are also possible if desired.

Applicants who don’t necessarily meet all criteria but have an interest in working in Singapore in particular, and have an FP background, are encouraged to apply.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.

If this sounds exciting to you, please send your resume to me – donald.stewart <at>

Tagged: jobs
Categories: Offsite Blogs

mightybyte: "cabal gen-bounds": easy generation of dependency version bounds

Planet Haskell - Mon, 08/01/2016 - 10:30pm

In my last post I showed how release dates are not a good way of inferring version bounds. The package repository should not make assumptions about what versions you have tested against. You need to tell it. But from what I've seen there are two problems with specifying version bounds:

  1. Lack of knowledge about how to specify proper bounds
  2. Unwillingness to take the time to do so

Early in my Haskell days, the first time I wrote a cabal file I distinctly remember getting to the dependencies section and having no idea what to put for the version bounds. So I just ignored them and moved on. The result of that decision is that I can no longer build that app today. I would really like to, but it's just not worth the effort to try.

It wasn't until much later that I learned about the PVP and how to properly set bounds. But even then, there was still an obstacle. It can take some time to add appropriate version bounds to all of a package's dependencies. So even if you know the correct scheme to use, you might not want to take the time to do it.

Both of these problems are surmountable. And in the spirit of doing that, I would like to propose a "cabal gen-bounds" command. It would check all dependencies to see which ones are missing upper bounds and output correct bounds for them. I have implemented this feature and it is available at Here is what it looks like to use this command on the cabal-install package:

$ cabal gen-bounds Resolving dependencies... The following packages need bounds and here is a suggested starting point. You can copy and paste this into the build-depends section in your .cabal file and it should work (with the appropriate removal of commas). Note that version bounds are a statement that you've successfully built and tested your package and expect it to work with any of the specified package versions (PROVIDED that those packages continue to conform with the PVP). Therefore, the version bounds generated here are the most conservative based on the versions that you are currently building with. If you know your package will work with versions outside the ranges generated here, feel free to widen them. network >= 2.6.2 && < 2.7, network-uri >= 2.6.0 && < 2.7,

The user can then paste these lines into their build-depends file. They are formatted in a way that facilitates easy editing as the user finds more versions (either newer or older) that the package builds with. This serves to both educate users and automate the process. I think this removes one of the main frustrations people have about upper bounds and is a step in the right direction of getting more hackage packages to supply them. Hopefully it will be merged upstream and be available in cabal-install in the future.

Categories: Offsite Blogs

Robert Harper: Cubical Higher Type Theory as a Programming Language

Planet Haskell - Mon, 08/01/2016 - 11:06am

I gave a presentation at the Workshop on Categorical Logic and Univalent Foundations held in Leeds, UK July 27-29th 2016.  My talk, entitled Computational Higher Type Theory, concerns a formulation of higher-dimensional type theory in which terms are interpreted directly as programs and types as programs that express specifications of program behavior.  This approach to type theory, first suggested by Per Martin-Löf in his famous paper Constructive Mathematics and Computer Programming and developed more fully by The NuPRL Project, emphasizes constructive mathematics in the Brouwerian sense: proofs are programs, propositions are types.

The now more popular accounts of type theory emphasize the axiomatic freedom given by making fewer foundational commitments, such as not asserting the decidability of every type, but give only  an indirect account of their computational content, and then only in some cases.  In particular, the computational content of Voevodsky’s Univalence Axiom in Homotopy Type Theory remains unclear, though the Bezem-Coquand-Huber model in cubical sets carried out in constructive set theory gives justification for its constructivity.

To elicit the computational meaning of higher type theory more clearly, emphasis has shifted to cubical type theory (in at least two distinct forms) in which the higher-dimensional structure of types is judgmentally explicit as the higher cells of a type, which are interpreted as identifications.  In the above-linked talk I explain how to construe a cubical higher type theory directly as a programming language.  Other efforts, notably by Cohen-Coquand-Huber-Mörtberg, have similar goals, but using somewhat different methods.

For more information, please see my home page on which are linked two arXiv papers providing the mathematical details, and a 12-page paper summarizing the approach and the major results obtained so far.  These papers represent joint work with Carlo Angiuli and Todd Wilson.

Filed under: Research
Categories: Offsite Blogs

Manuel M T Chakravarty: Static versus dynamic

Planet Haskell - Mon, 08/01/2016 - 7:47am

In the transition from Objective-C to Swift, the iOS and Mac developer community struggles with a question: What is better? Static or dynamic languages?

To answer this question, we have to ask a few more. What is a static language? What is a dynamic language? More fundamentally, what is a programming language?

Every programming language is what computer scientists call a formal language. It is a rigorously defined construct including (1) a formal grammar (its syntactic formation rules) and (2) a formal semantics (the meaning of syntactically well-formed programs). This is always the case, irrespective of whether the creators of the programming language designed the language with those components in mind. As soon as they implement the language (by writing an interpreter or compiler), they indirectly commit to a formal grammar and a formal semantics. In other words, by implementing the language, the language creators fix what happens if you run and maybe compile programs in that new language. They fix that with the utmost precision, as otherwise no computer would be able to run these programs — hence, we call them formal.

For our discussion, the most interesting component is the language semantics, which we can, again, split into two components: the static semantics and the dynamic semantics. The static semantics are those aspects of a program that we can reason about without executing the program. A prime example of the static semantics is scoping: if I use a variable x, which declaration does that x refer to. Another aspect of the static semantics is the (static) type system.

In contrast, the dynamic semantics characterises the execution of programs. It determines what the computer does when it processes a particular language construct.

Every language has both a static and a dynamic semantics. Without a static semantics, we wouldn’t know which declarations the use of variables, functions, and so forth refer to and, without a dynamic semantics, we cannot run a program. Nevertheless, the static and dynamic semantics of different languages can surely be of varying levels of expressiveness. For example, the expressiveness of the static semantics varies with the capabilities of the type system, and the expressiveness of the dynamic semantics depends on whether the language includes advanced runtime capabilities, such as exceptions, first-class functions, reflection, or dynamic method dispatch.

When people talk about “dynamic” versus “static” languages, they suggest that the languages designers put a particular emphasis on either the dynamic or static semantics of the language. For example, Smalltalk and Lisp, as representatives of dynamic languages, lack a static type system, but come with strong support for reflection and meta-programming. In contrast, Java, a popular static language, lacks strong support for meta-programming, but employs a strong static type system.

Unfortunately, this characterisation is increasingly inaccurate. Modern languages with a strong static semantics also support features requiring an advanced dynamic semantics. For example, meta-programming is a common theme in C++, ML dialects, and Haskell. Moreover, work on flow types, contracts, and gradual typing can be regarded as enhancing the static semantics of Lisp dialects, JavaScript, and others.

All in all, it is not an either-or proposition. Just because a language has a strong static semantics does not mean that it cannot also have a strong dynamic semantics. If languages often start out falling into one camp, this is because language design requires much hard work and you get something working more quickly by limiting the scope of your aspiration. Moreover, it took programming language researchers a while to understand the static properties of advanced runtime behaviour. In any case, it is about time to retire this outdated bifurcation of programming languages.

So, what is better? A static or a dynamic language? It is certainly best —but also a lot of work— to have a language with both a strong static and a strong dynamic semantics. Looking at the development of Swift, it surely shapes up to be strong in both areas.

Update: Added dynamic method dispatch to the list of examples of features of the dynamic semantics as a response to this discussion:

Categories: Offsite Blogs

Douglas M. Auclair (geophf): July 2016 1HaskellADay Problems and Solutions

Planet Haskell - Sun, 07/31/2016 - 10:53pm
July 2016
Categories: Offsite Blogs

The team: Intero for Emacs: Changes June–July

Planet Haskell - Sun, 07/31/2016 - 6:00pm

Intero was made public in the start of June. Here's a rundown of the changes made since then:

  • Now when the backend fails to start, it stops retrying when you're working until you kill the buffer.
  • When the backend is starting and it fails due to missing dependencies, it automatically re-runs without passing --no-build to stack; leading to build all the dependencies and then starting. This leads to a nice workflow of adding a package to the .cabal file and hitting M-x intero-restart.
  • Auto-completion of imports and pragmas.
  • Company-mode integration is asynchronous now, so it doesn't lock up the editor.
  • Removed hlint from next-checkers as it was bothering people. It's easy to re-enable with standard flycheck settings.
  • Now you can switch targets (e.g. M-x intero-targets) using the multi-switch view, like this. Saves you having to remember your targets and the syntax for specifying them.
  • You can now launch the REPL with C-u prefix so that it pops up an options list on how to start the REPL.
  • Fixed a bug in the warnings parser.
  • Added intero-toggle-debug (#79, #151), good for debugging issues with Intero.
  • Finally made a reliable way to save the current buffer for flycheck. This no longer interacts badly with magit or external changes to your files.
  • Added C-c C-z to switch to and from the REPL.
  • Added a suggestions system. When you hit C-c C-r, you get a list of suggestions that you can check and then apply with C-c C-c:

    • Automatically add extensions when GHC suggests them. Example:

      Can't make a derived instance of ‘Functor X’: You need DeriveFunctor to derive an instance for this class Try GeneralizedNewtypeDeriving for GHC's newtype-deriving extension In the newtype declaration for ‘X’
    • Automatically remove redundant imports. Example:

      The import of ‘Control.Monad’ is redundant except perhaps to import instances from ‘Control.Monad’ To import instances alone, use: import Control.Monad()... (intero)
    • Fix typos. Example:

      Not in scope: ‘putStrn’ Perhaps you meant one of these: ‘putStr’ (imported from Prelude), ‘putStrLn’ (imported from Prelude)
    • Adding top-level type signatures. Example:

      Top-level binding with no type signature: main :: IO ()
    • Removing redundant class constraints. Example:

      Redundant constraints: (Arith var, Bitwise var)
    • And turning off warnings for name shadowing and type defaulting. (Checkbox is not checked by default.)
  • And other miscellaneous bug fixes.
Categories: Offsite Blogs

FP Complete: Announce: public Jenkins CI server

Planet Haskell - Sun, 07/31/2016 - 6:00pm

We have set up a new public Jenkins CI server for use with our open source projects. This server currently runs the Stack integration tests, and deploys to and every time a commit is pushed to the master branch of their respective repositories.

In the future, we also intend to set up Jenkins to run the Stack integration tests on all supported platforms (rather than only Linux) using additional Jenkins workers, and get it to run them for pull requests as well.

While we use Travis CI, there are a couple of ways it does not meet our needs that Jenkins helps us with:

  • For long builds, we hit the 50 minute job timeout. While a standard series of tests should not exceed this amount of time, we also want to run more exhaustive integration tests which sometimes take longer. We can let Travis run the standard tests on PRs, and then periodically run the more extensive tests on Jenkins.

  • Some projects need to build Docker images. While Travis does support this, it means enabling the "standard" (non container-based) environment for jobs, which in turn does not support caching builds for public projects. For Haskell projects in particular, working without a cache means very long build times.

  • Projects also need to push Docker images to a registry and deploy them to a Kubernetes cluster. This requires exposing credentials to builds, which is impossible to secure when building code that uses TemplateHaskell, which allows running arbitrary code during the build.

For these projects, we continue to use Travis for quick feedback on PRs, but let Jenkins take care of the integration tests and deployments where we need more control over resource limitations and isolation of different build phases.

We run all the builds and tests on ephemeral, isolated Jenkins workers using the Docker plugin. These workers do not have access to any credentials, so there is no risk of credentials "leaking" into build logs or otherwise being accessed inappropriately.

For projects that need auto-deployment, the isolated build job stages the assets to be deployed, and then a separate deploy job is triggered if the build is successful. The deploy job runs on the Jenkins master which has access to required credentials, but it does not check out the project's source code from Github or run anything developer-provided. It only copies the built artifacts from the upstream job, builds and pushes a Docker image, and then updates a Kubernetes Deployment with the new image. Our public Jenkins server does not ever see any credentials for proprietary repos or mission-critical infrastructure, so even if security is breached it will have no effect beyond the CI system itself.

For production deployments of open source applications, we have a separate private Jenkins server that builds from the prod branch of the Git repositories, and deploys to a separate cluster. We ensure that the prod branch is protected so that only project administrators can trigger a production deployment.

We avoid using too many Jenkins-specific features. Essentially, we use Jenkins to perform triggering, notification and provide the build environment, but don't use Jenkins plugins to build Docker images or perform deployments. The Jenkins Docker plugin could commit an image after building and testing the code, but then we would have large images containing the whole build environment rather than minimal images containing only the application to deploy. We prefer instead to leave it to our own custom tooling that we can tailor to our needs and which can be run in many different environments so that we are not locked into Jenkins. A developer with access to the right credentials could, if necessary, perform the process easily from their own workstation by running the same build and deploy scripts as the Jenkins jobs run.

The Jenkins servers themselves run on EC2, with all cloud infrastructure managed using Hashicorp's Terraform, and the instances managed using Red Hat's Ansible. The Kubernetes cluster is set up using CoreOS's kube-aws tool.

Categories: Offsite Blogs

Mark Jason Dominus: Decomposing a function into its even and odd parts

Planet Haskell - Fri, 07/29/2016 - 8:41pm

As I have mentioned before, I am not a sudden-flash-of-insight person. Every once in a while it happens, but usually my thinking style is to minutely examine a large mass of examples and then gradually synthesize some conclusion about them. I am a penetrating but slow thinker. But there have been a few occasions in my life when the solution to a problem struck me suddenly out of the blue.

One such occasion was on the first day of my sophomore honors physics class in 1987. This was one of the best classes I took in my college career. It was given by Professor Stephen Nettel, and it was about resonance phenomena. I love when a course has a single overarching theme and proceeds to examine it in detail; that is all too rare. I deeply regret leaving my copy of the course notes in a restaurant in 1995.

The course was very difficult, But also very satisfying. It was also somewhat hair-raising, because of Professor Nettel's habit of saying, all through the second half “Don't worry if it doesn't seem to make any sense, it will all come together for you during the final exam.” This was not reassuring. But he was right! It did all come together during the final exam.

The exam had two sets of problems. The problems on the left side of the exam paper concerned some mechanical system, I think a rod fixed at one end and free at the other, or something like that. This set of problems asked us to calculate the resonant frequency of the rod, its rate of damping at various driving frequencies, and related matters. The right-hand problems were about an electrical system involving a resistor, capacitor, and inductor. The questions were the same, and the answers were formally identical, differing only in the details: on the left, the answers involved length, mass and stiffness of the rod, and on the right, the resistance, capacitance, and inductance of the electrical components. It was a brilliant exam, and I have never learned so much about a subject during the final exam.

Anyway, I digress. After the first class, we were assigned homework. One of the problems was

Show that every function is the sum of an even function and an odd function.

(Maybe I should explain that an even function is one which is symmetric across the -axis; formally it is a function for which for every . For example, the function , shown below left. An odd function is one which is symmetric under a half-turn about the origin; formally it satisfies for all . For example , shown below right.)


I found this claim very surprising, and we had no idea how to solve it. Well, not quite no idea: I knew that functions could be expanded in Fourier series, as the sum of a sine series and a cosine series, and the sine part was odd while the cosine part was even. But this seemed like a bigger hammer than was required, particularly since new sophomores were not expected to know about Fourier series.

I had the privilege to be in that class with Ron Buckmire, and I remember we stood outside the class building in the autumn sunshine and discussed the problem. I might have been thinking that perhaps there was some way to replace the negative part of with a reflected copy of the positive part to make an even function, and maybe that was always even, when I was hit from the blue with the solution:

$$ \begin{align} f_e(x) & = \frac{f(x) + f(-x)}2 \text{ is even},\\ f_o(x) & = \frac{f(x) - f(-x)}2 \text{ is odd, and}\\ f(x) &= f_e(x) + f_o(x) \end{align} $$

So that was that problem solved. I don't remember the other three problems in that day's homework, but I have remembered that one ever since.

But for some reason, it didn't occur to me until today to think about what those functions actually looked like. Of course, if itself is even, then and , and similarly if is odd. But most functions are neither even nor odd.

For example, consider the function , which is neither even nor odd. Then we get

$$ \begin{align} f_e(x) & = \frac{2^x + 2^{-x}}2\\ f_o(x) & = \frac{2^x - 2^{-x}}2 \end{align} $$

The graph is below left. The solid red line is , and the blue and purple dotted lines are and . The red line is the sum of the blue and purple lines. I thought this was very interesting-looking, but a little later I realized that I had already known what these graphs would look like, because is just like , and for the even and odd components are exactly the familiar and functions. (Below left, ; below right, .)

I wasn't expecting polynomials to be more interesting, but they were. (Polynomials whose terms are all odd powers of , such as , are always odd functions, and similarly polynomials whose terms are all even powers of are even functions.) For example, consider , which is neither even nor odd. We don't even need the and formulas to separate this into even and odd parts: just expand as and separate it into odd and even powers, and :

Or we could do similarly, expanding it as and separating this into and :

I love looking at these and seeing how the even blue line and the odd purple line conspire together to make whatever red line I want.

I kept wanting to try familiar simple functions, like , but many of these are either even or odd, and so are uninteresting for this application. But you can make an even or an odd function into a neither-even-nor-odd function just by translating it horizontally, which you do by replacing with . So the next function I tried was , which is the translation of . Here I got a surprise. I knew that was undefined at , so I graphed it only for . But the even component is , which is undefined at both and at . Similarly the odd component is undefined at two points. So the formula does not work quite correctly, failing to produce the correct value at , even though is defined there. In general, if is undefined at some , then the decomposition into even and odd components fails at as well. The limit $$\lim_{x\to -c} f(x) = \lim_{x\to -c} \left(f_o(x) + f_e(x)\right)$$ does hold, however. The graph below shows the decomposition of .

Vertical translations are uninteresting: they leave unchanged and translate by the same amount, as you can verify algebraically or just by thinking about it.

Following the same strategy I tried a cosine wave. The evenness of the cosine function is one of its principal properties, so I translated it and used . The graph below is actually for to prevent the details from being too compressed:

This reminded me of the time I was fourteen and graphed and was surprised to see that it was another perfect sinusoid. But I realized that there was a simple way to understand this. I already knew that . If you take and multiply the whole thing by , you get $$\sqrt2\cos\left(x + \frac\pi4\right) = \sqrt2\sin x\cos\frac\pi4 + \sqrt2\cos x\sin\frac\pi4 = \sin x + \cos x$$ so that is just a shifted, scaled cosine curve. The decomposition of is even simpler because you can work forward instead of backward and find that , and the first term is odd while the second term is even, so that decomposes as a sum of an even and an odd sinusoid as you see in the graph above.

Finally, I tried a Poisson distribution, which is highly asymmetric. The formula for the Poisson distribution is , for some constant . The in the denominator is only defined for non-negative integer , but you can extend it to fractional and negative in the usual way by using instead, where is the Gamma function. The function is undefined at zero and negative integers, but fortunately what we need here is the reciprocal gamma function , which is perfectly well-behaved. The results are spectacular. The graph below has .

The part of this with is the most interesting to me, because the Poisson distribution has a very distinctive shape, and once again I like seeing the blue and purple functions working together to make it. I think it's just great how the red line goes gently to zero as increases, even though the even and the odd components are going wild. ( increases rapidly with , so the reciprocal function goes rapidly to zero. But the even and odd components also have a part, and this is what dominates the blue and purple lines when .)

On the side it has no meaning for me, and it's just wiggly lines. It hadn't occurred to me before that you could extend the Poisson distribution function to negative , and I still can't imagine what it could mean, but I suppose why not. Probably some statistician could explain to me what the Poisson distribution is about when .

You can also consider the function , which breaks down completely, because either or is undefined except when . So the claim that every function is the sum of an even and an odd function fails here too. Except perhaps not! You could probably consider the extension of the square root function to the complex plane, and take one of its branches, and I suppose it works out just fine. The geometric interpretation of evenness and oddness are very different, of course, and you can't really draw the graphs unless you have four-dimensional vision.

I have no particular point to make, except maybe that math is fun, even elementary math (or perhaps especially elementary math) and it's fun to see how it works out.

The beautiful graphs in this article were made with Desmos. I had dreaded having to illustrate my article with graphs from Gnuplot (ugh) or Wolfram|α (double ugh) and was thrilled to find such a handsome alternative.

[ Addendum: I've just discovered that in Desmos you can include a parameter in the functions that it graphs, and attach the parameter to a slider. So for example you can arrange to have it display or , with the value of controlled by the slider, and have the graph move left and right on the plane as you adjust the slider, with its even and odd parts changing in real time to match. ]

[ For example, check out travelling Gaussians or varying sinusoid. ]

Categories: Offsite Blogs

Functional Jobs: Elm Developer at Takt (Full-time)

Planet Haskell - Fri, 07/29/2016 - 2:51pm

Takt is seeking a front-end developer excited about Elm to help develop our flagship product. We just closed a $30 million Series A, and we're already reaching more than 10 million users at Starbucks, making us one of the largest ventures built on Haskell + Elm.

Our platform processes giant event streams of all kinds, identifying patterns, trends and opportunities to intervene and improve processes, aided by machine learning. Our vision will change the way people engage across multiple industries, be it retail, finance, or healthcare.

As a Takt engineer, you'll work in small, self-sufficient teams with the shared goal of delivering excellent software anchored in an agile culture of quality, delivery, and innovation. You understand that legacy code is the work you did yesterday. You also share our passion for functional programming and using data to solve complex problems.


  • Use functional programming languages (Elm!) to build applications and Front-Ends
  • Work on complex design challenges, understanding customer needs and crafting simple, beautiful solutions
  • Expose complex application functionality in straightforward and elegant ways
  • Develop functional and beautiful visualizations of complex data
  • Deliver working software in short sprints
  • Help grow our engineering team


  • Strong, demonstrated experience developing software using functional Javascript (Elm, PureScript, Clojure.)
  • Significant experience with dynamic, interactive data visualization (e.g. D3)
  • Demonstrated experience building sophisticated and complex applications, such as workflow management tools
  • Proven experience in unit testing front-end applications


  • Personal projects or production experience with Elm
  • You welcome the responsibility and thrill that comes with being a member of a founding team
  • You're motivated, dependable, and continuously focused on excellence


Takt distills complex data into precise actions; we orchestrate physical and digital exchanges into one seamless journey. Our business is building lasting, trusted relationships between people and brands—and making it look easy.

We're already reaching millions of people a day, and we're just getting started. Our founding leadership is equal parts business, design, and engineering—because we believe differing perspectives + passionate discourse achieve the greatest outcomes. We are collectively talented, but also humble. We give our whole selves. We love learning new things.

We are an equal-opportunity employer, and strive to make hiring decisions that reflect that. If you're up for the challenge of a lifetime, we're looking for outstanding talent to join our team.

Get information on how to apply for this position.

Categories: Offsite Blogs

Functional Jobs: Haskell Engineer at Takt (Full-time)

Planet Haskell - Fri, 07/29/2016 - 2:51pm

Takt is seeking a Haskell engineer to help develop our flagship product. We just closed a $30 million Series A, and we're already reaching more than 10 million users at Starbucks, making us one of the largest ventures built on Haskell.

Our platform processes giant event streams of all kinds, identifying patterns, trends and opportunities to intervene and improve processes, aided by machine learning. Our vision will change the way people engage across multiple industries, be it retail, finance, or healthcare.

As a Takt engineer, you'll work in small, self-sufficient teams with the shared goal of delivering excellent software anchored in an agile culture of quality, delivery, and innovation. You understand that legacy code is the work you did yesterday. You also share our passion for functional programming and using data to solve complex problems.


  • Write tested, high-performance, maintainable code in Haskell
  • Deliver working software in short sprints
  • Help invent novel solutions for ridiculously hard problems
  • Bring a high level of innovation
  • Help grow our engineering team


  • Significant experience using functional languages (Haskell, Scala, Erlang, Clojure, etc.) to build production systems or complex personal projects, or to make major OSS contributions
  • Real-world experience designing, developing, testing, and deploying systems based on SOA or micro-services
  • Skilled at designing and implementing SQL and NoSQL data persistence stores and caches
  • Experience building REST API services using functional languages and design principles
  • Ability to build infrastructure for real-time analytics and real-time predictive intelligence based on large, diverse, and dynamic data sets


  • You have hardware hacking and pro-typing experience
  • You welcome the responsibility and thrill that comes with being a member of a founding team
  • You're motivated, dependable, and continuously focused on excellence


Takt distills complex data into precise actions; we orchestrate physical and digital exchanges into one seamless journey. Our business is building lasting, trusted relationships between people and brands—and making it look easy.

We're already reaching millions of people a day, and we're just getting started. Our founding leadership is equal parts business, design, and engineering—because we believe differing perspectives + passionate discourse achieve the greatest outcomes. We are collectively talented, but also humble. We give our whole selves. We love learning new things.

We are an equal-opportunity employer, and strive to make hiring decisions that reflect that. If you're up for the challenge of a lifetime, we're looking for outstanding talent to join our team.

Get information on how to apply for this position.

Categories: Offsite Blogs

Mark Jason Dominus: Controlling the KDE screen locking works now

Planet Haskell - Thu, 07/28/2016 - 1:10pm

Yesterday I wrote about how I was trying to control the KDE screenlocker's timeout from a shell script and all the fun stuff I learned along the way. Then after I published the article I discovered that my solution didn't work. But today I fixed it and it does work.

What didn't work

I had written this script:

timeout=${1:-3600} perl -i -lpe 's/^Enabled=.*/Enabled=False/' $HOME/.kde/share/config/kscreensaverrc qdbus org.freedesktop.ScreenSaver /MainApplication reparseConfiguration sleep $timeout perl -i -lpe 's/^Enabled=.*/Enabled=True/' $HOME/.kde/share/config/kscreensaverrc qdbus org.freedesktop.ScreenSaver /MainApplication reparseConfiguration

The strategy was: use perl to rewrite the screen locker's configuration file, and then use qdbus to send a D-Bus message to the screen locker to order it to load the updated configuration.

This didn't work. The System Settings app would see the changed configuration, and report what I expected, but the screen saver itself was still behaving according to the old configuration. Maybe the qdbus command was wrong or maybe the whole theory was bad.

More strace

For want of anything else to do (when all you have is a hammer…), I went back to using strace to see what else I could dig up, and tried

strace -ff -o /tmp/ss/s /usr/bin/systemsettings

which tells strace to write separate files for each process or thread. I had a fantasy that by splitting the trace for each process into a separate file, I might solve the mysterious problem of the missing string data. This didn't come true, unfortunately.

I then ran tail -f on each of the output files, and used systemsettings to update the screen locker configuration, looking to see which the of the trace files changed. I didn't get too much out of this. A great deal of the trace was concerned with X protocol traffic between the application and the display server. But I did notice this portion, which I found extremely suggestive, even with the filenames missing:

3106 open(0x2bb57a8, O_RDWR|O_CREAT|O_CLOEXEC, 0666) = 18 3106 fcntl(18, F_SETFD, FD_CLOEXEC) = 0 3106 chmod(0x2bb57a8, 0600) = 0 3106 fstat(18, {...}) = 0 3106 write(18, 0x2bb5838, 178) = 178 3106 fstat(18, {...}) = 0 3106 close(18) = 0 3106 rename(0x2bb5578, 0x2bb4e48) = 0 3106 unlink(0x2b82848) = 0

You may recall that my theory was that when I click the “Apply” button in System Settings, it writes out a new version of $HOME/.kde/share/config/kscreensaverrc and then orders the screen locker to reload the configuration. Even with no filenames, this part of the trace looked to me like the replacement of the configuration file: a new file is created, then written, then closed, and then the rename replaces the old file with the new one. If I had been thinking about it a little harder, I might have thought to check if the return value of the write call, 178 bytes, matched the length of the file. (It does.) The unlink at the end is deleting the semaphore file that System Settings created to prevent a second process from trying to update the same file at the same time.

Supposing that this was the trace of the configuration update, the next section should be the secret sauce that tells the screen locker to look at the new configuration file. It looked like this:

3106 sendmsg(5, 0x7ffcf37e53b0, MSG_NOSIGNAL) = 168 3106 poll([?] 0x7ffcf37e5490, 1, 25000) = 1 3106 recvmsg(5, 0x7ffcf37e5390, MSG_CMSG_CLOEXEC) = 90 3106 recvmsg(5, 0x7ffcf37e5390, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable) 3106 sendmsg(5, 0x7ffcf37e5770, MSG_NOSIGNAL) = 278 3106 sendmsg(5, 0x7ffcf37e5740, MSG_NOSIGNAL) = 128

There is very little to go on here, but none of it is inconsistent with the theory that this is the secret sauce, or even with the more advanced theory that it is the secret suace and that the secret sauce is a D-Bus request. But without seeing the contents of the messages, I seemed to be at a dead end.


Browsing random pages about the KDE screen locker, I learned that the lock screen configuration component could be run separately from the rest of System Settings. You use

kcmshell4 --list

to get a list of available components, and then

kcmshell4 screensaver

to run the screensaver component. I started running strace on this command instead of on the entire System Settings app, with the idea that if nothing else, the trace would be smaller and perhaps simpler, and for some reason the missing strings appeared. That suggestive block of code above turned out to be updating the configuration file, just as I had suspected:

open("/home/mjd/.kde/share/config/", O_RDWR|O_CREAT|O_CLOEXEC, 0666) = 19 fcntl(19, F_SETFD, FD_CLOEXEC) = 0 chmod("/home/mjd/.kde/share/config/", 0600) = 0 fstat(19, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 write(19, "[ScreenSaver]\nActionBottomLeft=0\nActionBottomRight=0\nActionTopLeft=0\nActionTopRight=2\nEnabled=true\nLegacySaverEnabled=false\nPlasmaEnabled=false\nSaver=krandom.desktop\nTimeout=60\n", 177) = 177 fstat(19, {st_mode=S_IFREG|0600, st_size=177, ...}) = 0 close(19) = 0 rename("/home/mjd/.kde/share/config/", "/home/mjd/.kde/share/config/kscreensaverrc") = 0 unlink("/home/mjd/.kde/share/config/kscreensaverrc.lock") = 0

And the following secret sauce was revealed as:

sendmsg(7, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\0\1\30\0\0\0\v\0\0\0\177\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\6\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\3\1s\0\f\0\0\0GetNameOwner\0\0\0\0\10\1g\0\1s\0\0", 144}, {"\23\0\0\0org.kde.screensaver\0", 24}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 168 sendmsg(7, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\1\1\206\0\0\0\f\0\0\0\177\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\6\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\3\1s\0\10\0\0\0AddMatch\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", 144}, {"\201\0\0\0type='signal',sender='org.freedesktop.DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.kde.screensaver'\0", 134}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 278 sendmsg(7, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\0\1\0\0\0\0\r\0\0\0j\0\0\0\1\1o\0\f\0\0\0/ScreenSaver\0\0\0\0\6\1s\0\23\0\0\0org.kde.screensaver\0\0\0\0\0\2\1s\0\23\0\0\0org.kde.screensaver\0\0\0\0\0\3\1s\0\t\0\0\0configure\0\0\0\0\0\0\0", 128}, {"", 0}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 128 sendmsg(7, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\1\1\206\0\0\0\16\0\0\0\177\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\6\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\3\1s\0\v\0\0\0RemoveMatch\0\0\0\0\0\10\1g\0\1s\0\0", 144}, {"\201\0\0\0type='signal',sender='org.freedesktop.DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.kde.screensaver'\0", 134}]

(I had to tell give strace the -s 256 flag to tell it not to truncate the string data to 32 characters.)

Binary gibberish

A lot of this is illegible, but it is clear, from the frequent mentions of DBus, and from the names of D-Bus objects and methods, that this is is D-Bus requests, as theorized. Much of it is binary gibberish that we can only read if we understand the D-Bus line protocol, but the object and method names are visible. For example, consider this long string:


With qdbus I could confirm that there was a service named org.freedesktop.DBus with an object named / that supported a NameOwnerChanged method which expected three QString arguments. Presumably the first of these was org.kde.screensaver and the others are hiding in other the 134 characters that strace didn't expand. So I may not understand the whole thing, but I could see that I was on the right track.

That third line was the key:

sendmsg(7, {msg_name(0)=NULL, msg_iov(2)=[{"… /ScreenSaver … org.kde.screensaver … org.kde.screensaver … configure …", 128}, {"", 0}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 128

Huh, it seems to be asking the screensaver to configure itself. Just like I thought it should. But there was no configure method, so what does that configure refer to, and how can I do the same thing?

But org.kde.screensaver was not quite the same path I had been using to talk to the screen locker—I had been using org.freedesktop.ScreenSaver, so I had qdbus list the methods at this new path, and there was a configure method.

When I tested

qdbus org.kde.screensaver /ScreenSaver configure

I found that this made the screen locker take note of the updated configuration. So, problem solved!

(As far as I can tell, org.kde.screensaver and org.freedesktop.ScreenSaver are completely identical. They each have a configure method, but I had overlooked it—several times in a row—earlier when I had gone over the method catalog for org.freedesktop.ScreenSaver.)

The working script is almost identical to what I had yesterday:

timeout=${1:-3600} perl -i -lpe 's/^Enabled=.*/Enabled=False/' $HOME/.kde/share/config/kscreensaverrc qdbus org.freedesktop.ScreenSaver /ScreenSaver configure sleep $timeout perl -i -lpe 's/^Enabled=.*/Enabled=True/' $HOME/.kde/share/config/kscreensaverrc qdbus org.freedesktop.ScreenSaver /ScreenSaver configure

That's not a bad way to fail, as failures go: I had a correct idea about what was going on, my plan about how to solve my problem would have worked, but I was tripped up by a trivium; I was calling MainApplication.reparseConfiguration when I should have been calling ScreenSaver.configure.

What if I hadn't been able to get strace to disgorge the internals of the D-Bus messages? I think I would have gotten the answer anyway. One way to have gotten there would have been to notice the configure method documented in the method catalog printed out by qdbus. I certainly looked at these catalogs enough times, and they are not very large. I don't know why I never noticed it on my own. But I might also have had the idea of spying on the network traffic through the D-Bus socket, which is under /tmp somewhere.

I was also starting to tinker with dbus-send, which is like qdbus but more powerful, and can post signals, which I think qdbus can't do, and with gdbus, another D-Bus introspector. I would have kept getting more familiar with these tools and this would have led somewhere useful.

Or had I taken just a little longer to solve this, I would have followed up on Sumana Harihareswara’s suggestion to look at Bustle, which is a utility that logs and traces D-Bus requests. It would certainly have solved my problem, because it makes perfectly clear that clicking that apply button invoked the configure method:

I still wish I knew why strace hadn't been able to print out those strings through.

Categories: Offsite Blogs

Mark Jason Dominus: Controlling KDE screen locking from a shell script

Planet Haskell - Wed, 07/27/2016 - 4:59pm

Lately I've started watching stuff on Netflix. Every time I do this, the screen locker kicks in sixty seconds in, and I have to unlock it, pause the video, and adjust the system settings to turn off the automatic screen locker. I can live with this.

But when the show is over, I often forget to re-enable the automatic screen locker, and that I can't live with. So I wanted to write a shell script:

#!/bin/sh auto-screen-locker disable sleep 3600 auto-screen-locker enable

Then I'll run the script in the background before I start watching, or at least after the first time I unlock the screen, and if I forget to re-enable the automatic locker, the script will do it for me.

The question is: how to write auto-screen-locker?


My first idea was: maybe there is actually an auto-screen-locker command, or a system-settings command, or something like that, which was being run by the System Settings app when I adjusted the screen locker from System Settings, and all I needed to do was to find out what that command was and to run it myself.

So I tried running System Settings under strace -f and then looking at the trace to see if it was execing anything suggestive.

It wasn't, and the trace was 93,000 lines long and frighting. Halfway through, it stopped recording filenames and started recording their string addresses instead, which meant I could see a lot of calls to execve but not what was being execed. I got sidetracked trying to understand why this had happened, and I never did figure it out—something to do with a call to clone, which is like fork, but different in a way I might understand once I read the man page.

The first thing the cloned process did was to call set_robust_list, which I had never heard of, and when I looked for its man page I found to my surprise that there was one. It begins:

NAME get_robust_list, set_robust_list - get/set list of robust futexes

And then I felt like an ass because, of course, everyone knows all about the robust futex list, duh, how silly of me to have forgotten ha ha just kidding WTF is a futex? Are the robust kind better than regular wimpy futexes?

It turns out that Ingo Molnár wrote a lovely explanation of robust futexes which are actually very interesting. In all seriousness, do check it out.

I seem to have digressed. This whole section can be summarized in one sentence:

strace was no help and took me a long way down a wacky rabbit hole.

Sorry, Julia!

Stack Exchange

The next thing I tried was Google search for kde screen locker. The second or third link I followed was to this StackExchange question, “What is the screen locking mechanism under KDE? It wasn't exactly what I was looking for but it was suggestive and pointed me in the right direction. The crucial point in the answer was a mention of

qdbus org.freedesktop.ScreenSaver /ScreenSaver Lock

When I saw this, it was like a new section of my brain coming on line. So many things that had been obscure suddenly became clear. Things I had wondered for years. Things like “What are these horrible

Object::connect: No such signal org::freedesktop::UPower::DeviceAdded(QDBusObjectPath)

messages that KDE apps are always spewing into my terminal?” But now the light was on.

KDE is built atop a toolkit called Qt, and Qt provides an interprocess communication mechanism called “D-Bus”. The qdbus command, which I had not seen before, is apparently for sending queries and commands on the D-Bus. The arguments identify the recipient and the message you are sending. If you know the secret name of the correct demon, and you send it the correct secret command, it will do your bidding. ( The mystery message above probably has something to do with the app using an invalid secret name as a D-Bus address.)

Often these sorts of address hierarchies work well in theory and then fail utterly because there is no way to learn the secret names. The X Window System has always had a feature called “resources” by which almost every aspect of every application can be individually customized. If you are running xweasel and want just the frame of just the error panel of just the output window to be teal blue, you can do that… if you can find out the secret names of the xweasel program, its output window, its error panel, and its frame. Then you combine these into a secret X resource name, incant a certain command to load the new resource setting into the X server, and the next time you run xweasel the one frame, and only the one frame, will be blue.

In theory these secret names are documented somewhere, maybe. In practice, they are not documented anywhere. you can only extract them from the source, and not only from the source of xweasel itself but from the source of the entire widget toolkit that xweasel is linked with. Good luck, sucker.

D-Bus has a directory

However! The authors of Qt did not forget to include a directory mechanism in D-Bus. If you run


you get a list of all the addressable services, which you can grep for suggestive items, including org.freedesktop.ScreenSaver. Then if you run

qdbus org.freedesktop.ScreenSaver

you get a list of all the objects provided by the org.freedesktop.ScreenSaver service; there are only seven. So you pick a likely-seeming one, say /ScreenSaver, and run

qdbus org.freedesktop.ScreenSaver /ScreenSaver

and get a list of all the methods that can be called on this object, and their argument types and return value types. And you see for example

method void org.freedesktop.ScreenSaver.Lock()

and say “I wonder if that will lock the screen when I invoke it?” And then you try it:

qdbus org.freedesktop.ScreenSaver /ScreenSaver Lock

and it does.

That was the most important thing I learned today, that I can go wandering around in the qdbus hierarchy looking for treasure. I don't yet know exactly what I'll find, but I bet there's a lot of good stuff.

When I was first learning Unix I used to wander around in the filesystem looking at all the files, and I learned a lot that way also.

  • “Hey, look at all the stuff in /etc! Huh, I wonder what's in /etc/passwd?”

  • “Hey, /etc/protocols has a catalog of protocol numbers. I wonder what that's for?”

  • “Hey, there are a bunch of files in /usr/spool/mail named after users and the one with my name has my mail in it!”

  • “Hey, the manuals are all under /usr/man. I could grep them!”

Later I learned (by browsing in /usr/man/man7) that there was a hier(7) man page that listed points of interest, including some I had overlooked.

The right secret names

Everything after this point was pure fun of the “what happens if I turn this knob” variety. I tinkered around with the /ScreenSaver methods a bit (there are twenty) but none of them seemed to be quite what I wanted. There is a

method uint Inhibit(QString application_name, QString reason_for_inhibit)

method which someone should be calling, because that's evidently what you call if you are a program playing a video and you want to inhibit the screen locker. But the unknown someone was delinquent and it wasn't what I needed for this problem.

Then I moved on to the /MainApplication object and found

method void org.kde.KApplication.reparseConfiguration()

which wasn't quite what I was looking for either, but it might do: I could perhaps modify the configuration and then invoke this method. I dimly remembered that KDE keeps configuration files under $HOME/.kde, so I ls -la-ed that and quickly found share/config/kscreensaverrc, which looked plausible from the outside, and more plausible when I saw what was in it:

Enabled=True Timeout=60

among other things. I hand-edited the file to change the 60 to 243, ran

qdbus org.freedesktop.ScreenSaver /MainApplication reparseConfiguration

and then opened up the System Settings app. Sure enough, the System Settings app now reported that the lock timeout setting was “4 minutes”. And changing Enabled=True to Enabled=False and back made the System Settings app report that the locker was enabled or disabled.

The answer

So the script I wanted turned out to be:

timeout=${1:-3600} perl -i -lpe 's/^Enabled=.*/Enabled=False/' $HOME/.kde/share/config/kscreensaverrc qdbus org.freedesktop.ScreenSaver /MainApplication reparseConfiguration sleep $timeout perl -i -lpe 's/^Enabled=.*/Enabled=True/' $HOME/.kde/share/config/kscreensaverrc qdbus org.freedesktop.ScreenSaver /MainApplication reparseConfiguration

Problem solved, but as so often happens, the journey was more important than the destination.

I am greatly looking forward to exploring the D-Bus hierarchy and sending all sorts of inappropriate messages to the wrong objects.

Just before he gets his ass kicked by Saruman, that insufferable know-it-all Gandalf says “He who breaks a thing to find out what it is has left the path of wisdom.” If I had been Saruman, I would have kicked his ass at that point too.


Right after I posted this, I started watching Netflix. The screen locker cut in after sixty seconds. “Aha!” I said. “I'll run my new script!”

I did, and went back to watching. Sixty seconds later, the screen locker cut in again. My script doesn't work! The System Settings app says the locker has been disabled, but it's mistaken. Probably it's only reporting the contents of the configuration file that I edited, and the secret sauce is still missing. The System Settings app does something to update the state of the locker when I click that “Apply” button, and I thought that my qdbus command was doing the same thing, but it seems that it isn't.

I'll figure this out, but maybe not today. Good night all!

[ Addendum 20160728: I figured it out the next day ]

[ Addendum 20160729: It has come to my attention that there is actually a program called xweasel. ]

Categories: Offsite Blogs

Managed Languages & Runtimes Week '16 - Call forParticipation

General haskell list - Wed, 07/27/2016 - 9:57am
Managed Languages & Runtimes Week '16 PPPJ '16 / JTRES '16 / VMM '16 August 29 - September 2, 2016 Lugano, Switzerland ------------------------------------------------------------------------------- Managed Languages & Runtimes Week '16 is a premier forum for presenting and discussing innovations and breakthroughs in the area of programming languages and runtime systems, which form the basis of many modern computing systems, from small scale (embedded and real-time systems) to large-scale (cloud-computing and big-data platforms). Managed Languages & Runtimes Week '16 features three international academic and industry venues for the first time: - PPPJ '16 - 13th International Conference on Principles and Practices of Programming on the Java Platform: virtual machines, languages, and tools - A forum for researchers, practitioners, and educators to present and discuss novel results on all aspects of managed languages and their runtime systems, including virtual ma
Categories: Incoming News

Fully Abstract Compilation via Universal Embedding

Lambda the Ultimate - Wed, 07/27/2016 - 9:57am

Fully Abstract Compilation via Universal Embedding by Max S. New, William J. Bowman, and Amal Ahmed:

A fully abstract compiler guarantees that two source components are observationally equivalent in the source language if and only if their translations are observationally equivalent in the target. Full abstraction implies the translation is secure: target-language attackers can make no more observations of a compiled component than a source-language attacker interacting with the original source component. Proving full abstraction for realistic compilers is challenging because realistic target languages contain features (such as control effects) unavailable in the source, while proofs of full abstraction require showing that every target context to which a compiled component may be linked can be back-translated to a behaviorally equivalent source context.

We prove the first full abstraction result for a translation whose target language contains exceptions, but the source does not. Our translation—specifically, closure conversion of simply typed λ-calculus with recursive types—uses types at the target level to ensure that a compiled component is never linked with attackers that have more distinguishing power than source-level attackers. We present a new back-translation technique based on a deep embedding of the target language into the source language at a dynamic type. Then boundaries are inserted that mediate terms between the untyped embedding and the strongly-typed source. This technique allows back-translating non-terminating programs, target features that are untypeable in the source, and well-bracketed effects.

Potentially a promising step forward to secure multilanguage runtimes. We've previously discussed security vulnerabilities caused by full abstraction failures here and here. The paper also provides a comprehensive review of associated literature, like various means of protection, back translations, embeddings, etc.

Categories: Offsite Discussion