News aggregator

Edward Z. Yang: cabal new-build is a package manager

Planet Haskell - Mon, 08/29/2016 - 3:32pm

An old article I occasionally see cited today is Repeat after me: "Cabal is not a Package Manager". Many of the complaints don't apply to cabal-install 1.24's new Nix-style local builds. Let's set the record straight.

Fact: cabal new-build doesn't handle non-Haskell dependencies

OK, so this is one thing that hasn't changed since Ivan's article. Unlike Stack, cabal new-build will not handle downloading and installing GHC for you, and like Stack, it won't download and install system libraries or compiler toolchains: you have to do that yourself. This is definitely a case where you should lean on your system package manager to bootstrap a working installation of Cabal and GHC.
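For instance, on a Debian- or Ubuntu-style system, that bootstrap might look something like the following (the package names are the distribution's own and vary between platforms; hvr's PPA is a popular alternative source for newer versions):

sudo apt-get update
sudo apt-get install ghc cabal-install

From there, cabal new-build can take over the Haskell side of the dependency graph.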

Fact: The Cabal file format can record non-Haskell pkg-config dependencies

Since 2007, the Cabal file format has a pkgconfig-depends field which can be used to specify dependencies on libraries understood by the pkg-config tool. It won't install the non-Haskell dependency for you, but it can let you know early on if a library is not available.
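As a sketch, a binding to a C library might declare it like this in its .cabal file (the library name and version bound here are purely illustrative):

library
  build-depends:     base >= 4.8 && < 5
  pkgconfig-depends: cairo >= 1.10

If pkg-config cannot find a suitable cairo, the problem is reported up front rather than surfacing later as a mysterious C compilation error.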

In fact, cabal-install's dependency solver knows about the pkgconfig-depends field, and will pick versions and set flags so that we don't end up with a package with an unsatisfiable pkg-config dependency.

Fact: cabal new-build 2.0 handles build-tools dependencies

As of writing, this feature is unreleased (if you are impatient, get a copy of HEAD from the GitHub repository or install cabal-install-head from hvr's PPA). However, in cabal-install 2.0, build-tools dependencies will be transparently built and added to your PATH. Thus, if you want to install a package which has build-tools: happy, cabal new-build will automatically install happy and add it to the PATH when building this package. These executables are tracked by new-build and we will avoid rebuilding the executable if it is already present.
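In a .cabal file this is just the existing build-tools field; a minimal sketch (the version bounds are arbitrary):

library
  build-depends: base
  build-tools:   happy >= 1.19, alex >= 3.1

With cabal-install 2.0's new-build, listing the tool is enough: the executable is built once, cached, and put on the PATH for this package's build.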

Since build-tools identify executable names, not packages, there is a set of hardcoded build-tools which are treated in this way, coinciding with the set of build-tools that simple Setup scripts know how to use natively. They are hscolour, haddock, happy, alex, hsc2hs, c2hs, cpphs and greencard.

Fact: cabal new-build can upgrade packages without breaking your database

Suppose you are working on a project with a few dependencies. You decide to upgrade one of your dependencies by relaxing a version constraint in your project configuration. After making this change, all it takes is a cabal new-build to rebuild the relevant dependency and start using it. That's it! Even better, if you had an old project using the old dependency, well, it still works, just as you would hope.
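Concretely, the change might be a one-line edit to your project configuration; the package name and bounds below are hypothetical:

-- cabal.project
packages: .
-- relaxed from "text < 1.2.2" to allow the newer release:
constraints: text >= 1.2.1 && < 1.3

After saving the file, running cabal new-build rebuilds text (and anything that depends on it) against the new version and leaves every other installed package untouched.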

What is actually going on is that cabal new-build doesn't do anything like a traditional upgrade. Packages installed to cabal new-build's global store are uniquely identified by a Nix-style identifier which captures all of the information that may have affected the build, including the specific versions that were built against. Thus, a package "upgrade" actually is just the installation of a package under a different unique identifier which can coexist with the old one. You will never end up with a broken package database because you typed new-build.

There is not presently a mechanism for removing packages besides deleting your store (.cabal/store), but it is worth noting that deleting your store is a completely safe operation: cabal new-build won't decide that it wants to build your package differently if the store doesn't exist; the store is purely a cache and does not influence the dependency solving process.
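In other words, something like the following is safe, if occasionally slow (the path shown is the default store location; adjust it if you have configured a different one):

rm -rf ~/.cabal/store   # throw away the cache
cabal new-build         # recomputes the same build plan and repopulates the store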

Fact: Hackage trustees, in addition to package authors, can edit Cabal files for published packages to fix bugs

If a package is uploaded with bad version bounds and a subsequent new release breaks them, a Hackage Trustee can intervene, making a modification to the Cabal file to update the version bounds in light of the new information. This is a more limited form of intervention than the patches of Linux distributions, but it is similar in nature.

Fact: If you can, use your system package manager

cabal new-build is great, but it's not for everyone. If you just need a working pandoc binary on your system and you don't care about having the latest and greatest, you should download and install it via your operating system's package manager. Distro packages are great for binaries; they're less good for libraries, which are often too old for developers (though it is often the easiest way to get a working install of OpenGL). cabal new-build is oriented at developers of Haskell packages, who need to build and depend on packages which are not distributed by the operating system.

I hope this post clears up some misconceptions!


Functional Jobs: Senior Software Engineer (Haskell) at Front Row Education (Full-time)

Planet Haskell - Mon, 08/29/2016 - 11:59am
Position

Senior Software Engineer to join fast-growing education startup transforming the way 3+ million K-12 students learn Math and English.

What you tell your friends you do

“You know how teachers in public schools are always overworked and overstressed with 30 kids per classroom and never ending state tests? I make their lives possible and help their students make it pretty far in life”

What you really will be doing

Architect, design and develop new web applications, tools and distributed systems for the Front Row ecosystem in Haskell, Flow, PostgreSQL, Ansible and many others. You will get to work on your deliverable end-to-end, from the UX to the deployment logic

Mentor and support more junior developers in the organization

Create, improve and refine workflows and processes for delivering quality software on time and without incurring debt

Work closely with Front Row educators, product managers, customer support representatives and account executives to help the business move fast and efficiently through relentless automation.

How you will do this

You’re part of an agile, multidisciplinary team. You bring your own unique skill set to the table and collaborate with others to accomplish your team’s goals.

You prioritize your work with the team and its product owner, weighing both the business and technical value of each task.

You experiment, test, try, fail and learn all the time

You don’t do things just because they were always done that way, you bring your experience and expertise with you and help the team make the best decisions

What have we worked on in the last quarter

We have rewritten our business logic to be decoupled from the Common Core math standards, supporting US state-specific standards and international math systems

Prototyped and tested a High School Math MVP product in classrooms

Changed assigning Math and English to a work queue metaphor across all products for conceptual product simplicity and consistency

Implemented a Selenium QA test suite 100% in Haskell

Released multiple open source libraries for generating automated unit test fixtures, integrating with AWS, parsing and visualizing Postgres logs and much more

Made numerous performance optimization passes on the system for supporting classrooms with weak Internet bandwidth

Team

We’re an agile and lean small team of engineers, teachers and product people working on solving important problems in education. We hyper-focus on speed, communication and prioritizing what matters to our millions of users.

Requirements
  • You’re smart and can find a way to show us.
  • A track record of 5+ years of working in, or leading, teams that rapidly ship high quality web-based software that provides great value to users. Having done this at a startup is a plus.
  • Awesome at a Functional Programming language: Haskell / Scala / Clojure / Erlang etc
  • Exceptional emotional intelligence and people skills
  • Organized and meticulous, but still able to focus on the big picture of the product
  • A ton of startup hustle: we're a fast-growing, VC-backed, Silicon Valley tech company that works hard to achieve the greatest impact we can.

Benefits
  • Money, sweet
  • Medical, dental, vision
  • Incredible opportunity to grow, learn and build lifetime bonds with other passionate people who share your values
  • Food, catered lunch & dinner 4 days a week + snacks on snacks
  • Room for you to do things your way at our downtown San Francisco location right by the Powell Station BART, or you can work remotely from anywhere in the US, if that’s how you roll
  • Awesome monthly team events + smaller get-togethers (board game nights, trivia, etc)

Get information on how to apply for this position.


Philip Wadler: Option A: Think about the children

Planet Haskell - Mon, 08/29/2016 - 10:47am
Fellow tweeter @DarlingSteveEDI captured my image (above) as we gathered for Ride the Route this morning, in support of Option A for Edinburgh's proposed West-East Cycle Route (the route formerly known as Roseburn to Leith Walk). My own snap of the gathering is below.

Fellow blogger Eilidh Troup considers another aspect of the route, safety for schoolchildren. Option A is far safer than Option B for young children cycling to school: the only road crossing in Option A is guarded by a lollipop lady, while children taking Option B must cross *three* busy intersections unaided.

It's down to the wire: members of the Transport and Environment Committee vote tomorrow. The final decision may be closely balanced, so even sending your councillor (and councillors on the committee) a line or two can have a huge impact. If you haven't written, write now, right now!

Previously:
  Roseburn to Leith Walk A vs B: time to act!
  Ride the Route in support of Option A

Late breaking addendum:
  Sustrans supports Option A: It’s time for some big decisions…
 



Philip Wadler: Roseburn to Leith Walk A vs B: time to act!

Planet Haskell - Mon, 08/29/2016 - 10:36am
On 2 August, I attended a meeting in Roseburn organised by those opposed to the new cycleway planned by the city. Local shopkeepers fear they will see a reduction in business, unaware this is a common cycling fallacy: study after study has shown that adding cycleways increases business, not the reverse, because pedestrians and cyclists find the area more attractive.

Feelings in Roseburn run strong. The locals don't trust the council: who can blame them after the fiasco over trams? But the leaders of the campaign are adept at cherry picking statistics, and, sadly, neither side was listening to the other.

On 30 August, the Edinburgh Council Transport and Environment Committee will decide between two options for the cycle route, A and B. Route A is direct. Route B goes round the houses, adding substantial time and rendering the whole route less attractive. If B is built, the opportunity to shift the area away from cars, to make it a more pleasant place to be and draw more business from those travelling by foot, bus, and cycle, goes out the window.

Locals like neither A nor B, but in a spirit of compromise the Transport and Environment Committee may opt for B. This will be a disaster, as route B will be far less likely to draw people out of their cars and onto their cycles, undermining Edinburgh's ambitious programme to attract more people to cycling before it even gets off the ground.

Investing in cycling infrastructure can make an enormous difference. Scotland suffers 2000 deaths per year due to pollution, and 2500 deaths per year due to inactivity. The original proposal for the cycleway estimates benefits of £14.5M over ten years (largely from improved health of those attracted to cycling) vs a cost of £5.7M, a staggering 3.3x return on investment. Katie Cycles to School is a brilliant video from Pedal on Parliament that drives home how investment in cycling will improve lives for cyclists and non-cyclists alike.

Want more detail? Much has been written on the issues.
  Roseburn Cycle Route: Evidence-based local community support.
  Conviction Needed.

The Transport Committee will need determination to carry the plan through to a successful conclusion. This is make or break: will Edinburgh be a city for cars or a city for people? Please write to your councillors and the transport and environment committee to let them know your views.


Previously
Roseburn to Leith Walk Cycleway: A vs B
Roseburn to Leith Walk Cycleway: the website

Subsequently:
Ride the Route in support of Option A
Option A: Think about the children
 

Christopher Done: hindent 5: One style to rule them all

Planet Haskell - Sun, 08/28/2016 - 6:00pm
Reminder of the past

In 2014, in my last post about hindent, I wrote these points:

  1. Automatic formatting is important:
     1. Other people also care about this
     2. The Haskell community is not immune to code formatting debates

I proposed my hindent tool, which:

  1. Would format your code.
  2. Supported multiple styles.
  3. Supported further extension/addition of more styles trivially.

Things learned

I made some statements in that post that I’m going to re-evaluate in this post:

  1. Let’s have a code style discussion. I propose to solve it with tooling.
  2. It’s not practical to force everyone into one single style.

Code formatting is solved with tooling

I’ve used hindent for two years; it solves the problem. There are a couple of exceptions1. On the whole, though, it’s a completely different working experience:

  • Code always looks the same.
  • I don’t make any style decisions. I just think about the tree I need for my program.
  • I don’t do any manual line-breaking.
  • I’ve come to exploit it by writing lazy code like do x<-getLine;when(x>5)(print 5) and then hitting a keybinding to reformat it.

Switching style is realistic

I’ve been writing Haskell in my own style for years. For me, my style is better for structured editing, more consistent, and visually easier to read, than most code I’ve seen. It’s like Lisp. Using hindent, with my ChrisDone style, I had it automatically formatted for me. I used 2-space indents.

The most popular style in the community2 is JohanTibell: The alignment, line-breaking, and spacing (4 spaces instead of 2) differ significantly from my own style.

At FP Complete I’ve done a lot of projects, private FP Complete projects, client projects, and public FP Complete projects (like Stack). For the first year or so I generally stuck to my guns when working on code only I was going to touch and used my superior style.

But once the JohanTibell style in hindent was quite stable, I found that I didn’t mind using it while collaborating with people who prefer that style. The tooling made it so automatic, that I didn’t have to understand the style or make any style decisions, I just wrote code and got on with it. It doesn’t work great with structured-haskell-mode, but that’s ok. Eventually I got used to it, and eventually switched to using it for my own personal projects.

I completely did a U-turn. So I’m hoping that much of the community can do so too and put aside their stylistic preferences and embrace a standard.

Going forward

hindent-5.* now supports one style, based on the Johan Tibell style guide. My own style guide is now deprecated in favor of that. The style flag --style foo is now silently ignored.

There is a demonstration web site in which you can try examples, and also get a link for the example to show other people the output (for debugging).

HIndent now has a “literate” test suite here: TESTS.md. You can read through it as a document, a bit like Johan’s style guide. But running the test suite parses this file and checks that each code fence is printed as written.

There’s also a BENCHMARKS.md. Since I rewrote comment handling, switched to a bytestring-builder, and improved the quadratic line-breaking algorithm to short-circuit, among other improvements, hindent now formats things in 1.5ms instead of 1s.

For those who still want to stick with their old hindent, Andrew Gibiansky is keeping a fork of hindent 4 for his personal use, and has said he’ll accept PR’s for that.

HIndent is not perfect; there’s always room for improvement (the issue tracker welcomes issues), but over time that problem space gets smaller and smaller. There is support for Emacs, Vim and Atom. I would appreciate support for SublimeText too.

Give it a try!

  1. Such as CPP #if directives–they are tricky to handle. Comments are also tricky, but I’ve re-implemented comment handling from scratch and it works pretty well now. See the pretty extensive tests.

  2. From a survey of the top downloaded 1000 packages on Hackage, 660 are 4-spaced and 343 are 2-spaced. All else being equal, 4 spaces wins.


Michael Snoyman: Follow up: haskell.org and the Evil Cabal

Planet Haskell - Sun, 08/28/2016 - 6:00pm

Yesterday I put out a blog post describing a very problematic situation with the haskell.org committee. As often happens with this kind of thing, a very lively discussion occurred on Reddit. There are many repeating themes over there, so instead of trying to address the points in that discussion, I'm going to give some responses in this post.

  • Firstly: thank you to those of you who subscribed to the haskell-community list and made your voices heard. That was the best response to the blog post I could have hoped for, and it happened. At this point, the Twitter poll and mailing list discussion both point to a desire to have Stack as the primary option on the downloads page (the latter is a tied vote of 6 to 6, indicating the change proposed should not happen). As far as I'm concerned, the committee has two options:

    • Listen to the voices of the community and make Stack the primary option on haskell.org.

    • Ignore the community voices and put the Haskell Platform at the top of the page, thus confirming my claims of an oligarchy.

  • Clarification: I do not believe anyone involved in this is an evil person. I thought my wording was unambiguous, but apparently not. The collusion among the projects is what gets the term "Evil Cabal." That said, I do believe that there were bad actions taken by individuals involved, and I've called some of those out. There's a much longer backstory here of the nepotism I refer to, starting at least at ICFP 2014 and GPS Haskell, but that's a story I'm not getting into right now.

  • A few people who should know better claimed that there's no reason for my complaint given that the Haskell Platform now ships with Stack. This is incorrect for multiple reasons. Firstly, one of my complaints in the blog post is that we've never discussed technical merits, so such a claim should be seen as absurd immediately. There's a great Reddit comment explaining that this inclusion is just misdirection. In any event, here are just 140 characters worth of reasons the Haskell Platform is inferior to Stack for a new user

    • There is no clear "getting started" guide for new users. Giving someone a download is only half the battle. If they don't know where to go next, the download is useless. (Compare with haskell-lang's getting started.)

    • Choice confusion: saying "HP vs Stack" is actually misleading. The real question is "HP+cabal-install vs HP+Stack vs Stack". A new user is not in a strong enough position to make this decision.

    • Stack will select the appropriate version of GHC to be used based on the project the user is working on. Bundling GHC with Stack insists on a specific GHC version. (I'm not arguing that there's no benefit to including GHC in the installer, but there are definitely downsides too.)

    • The HP release process has historically been very slow, whereas the Stack release process is a well oiled machine. I have major concerns about users being stuck with out-of-date Stack executables by using the HP and running into already fixed bugs. This isn't hypothetical: GHC for Mac OS X shipped an old Stack version for a while resulting in many bug reports. (This is an example of haskell.org download page decisions causing extra work for the Stack team.)

    • Bonus point (not on Twitter): Stack on its own is very well tested. We have little experience in the wild of HP+Stack. Just assuming it will work is scary, and goes against the history of buggy Haskell Platform releases.

  • A lot of the discussion seemed to assume I was saying to get rid of cabal-install entirely. In fact, my blog post said the exact opposite: let it continue if people want to work on it. I'm talking exclusively about the story we tell to new users. Again, technical discussions should have occurred long ago about what's the best course of action. I'm claiming that Stack is by far the best option for the vast majority of new users. The committee has never to my knowledge argued publicly against that.

  • There was a lot of "tone policing," saying things like I need to have more patience, work with (not against) the committee, follow the principle of charity, etc. If this were the first time I had raised these issues, you'd be right. Unfortunately, there is a long history here of many years of wasted time and effort. The reason I always link back to pull request #130 is because it represents the tipping point from "work with the committee without making a fuss" to "I need to make all of these decisions as public as possible so bad decisions don't slip in."

    Let me ask you all: if I had just responded to the mailing list thread asking for a different course of action to be taken, would most of you know that this drama was happening? This needed to be public, so that no more massive changes could slip under everyone's radar.

    Also: it's ironic to see people accusing me of violating the principle of charity by reading my words in the most negative way they possibly can. That's true irony, not just misrepresenting someone's position.

  • For a long time, people have attacked FP Complete every chance they could, presumably because attacking a company is easier than attacking an individual. There is no "FP Complete" conspiracy going on here. I decided to write this blog post on my own, not part of any FP Complete strategy. I discussed it with others, most of whom do not work for FP Complete. In fact, most of the discussion happened publicly, on Twitter, for you all to see.

    If you want to attack someone, attack me. Be intellectually honest. And while you're at it: try to actually attack the arguments made instead of resorting to silly ad hominems about power grabs. Such tin-foil hattery is unbecoming.

  • There's a legitimate discussion about how we get feedback from multiple forms of communication (mailing lists, Twitter, Reddit). While that's a great question to ask and a conversation to have, it really misses the point here completely: we're looking for a very simple vote on three options. We can trivially put up a Google Form or similar and link to it from all media. We did this just fine with the FTP debate. It feels almost disingenuous to claim that we don't know how to deal with this problem when we've already dealt with it in the past.


Dimitri Sabadie: luminance designs

Planet Haskell - Sun, 08/28/2016 - 5:46pm

luminance-0.7.0 was released a few days ago and I decided it was time to explain exactly what luminance is and what were the design choices I made. After a very interesting talk with nical about other rust graphics frameworks (e.g. gfx, glium, vulkano, etc.), I thought it was time to give people some more information about luminance and how to compare it to other frameworks.

Origin

luminance started as a Haskell package, extracted from a “3D engine” I had been working on for a while called quaazar. I came to the realization that I wasn’t using the Haskell garbage collector at all and that I could benefit from using a language without GC. Rust is a very famous language and well appreciated in the Haskell community, so I decided to jump in and learn Rust. I migrated luminance in a month or two. The mapping is described in this blog entry.

What is luminance for?

I’ve been writing 3D applications for a while and I always was frustrated by how OpenGL is badly designed. Let’s sum up the lack of design of OpenGL:

  • weakly typed: OpenGL has types, but… it actually does not. GLint, GLuint or GLbitfield are all defined as aliases to plain primitive types (i.e. something like typedef float GLfloat). Try it with grep -Rn "typedef [a-zA-Z]* GLfloat" /usr/include/GL. This leads to the fact that framebuffers, textures, shader stages, shader programs or even uniforms, etc. have the same type (GLuint, i.e. unsigned int). Thus, a function like glCompileShader expects a GLuint as argument, though you can pass a framebuffer, because it’s also represented as a GLuint – very bad for us. It’s better to consider that those are just untyped – :( – handles.
  • runtime overhead: Because of the point above, functions cannot assume you’re passing a value of the expected type – e.g. the example just above with glCompileShader and a framebuffer. That means OpenGL implementations have to check all the values you’re passing as arguments to be sure they match the expected type. That’s basically several tests for each call of an OpenGL function. If the type doesn’t match, you’re screwed; see the next point.
  • error handling: This is catastrophic. Because of the runtime overhead, almost all functions might set the error flag. You have to check the error flag with the glGetError function, adding a side-effect, preventing parallelism, etc.
  • global state: OpenGL works on the concept of global mutation. You have a state, wrapped in a context, and each time you want to do something with the GPU, you have to change something in the context. Such a context is important; however, some mutations shouldn’t be required. For instance, when you want to change the value of an object or use a texture, OpenGL requires you to bind the object. If you forget to bind the next object, the mutation will occur on the first object. Side effects, side effects…

The goal of luminance is to fix most of those issues by providing a safe, stateless and elegant graphics framework. It should be as low-level as possible, but shouldn’t sacrifice runtime performance – neither CPU load nor memory bandwidth. That is why if you know how to program with OpenGL, you won’t feel lost when getting your feet wet with luminance.

Because of the many OpenGL versions and other technologies (among them, vulkan), luminance has an extra aim: abstract over the trending graphics API.

Types in luminance

In luminance, all graphics resources – and even concepts – have their own respective type. For instance, instead of GLuint for both shader programs and textures, luminance has Program and Texture. That ensures you don’t pass values with the wrong types.

Because of the static guarantees provided at compile time by such strong typing, the runtime shouldn’t have to check for type safety. Unfortunately, because luminance wraps over OpenGL in the luminance-gl backend, we can only add static guarantees; we cannot remove the runtime overhead.

Error handling

luminance follows the Rust conventions and uses the famous Option and Result types to specify errors. You will never have to check against a global error flag, because this is just all wrong. Keep in mind, you have the try! macro in your Rust prelude; use it as often as possible!

Even though Rust needs to provide an exception handler – i.e. panics – there’s no such thing as exceptions in Rust. The try! macro is just syntactic sugar to:

match result {
    Ok(x) => x,
    Err(e) => return Err(e),
}

Stateless

luminance is stateless. That means you don’t have to bind an object to be able to use it. luminance takes care of that for you in a very simple way. To achieve this and keep performance up, it’s necessary to add a bit of higher-level abstraction to the OpenGL API by organizing how binds should happen.

Whatever the task you’re trying to accomplish, whatever the computation or problem, it’s always better to gather / group the computation into batches. A good example of that is how magnetic hard drive disks work, or your RAM. If you spread your data across the disk (fragmented data) or across several non-contiguous addresses in your RAM, it will result in unnecessary moves. The hard drive’s head will have to go all over the disk to gather the information, and it’s very likely you’ll destroy the RAM performance (and your CPU caches) if you don’t put the data in a contiguous area.

If you don’t group your OpenGL resources – for instance, you render 400 objects with shader A, 10 objects with shader B, then 20 objects with shader A, 32 objects with shader C, 348 objects with shader A and finally 439 objects with shader B – you’ll add more OpenGL calls to the equation – hence more global state mutations, and those are costly.

Instead of this:

  1. 400 objects with shader A
  2. 10 objects with shader B
  3. 20 objects with shader A
  4. 32 objects with shader C
  5. 348 objects with shader A
  6. 439 objects with shader B

luminance forces you to group your resources like this:

  1. 400 + 20 + 348 objects with shader A
  2. 10 + 439 objects with shader B
  3. 32 objects with shader C

This is done via types called Pipeline, ShadingCommand and RenderCommand.

Pipelines

A Pipeline gathers shading commands under a Framebuffer. That means that all ShadingCommand embedded in the Pipeline will output to the embedded Framebuffer. Simple, yet powerful, because we can bind the framebuffer when executing the pipeline and don’t have to worry about framebuffer until the next execution of another Pipeline.

ShadingCommand

A ShadingCommand gathers render commands under a shader Program along with an update function. The update function is used to customize the Program by providing uniforms – i.e. Uniform. If you want to change a Program’s Uniform once a frame – and only if the Program is only called once in the frame – it’s the right place to do it.

All RenderCommand embedded in the ShadingCommand will be rendered using the embedded shader Program. Like with the Pipeline, we don’t have to worry about binding: we just have to use the embedded shader program when executing the ShadingCommand, and we’ll bind another program the next time a ShadingCommand is run!

RenderCommand

A RenderCommand gathers all the information required to render a Tessellation, that is:

  • the blending equation, source and destination blending factors
  • whether the depth test should be performed
  • an update function to update the Program being in use – so that each object can have different properties used in the shader program
  • a reference to the Tessellation to render
  • the number of instances of the Tessellation to render
  • the size of the rasterized points (if the Tessellation contains any)

What about shaders?

Shaders are written in… the backend’s expected format. For OpenGL, you’ll have to write GLSL. The backend automatically inserts the version pragma (#version 330 core for OpenGL 3.3, for instance). In the first place, I wanted to migrate cheddar, my Haskell shader EDSL. But… the sad part of the story is that Rust is – yet – unable to handle that kind of stuff correctly. I started to implement an EDSL for luminance with macros. Even though it was usable, the error handling was seriously terrible – macros shouldn’t be used for such an important purpose. Then some rustaceans pointed out I could implement a (rustc) compiler plugin. That enables the use of new constructs directly in Rust by extending its syntax. This is great.

However, with hindsight, I will not do that. For a very simple reason. luminance is, currently, simple, stateless and most of all: it works! I released a PC demo in Köln, Germany using luminance and a demoscene graphics framework I’m working on:

pouët.net link

youtube capture

ion demoscene framework

While developing Céleri Rémoulade, I decided to bake the shaders directly into Rust – to get used to what I had wanted to build, i.e., a shader EDSL. So there’re a bunch of constant &'static str everywhere. Each time I wanted to make a fix to a shader, I had to leave the application, make the change, recompile, rerun… I’m not sure it’s a good thing. Interactive programming is a very good thing we can enjoy – yes, even in strongly typed languages ;).

I saw that gfx doesn’t have its own shader EDSL either and requires you to provide several shader implementations (one per backend). I don’t know; I think it’s not that bad if you only target a single backend (i.e. OpenGL 3.3 or Vulkan). Transpiling shaders is a thing, I’ve been told…

sneaking out…

Feel free to dig in the code of Céleri Rémoulade here. It’s demoscene code, so it had been rushed on before the release – read: it’s not as clean as I wanted it to be.

I’ll provide you with more information in the next weeks, but I prefer spending my spare time writing code than explaining what I’m gonna do – and missing the time to actually do it. ;)

Keep the vibe!


Philip Wadler: Ride the Route in support of Option A

Planet Haskell - Sun, 08/28/2016 - 8:17am

I've written before about the Edinburgh West-East Cycle Route (previously called Roseburn to Leith Walk), and the importance of choosing Option A over Option B.

It's fantastic that Edinburgh has decided to invest 10% of its transport budget into active travel. If we invest regularly and wisely in cycling infrastructure, within twenty years Edinburgh could be a much more pleasant place to live and work, on a par with Copenhagen or Rotterdam. But that requires investing it effectively. The choice of Option A vs B is a crucial step along the way. Option B offers a far less direct route and will do far less to attract new people to cycling, undermining the investment and making it harder to attract additional funding from Sustrans. Unless we start well, it will be harder to continue well.
SNP Councillors are putting it about that, since Sustrans awarded its competition to Glasgow rather than Edinburgh, the route cannot be funded. But that is nonsense. Edinburgh can build the route on its own; it would just take longer. And in any event, year-on-year funding from Sustrans is still available. But funding is only likely to be awarded for an ambitious project that will attract more folk to cycling, and that means Option A.
(Imagine if auto routes were awarded by competition. You can have the M80 to Glasgow or the M90 to Edinburgh, but not both ... Sort of like the idea of holding a bake sale to fund a war ...)
Supporters have organised a Ride the Route event 8am Monday 29 August, leaving from Charlotte Square, which will take councillors and press along the route to promote Option A.  (And here's a second announcement from Pedal on Parliament.) I hope to see you there!

Michael Snoyman: haskell.org and the Evil Cabal

Planet Haskell - Sat, 08/27/2016 - 6:00pm

There's no point being coy or saying anything but what I actually believe, and saying it bluntly. So here it is:

The haskell.org committee has consistently engaged in tactics which silence the voices of all non-members, and stacks the committee to prevent dissenting opinions from joining.

I've said various parts of this previously. You may have heard me say things like the haskell.org oligarchy, refer to the "evil cabal of Haskell" (referring to the nepotism which exists amongst Hackage, cabal-install, haskell.org, and the Haskell Platform), or engage in lengthy debates with committee members about their actions.

This is a pretty long post, if you want to see my request, please jump to the end.

The backstory

To summarize a quick backstory: many of us in the community have been dissatisfied with the four members of the "evil cabal" for years, and have made efforts to improve them, only to be met with opposition. One by one, some of us have been replacing these components with alternatives. Hackage's downtime led to an FP Complete mirror and more reliable doc hosting on stackage.org. cabal-install's weaknesses led to the creation of the Stack build tool. Haskell Platform's poor curation process and broken installer led to Stackage Nightly and LTS Haskell, as well as some of the Stack featureset. And most recently, the haskell.org committee's poor decisions (as I'll demonstrate shortly) for website content led to resurrecting haskell-lang.org, a website devoted to actually making Haskell a more approachable language.

As you can see, at this point all four members of the evil cabal have been replaced with better options, and community discussions and user statistics indicate that most users are switching over. (For an example of statistics, have a look at the package download count on Hackage, indicating that the vast majority of users are no longer downloading packages via cabal-install+Hackage.) I frankly have no problem at all with the continued existence and usage of these four projects; if people want to spend their time on them and use what I consider to be inferior tools, let them. The only remaining pain point is that new, unsuspecting users will arrive at the haskell.org download page instead of the much more intuitive haskell-lang.org get started page.

EDIT Ignore that bit about the download statistics, it's apparently due to the CDN usage on Hackage. Instead, one need only look at how often a question about Haskell Platform is answered with "don't do that, use Stack instead." For a great example, see the discussion of the Rust Platform.

The newest attempt

Alright, with that out of the way, why am I writing this blog post now? It's due to this post on the Haskell-community mailing list, proposing promoting the Haskell Platform above all other options (yet again). Never heard of that mailing list? That's not particularly surprising. That mailing list was created in response to a series of complaints by me, claiming that the haskell.org committee acted in a secretive way and ignored all community input. The response to this was, instead of listening to the many community discussions already occuring on Twitter and Reddit, to create a brand new mailing list, have an echo chamber of people sympathetic to Evil Cabal thought, and insist that "real" community discussions go on there.

We're seeing this process work exactly as the committee wants. Let me demonstrate clearly how. At the time of writing this blog post, three people have voted in favor of promoting the HP on haskell-community, including two haskell.org committee members (Adam Foltzer and John Wiegley) and the person who originally proposed it, Jason Dagit. There were two objections: Chris Allen and myself. So with a sample size of 5, we see that 60% of the community wants the HP.

The lie

A few hours after this mailing list post, I put out a poll on Twitter. At the time of writing (4 hours or so into the poll), we have 122 votes, with 85% in favor of Stack, and 15% in favor of some flavor of the Haskell Platform (or, as we'll now be calling it, the Perfect Haskell Platform). Before anyone gets too excited: yes, a poll of my Twitter followers is obviously a biased sample, but no more biased than the haskell-community list. My real point is this:

The haskell.org committee is posing questions of significant importance in echo chambers where they'll get the response they want from a small group of people, instead of engaging the community correctly on platforms that make participation easy.

This isn't the first time this has happened. When we last discussed haskell.org download page content, a similar phenomenon occurred. Magically, the haskell-community discussion had a bias in favor of the Haskell Platform. In response, I created a Google Form, and Stack was the clear victor:

Yet despite this clear feedback, the committee went ahead with putting minimal installers at the top, not Stack (they weren't quite brazen enough to put the Perfect Haskell Platform at the top or even above Stack, for which I am grateful).

Proper behavior

As I see it, the haskell.org committee has two correct options to move forward with making the download page decision:

  • Accept the votes from my Twitter poll in addition to the haskell-community votes
  • Decide that my poll is invalid for some reason, and do a proper poll of the community, with proper advertisement on Reddit, Twitter, the more popular mailing lists, etc

If past behavior is any indication though, I predict a third outcome: stating that the only valid form of feedback is on the haskell-community mailing list, ignoring the clear community groundswell against their decisions, and continuing to make unilateral, oligarchic decisions. Namely: promote the Haskell Platform, thereby misleading all unfortunate new Haskellers who end up at haskell.org instead of the much better haskell-lang.org.

Further evidence

Everyone's always asking me for more of the details on what's gone on here, especially given how some people vilify my actions. I've never felt comfortable putting that kind of content on blogs shared with other authors when some of those others don't want me to call out the negative actions. However, thankfully I now have my own blog to state this from. This won't include every punch thrown in this long and sordid saga, but hopefully will give a much better idea of what's going on here.

  • Not only are conversations held in private by the committee, but:

    • Their private nature is used to shut down commentary on committee actions
    • There is open deception about what was actually discussed in private

    Evidence: see this troubling Reddit thread. I made the (very true) claim that Gershom made a unilateral decision about the downloads page. You can see the evidence of this where he made that decision. Adam Foltzer tried to call my claim false, and ultimately Gershom himself confirmed I was correct. Adam then claimed offense at this whole discussion and backed out.

  • When I proposed making Stack the preferred download option (at a time when Stack did not appear at all on haskell.org), Gershom summarilly closed the pull request. I have referenced this pull request many times. I don't believe any well intentioned person can read that long discussion and believe that the haskell.org committee has a healthy process for maintaining a community website.

  • At no point in any of these discussions has the committee opened up discussion to either the technical advantages of the HP vs Stack, or the relative popularity. Instead, we get discussions of committee process, internal votes, and an inability to make changes at certain periods of time based on previously made and undocumented decisions.

  • We often hear statements from committee members about the strong support for their actions, or lack of controversy on an issue. These claims are many times patently false to any objective third party. For example, Gershom claimed that the pull request #122 that he unilaterally decided to merge was "thought to be entirely mundane and uncontroversial." Everyone is welcome to read the Reddit discussion and decide if Gershom is giving a fair summary or not.

  • Chris Done - a coworker of mine - spent his own time on creating the first haskell-lang.org, due to his unhappiness with the homepage at that time. His new site was met with much enthusiasm, and he was pressured by many to get it onto haskell.org itself. What ensued was almost a year of pain working out the details, having content changed to match the evil cabal narrative, and eventually a rollout. At the end of this, Chris was - without any given reason - not admitted to the haskell.org committee, denying him access to share an opinion on what should be on the site he designed and created.

My request

Thank you for either getting through all of that, or skipping to this final section. Here's my request: so many people have told me that they feel disenfranchised by these false-flag "community" processes, and just give up on speaking up. This allows the negative behavior we've seen dominate the evil cabal in Haskell for so long. If you've already moved on to Stack and Stackage yourself, you're mostly free of this cabal. I'm asking you to think of the next generation of Haskell users, and speak up.

Most powerful course of action: subscribe to the haskell-community mailing list and speak out about how the committee has handled the downloads page. Don't just echo my message here: say what you believe. If you think they've done a good job, then say so. If you think (like I do) that they've done a bad job, and are misleading users with their decisions, say that.

Next best: comment about this on Reddit or Twitter. Get your voice out there and be heard, even if it isn't in the haskell.org committee echo chamber.

In addition to that: expect me to put out more polls on Twitter and possibly elsewhere. Please vote! We've let a select few make damaging decisions for too long, make your voice heard. I'm confident that we will have a more user-friendly Haskell experience if we actually start listening to users.

And finally: as long as it is being mismanaged, steer people away from haskell.org. This is why we created haskell-lang.org. Link to it, tell your friends about it, warn people away from haskell.org, and maybe even help improve its content.

Archive link of the Reddit and Github threads quoted above:

  • http://archive.is/7zFkb
  • http://archive.is/NTzUD
  • http://archive.is/roexm
  • http://archive.is/uwdzr
  • http://archive.is/uduu5

Edward Z. Yang: Optimizing incremental compilation

Planet Haskell - Sat, 08/27/2016 - 4:03am

When you run make to build software, you expect a build of software that has been previously built to take less time than a build from scratch. The reason for this is incremental compilation: by caching the intermediate results of ahead-of-time compilation, the only parts of a program that must be recompiled are those that depend on the changed portions of the dependency graph.

The term incremental compilation doesn't say much about how the dependency graph is set up, which can lead to some confusion about the performance characteristics of "incremental compilers." For example, the Wikipedia article on incremental compilation claims that incremental compilers cannot easily optimize the code they compile. This is wrong: it depends entirely on how your dependency graph is set up.

Take, for example, gcc for C:

The object file a.o depends on a.c, as well as any header files it (transitively) includes (a.h, in this case.) Since a.o and main.o do not depend on each other, if a.c is rebuilt, main.o does not need to be rebuilt. In this sense, C is actually amazingly incremental (said no C programmer ever.) The reason C has a bad reputation for incremental compilation is that, naively, the preprocessing of headers is not done incrementally at all (precompiled headers are an attempt to address this problem).
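A minimal Makefile sketch of that graph (a.c, a.h and main.c come from the example above; the app target and the rules themselves are illustrative, not a recommended build setup):

app: a.o main.o
	cc -o app a.o main.o

a.o: a.c a.h          # rebuilt when a.c or a.h changes
	cc -c a.c

main.o: main.c a.h    # does not list a.c, so edits to a.c leave it alone
	cc -c main.c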

The dependency graph implies something else as well: unless the body of a function is placed in a.h, there is no way for the compiler that produces main.o to inline the body in: it knows nothing about the C file. a.c may not even exist at the point main.o is being built (parallelism!) The only time such optimization could happen is at link-time (this is why link-time optimization is a thing.)

A nice contrast is ghc for Haskell:

Here, Main.{hi,o} depend not only on Main.hs but on A.hi, the module it imports. GHC is still incremental: if you modify an hs file, only things that import that source file need to be recompiled. Things are even better than this dependency diagram implies: Main.{hi,o} may only depend on specific pieces of A.hi; if those pieces are unchanged, GHC will exit early and report that compilation is NOT necessary.

Despite being incremental, GHC supports inlining, since unfoldings of functions can be stored in hi files, which can subsequently be used by modules which import it. But now there is a trade-off: if you inline a function, you now depend on the unfolding in the hi file, making it more likely that compilation is necessary when A.hi changes.
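A tiny sketch of that trade-off (the module and function names are invented for illustration; the INLINABLE pragma simply asks GHC to record the unfolding in the interface file):

-- A.hs
module A (f) where

{-# INLINABLE f #-}
f :: Int -> Int
f x = x * 2 + 1

-- Main.hs
module Main (main) where

import A (f)

main :: IO ()
main = print (f 20)

Here GHC may inline f into Main using the unfolding stored in A.hi, which is exactly why editing f's body can make Main.o eligible for recompilation.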

As one final example, incremental compilers in IDEs, like the Java compiler in Eclipse, are not doing anything fundamentally different than the operation of GHC. The primary differences are (1) the intermediate products are held in memory, which can result in huge savings since parsing and loading interfaces into memory is a huge timewaster, and (2) they try to make the dependency diagram as fine-grained as possible.

This is all fairly well known, so I want to shift gears and think about a less well-understood problem: how does one do incremental compilation for parametrized build products? When I say parametrized, I mean a blend of the C and Haskell paradigms:

  • Separate compilation. It should be possible to depend on an interface without depending on an implementation (like when a C file depends on a header file.)
  • Cost-free abstraction. When the implementation is provided, we should (re)compile our module so that we can inline definitions from the implementation (like when a Haskell module imports another module.)

This problem is of interest for Backpack, which introduces libraries parametrized over signatures to Haskell. For Backpack, we came up with the following design: generate distinct build products for (1) uninstantiated code, for which we know an interface but not its implementation, and (2) instantiated code, for which we know all of their implementations:

In the blue box, we generate A.hi and Main.hi which contain purely the results of typechecking against an interface. Only in the pink box do we combine the implementation of A (in the red box) with the user of A (Main). This is just a graph; thus, incremental compilation works just as it works before.

We quickly ran into an intriguing problem when we sought to support multiple interfaces, which could be instantiated separately: if a client instantiates one interface but not the other, what should we do? Are we obligated to generate build products for these partially instantiated modules? This is not very useful, since we can't generate code yet (since we don't know all of the implementations.)

An important observation is that these interfaces are really cheap to generate (since you're not doing any compilation). Thus, our idea was to do the instantiation on-the-fly, without actually generating build products. The partially instantiated interfaces can be cached in memory, but they're cheap to generate, and we win if we don't need them (in which case we don't instantiate them.)

This is a bit of a clever scheme, and cleverness always has a dark side. A major source of complexity with on-the-fly instantiation is that there are now two representations of what is morally the same build product: the on-the-fly products and the actually compiled ones:

The subtyping relation between these two products states that we can always use a compiled interface in place of an on-the-fly instantiated one, but not vice versa: the on-the-fly interface is missing unfoldings and other important information that compiled code may need.

If we are type-checking only (we have uninstantiated interfaces), we might prefer on-the-fly interfaces, because they require less work to create:

In contrast, if we are compiling a package, we must use the compiled interface, to ensure we see the necessary unfoldings for inlining:

A particularly complicated case is if we are type-checking an uninstantiated set of modules, which themselves depend on some compiled interfaces. If we are using an interface p+a/M.hi, we should be consistent about it, and since r must use the compiled interfaces, so must q:

The alternative is to ensure that we always build products available that were typechecked against the on-the-fly interfaces, as below:

But this has the distasteful effect of requiring everything to be built twice (first typechecked against the on-the-fly interfaces, and then built for real).

The dependency graphs of build products for an ahead-of-time compiler are traditionally part of the public API of a compiler. As I've written previously, to achieve better incrementality, better parallelism, and more features (like parametrized modules), dependency graphs become more and more complicated. When compiler writers don't want to commit to an interface and build tool authors aren't interested in learning about a complicated compilation model, the only systems that work well are the integrated ones.

Is Backpack's system for on-the-fly interface instantiation too clever for its own good? I believe it is well-designed for the problem it tries to solve, but if you still have a complicated design, perhaps you are solving the wrong problem. I would love to hear your thoughts.


Functional Jobs: Full-Stack Developer (Haskell/PureScript) at CollegeVine (Full-time)

Planet Haskell - Fri, 08/26/2016 - 5:22pm
Overview

CollegeVine is looking for a product-focused full-stack developer to help engineer the future of mentorship and higher education attainment.

There aren't many industries left that haven't been significantly disrupted by technology in some way, but you're reading about one right here! You will find many opportunities to apply high-leverage computer science (think machine learning, probabilistic reasoning, etc.) as well as plenty of opportunities for the more human side of the problem. As it stands, the current admissions process is a huge source of stress and confusion for students and parents alike. If we execute correctly, your work will impact the entire next generation of college graduates-to-be.

You will join a fast-moving company whose culture centers around authenticity, excellence, and balance. You'll find that everyone likes to keep things simple and transparent. We prefer to be goal-oriented and hands-off as long as you are a self-starter.

Our modern perspective on developer potential means we celebrate and optimize for real output. And that's probably the reason why we're a polyglot functional programming shop, with emphasis on Haskell and functional paradigms. Our infrastructure and non-mission-critical tooling tends to be in whatever works best for the task at hand: sometimes that's Haskell with advanced GHC extensions a-blazin', other times it's minimalist Ruby or bash—basically, it's a team decision based on whatever sits at the intersection of appropriateness, developer joy, quality, and velocity.

As an early-stage company headquartered in Cambridge, MA, we have a strong preference for key members of our team to be located in the Boston metro area; however, given that our company has its roots in remote work (and that it's 2016), we are open to remote arrangements after one year of continuous employment and/or executive approval.

Requirements

You know you are right for this position if:

  • You have at least five years of professional software engineering experience, and at least two years of preference for a high-level programming language that's used in industry, like Haskell, Clojure, OCaml, Erlang, F#, or similar.
  • You have some front-end experience with JS or a functional language that compiles to JS, like PureScript, Elm, Clojurescript, or similar. We use PureScript, React, and ES6 on the front-end. It's pretty awesome.
  • You are a self-starter and internally motivated, with a strong desire to be part of a successful team that shares your high standards.
  • You have great written communication skills and are comfortable with making big decisions over digital presence (e.g. video chat).
  • You have polyglot experience along several axes (dynamic/static, imperative/functional, lazy/strict, weird/not-weird).
  • You are comfortable with modern infrastructure essentials like AWS, Heroku, Docker, CI, etc. You have basic but passable sysadmin skills.
  • You are fluent with git.
  • You instrument before you optimize. You test before you ship. You listen before you conclude. You measure before you cut. Twice.

Benefits

We offer a competitive salary and a full suite of benefits, some of them unconventional, but awesome for the right person:

  • Medical, dental, and vision insurance come standard.
  • Flexible hours with a 4-hour core - plan the rest of your workday as you wish, just give us the majority of your most productive hours. Productivity ideas: avoid traffic, never wait in line at the grocery store, wake up without an alarm clock.
  • Goal-based environment (as opposed to grind-based or decree-based environment; work smarter, not harder; intelligently, not mindlessly). We collaborate on setting goals, but you set your own process for accomplishing those goals. You will be entrusted with a lot of responsibility and you might even experience fulfillment and self-actualization as a result.
  • Daily physical activity/mindfulness break + stipend: invest a non-core hour to make yourself more awesome by using it for yoga, tap-dance lessons, a new bike, massage, a surfboard - use your imagination! Just don’t sit at a computer all day! Come back to work more relaxed and productive and share your joy with the rest of the team. Note: You must present and share proof of your newly enriched life with the team in order to receive the stipend.

Remember: We’re a startup. You’re an early employee. We face challenges. We have to ship. Your ideas matter. You will make a difference.

Get information on how to apply for this position.

Categories: Offsite Blogs

Brandon Simmons: Announcing: unagi-bloomfilter

Planet Haskell - Thu, 08/25/2016 - 8:47am

I just released a new Haskell library called unagi-bloomfilter that is up now on hackage. You can install it with:

$ cabal install unagi-bloomfilter

The library uses the bloom-1 variant from “Fast Bloom Filters and Their Generalization” by Yan Qiao, et al. I’ll try to write more about it when I have the time. Also, I just gave a talk last night at the New York Haskell User Group on things I learned while working on the project:

http://www.meetup.com/NY-Haskell/events/233372271/

It was quite rough, but I was happy to hear from folks that found some interesting things to take away from it.
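
If I understand the paper's basic idea correctly, the bloom-1 layout is easy to sketch in plain Haskell. The toy code below is not the unagi-bloomfilter API (the real library is mutable, concurrent, and far more careful about hashing); it only illustrates the scheme: each element hashes to a single 64-bit word of the filter, and the k bit positions are derived from that same hash and set within that one word, so a membership test touches only one word of memory.

import Data.Bits (setBit, testBit, (.&.), shiftR)
import Data.Hashable (Hashable, hash)
import qualified Data.Vector.Unboxed as V
import Data.Word (Word64)

data Bloom1 = Bloom1
  { bWords :: V.Vector Word64   -- the filter, one machine word per block
  , bSize  :: Int               -- number of words
  , bK     :: Int               -- bits set per element
  }

new :: Int -> Int -> Bloom1
new nWords kBits = Bloom1 (V.replicate nWords 0) nWords kBits

-- Derive the word index and the k in-word bit positions from a single hash.
-- (Naively reuses 6-bit slices of one hash value; fine for a sketch only.)
hashTo :: Hashable a => Bloom1 -> a -> (Int, [Int])
hashTo b x =
  let h    = fromIntegral (hash x) :: Word64
      wix  = fromIntegral (h `mod` fromIntegral (bSize b))
      bits = [ fromIntegral ((h `shiftR` (6 * i)) .&. 63) | i <- [1 .. bK b] ]
  in (wix, bits)

insert :: Hashable a => a -> Bloom1 -> Bloom1
insert x b =
  let (wix, bits) = hashTo b x
      w           = foldl setBit (bWords b V.! wix) bits
  in b { bWords = bWords b V.// [(wix, w)] }

member :: Hashable a => a -> Bloom1 -> Bool
member x b =
  let (wix, bits) = hashTo b x
  in all (testBit (bWords b V.! wix)) bits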

Thanks to Gershom for inviting me to speak, for my company Signal Vine for sponsoring my trip out, and to Yan Qiao for generously answering my silly questions and helping me understand the paper.

P.S. We’re hiring Haskell developers

Signal Vine is an awesome group of people, with interesting technology and problems to solve, and we’re looking to grow the small development team. If you have some experience with Haskell (you don’t have to be a guru) and are interested, please reach out to Jason or me at:

brandon@signalvine.com
jason@signalvine.com

Categories: Offsite Blogs

Michael Snoyman: Restarting this blog

Planet Haskell - Tue, 08/23/2016 - 6:00pm

Just a minor note: I'm planning on starting up this blog again, with some personal thoughts - likely still mostly around programming and Haskell - that don't fit in the other blogs that I contribute to (Yesod Web Framework and FP Complete).

I don't have a clear list of topics I'm going to be covering, but I'll likely be sharing some thoughts on running engineering teams and startups effectively. If you have something you'd like me to cover, please Tweet it to me.

Categories: Offsite Blogs

Roman Cheplyaka: Extract the first n sequences from a FASTA file

Planet Haskell - Tue, 08/23/2016 - 2:00pm

A FASTA file consists of a series of biological sequences (DNA, RNA, or protein). It looks like this:

>gi|173695|gb|M59083.1|AETRR16S Acetomaculum ruminis 16S ribosomal RNA
NNTAAACAAGAGAGTTCGATCCTGGCTCAGGATNAACGCTGGCGGCATGCCTAACACATGCAAGTCGAAC
GGAGTGCTTGTAGAAGCTTTTTCGGAAGTGGAAATAAGTTACTTAGTGGCGGACGGGTGAGTAACGCGTG
>gi|310975154|ref|NR_037018.1| Acidaminococcus fermentans strain VR4 16S ribosomal RNA gene, partial sequence
GGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAGAACTTTCTTCGGAATGTTC
TTAGTGGCGAACGGGTGAGTAACGCGTAGGCAACCTNCCCCTCTGTTGGGGACAACATTCCGAAAGGGAT

There probably exist dozens of Python scripts to extract the first n sequences from a FASTA file. Here I will show an awk one-liner that performs this task, and explain how it works.

Here it is (assuming the number of sequences is stored in the environment variable NSEQS):

awk "/^>/ {n++} n>$NSEQS {exit} {print}"

This one-liner can read from standard input (e.g. as part of a pipe), or you can append one or more file names to the end of the command, e.g.

awk "/^>/ {n++} n>$NSEQS {exit} {print}" file.fasta

An awk script consists of one or more statements of the form pattern { actions }. The input is read line-by-line, and if the current line matches the pattern, the corresponding actions are executed.

Our script consists of 3 statements:

  1. /^>/ {n++} increments the counter each time a new sequence is started. /.../ denotes a regular expression pattern, and ^> is a regular expression that matches the > sign at the beginning of a line.

    An uninitialized variable in awk has the value 0, which is exactly what we want here. If we needed some other initial value (say, 1), we could have added a BEGIN pattern like this: BEGIN {n=1}.
  2. n>$NSEQS {exit} aborts processing once the counter reaches the desired number of sequences.
  3. {print} is an action without a pattern (and thus matching every line), which prints every line of the input until the script is aborted by exit.

A shorter and more cryptic way to write the same is

awk "/^>/ {n++} n>$NSEQS {exit} 1"

Here I replaced the action-without-pattern by a pattern-without-action. The pattern 1 (meaning “true”) matches every line, and when the action is omitted, it is assumed to be {print}.
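
For comparison, here is a rough Haskell translation of the same logic (my sketch, not from the post). It takes the number of sequences as its first command-line argument and reads the FASTA data from standard input:

import System.Environment (getArgs)

main :: IO ()
main = do
  [nStr] <- getArgs
  let n = read nStr :: Int
  interact (unlines . takeSeqs n . lines)

-- Keep printing lines until we hit the header of sequence n+1.
takeSeqs :: Int -> [String] -> [String]
takeSeqs n = go 0
  where
    isHeader l = take 1 l == ">"
    go _ [] = []
    go seen (l:ls)
      | isHeader l && seen + 1 > n = []                  -- like awk's exit
      | isHeader l                 = l : go (seen + 1) ls
      | otherwise                  = l : go seen ls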

Categories: Offsite Blogs

mightybyte: Measuring Software Fragility

Planet Haskell - Mon, 08/22/2016 - 9:41am

While writing this comment on reddit I came up with an interesting question that I think might be a useful way of thinking about programming languages. What percentage of single non-whitespace characters in your source code could be changed to a different character such that the change would pass your CI build system but would result in a runtime bug? Let's call this the software fragility number because I think that metric gives a potentially useful measure of how bug prone your software is.

At the end of the day software is a mountain of bytes and you're trying to get them into a particular configuration. Whether you're writing a new app from scratch, fixing bugs, or adding new features, the number of bytes of source code you have (similar to LOC, SLOC, or maybe the compressed number of bytes) is a rough indication of the complexity of your project. If we model programmer actions as random byte mutations over all of a project's source and we're trying to predict the project's defect rate, this software fragility number is exactly the thing we need to know.

Now I'm sure many people will be quick to point out that this random mutation model is not accurate. Of course that's true. But I would argue that in this way it's similar to the efficient markets hypothesis in finance. Real world markets are obviously not efficient (Google didn't become $26 billion less valuable because the UK voted for Brexit). But the efficient markets model is still really useful--and good luck finding a better one that everybody will agree on.

What this model lacks in real world fidelity, it makes up for in practicality. We can actually build an automated system to calculate a reasonable approximation of the fragility number. All that has to be done is take a project, randomly mutate a character, run the project's whole CI build, and see if the result fails the build. Repeat this for every non-whitespace character in the project and count how many characters pass the build. Since the character was generated at random, I think it's reasonable to assume that any mutation that passes the build is almost definitely a bug.

Performing this process for every character in a large project would obviously require a lot of CPU time. We could make this more tractable by picking characters at random to mutate. Repeat this until you have done it for a large enough number of characters and then see what percentage of them made it through the build. Alternatively, instead of choosing random characters you could choose whole modules at random to get more uniform coverage over different parts of the language's grammar. There are probably a number of different algorithms that could be tried for picking random subsets of characters to test. Similar to numerical approximation algorithms such as Newton's method, any of these algorithms could track the convergence of the estimate and stop when the value gets to a sufficient level of stability.
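
To make the procedure concrete, here is a small sketch of such a sampling harness in Haskell (all names and the build command are made up for illustration; this is not a polished tool). It mutates one randomly chosen non-whitespace character of a file within its character class, runs the project's build command, restores the file, and reports whether the mutant survived:

import Control.Monad (forM)
import System.Exit (ExitCode(..))
import System.Process (system)
import System.Random (randomRIO)

data CharClass = Alpha | Digit | Symbol deriving Eq

classOf :: Char -> CharClass
classOf c
  | c `elem` ['a'..'z'] ++ ['A'..'Z'] = Alpha
  | c `elem` ['0'..'9']               = Digit
  | otherwise                         = Symbol

-- Pick a replacement character from the same class as the original.
sameClass :: Char -> IO Char
sameClass c = pick (filter (\x -> classOf x == classOf c && x /= c) pool)
  where
    pool    = ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9'] ++ "+-*/<>=!&|.,;:"
    pick xs = (xs !!) <$> randomRIO (0, length xs - 1)

-- One trial: mutate a random non-whitespace position, rebuild, restore.
-- Returns True if the mutant passed the build (i.e. a likely latent bug).
trial :: FilePath -> String -> IO Bool
trial file buildCmd = do
  original <- readFile file
  let len = length original
  -- randomNonSpace forces len (and thus the whole file), so the lazy
  -- readFile handle is closed before we rewrite the file below.
  i  <- randomNonSpace original len
  c' <- sameClass (original !! i)
  writeFile file (take i original ++ [c'] ++ drop (i + 1) original)
  code <- system buildCmd
  writeFile file original          -- always restore the original source
  pure (code == ExitSuccess)
  where
    randomNonSpace s n = do
      i <- randomRIO (0, n - 1)
      if s !! i `elem` " \t\r\n" then randomNonSpace s n else pure i

-- Estimate the fragility number from k random trials.
fragility :: FilePath -> String -> Int -> IO Double
fragility file buildCmd k = do
  survivors <- forM [1 .. k] (\_ -> trial file buildCmd)
  pure (fromIntegral (length (filter id survivors)) / fromIntegral k)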

Now let's investigate actual fragility numbers for some simple bits of example code to see how this notion behaves. First let's look at some JavaScript examples.

It's worth noting that comment characters should not be allowed to be chosen for mutation since they obviously don't affect the correctness of the program. So the comments you see here have not been included in the calculations. Fragile characters are highlighted in orange.

// Fragility 12 / 48 = 0.25
function f(n) {
  if ( n < 2 )
    return 1;
  else
    return n * f(n-1);
}

// Fragility 14 / 56 = 0.25
function g(n) {
  var p = 1;
  for (var i = 2; i <= n; i++ ) {
    p *= i;
  }
  return p;
}

First I should say that I didn't write an actual program to calculate these. I just eyeballed it and thought about what things would fail. I easily could have made mistakes here. In some cases it may even be subjective, so I'm open to corrections or different views.

Since JavaScript is not statically typed, every character of every identifier is fragile--mutating them will not cause a build error because there isn't much of a build. JavaScript won't complain, you'll just start getting undefined values. If you've done a significant amount of JavaScript development, you've almost definitely encountered bugs from mistyped identifier names like this. I think it's mildly interesting that the recursive and iterative formulations of this function both have the same fragility. I expected them to be different. But maybe that's just luck.

Numerical constants as well as comparison and arithmetic operators will also cause runtime bugs. These, however, are more debatable because if you use the random procedure I outlined above, you'll probably get a build failure because the character would have probably changed to something syntactically incorrect. In my experience, it seems like when you mistype an alpha character, it's likely that the wrong character will also be an alpha character. The same seems to be true for the classes of numeric characters as well as symbols. The method I'm proposing is that the random mutation should preserve the character class. Alpha characters should remain alpha, numeric should remain numeric, and symbols should remain symbols. In fact, my original intuition goes even further than that by only replacing comparison operators with other comparison operators--you want to maximize the chance that the new mutated character will cause a successful build so the metric will give you a worst-case estimate of fragility. There's certainly room for research into what patterns tend to come up in the real world and other algorithms that might describe that better.

Now let's go to the other end of the programming language spectrum and see what the fragility number might look like for Haskell.

-- Fragility 7 / 38 = 0.18
f :: Int -> Int
f n | n < 2 = 1
    | otherwise = n * f (n-1)

Haskell's much more substantial compile time checks mean that mutations to identifier names can't cause bugs in this example. The fragile characters here are clearly essential parts of the algorithm we're implementing. Maybe we could relate this idea to information theory and think of it as a measure of how much information is contained in the algorithm.

One interesting thing to note here is the effect of the length of identifier names on the fragility number. In JavaScript, long identifier names will increase the fragility because all identifier characters can be mutated and will cause a bug. But in Haskell, since identifier characters are not fragile, longer names will lower the fragility score. Choosing to use single character identifier names everywhere makes these Haskell fragility numbers the worst case and makes JavaScript fragility numbers the best case.

Another point is that since I've used single letter identifier names it is possible for a random identifier mutation in Haskell to not cause a build failure but still cause a bug. Take for instance a function that takes two Int parameters x and y. If y was mutated to x, the program would still compile, but it would cause a bug. My set of highlighted fragile characters above does not take this into account because it's trivially avoidable by using longer identifier names. Maybe this is an argument against one letter identifier names, something that Haskell gets criticism for.
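
To make that x/y hazard concrete, here is a hypothetical two-parameter function (not from the post) where a single-character identifier mutation survives the type checker:

-- Intended definition: both parameters are used.
area :: Int -> Int -> Int
area x y = x * y

-- Mutating the final y to x still compiles (the types line up), but it is
-- now a bug. With longer, distinct identifier names the mutated name would
-- simply not be in scope, and the build would fail instead.
area' :: Int -> Int -> Int
area' x y = x * x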

Here's the snippet of Haskell code I was talking about in the above reddit comment that got me thinking about all this in the first place:

-- Fragility 31 / 277 = 0.11
data MetadataInfo = MetadataInfo
  { title :: Text
  , description :: Text
  }

pageMetadataWidget :: MonadWidget t m => Dynamic t MetadataInfo -> m ()
pageMetadataWidget i = do
  el "title" $ dynText $ title <$> i
  elDynAttr "meta" (mkDescAttrs . description <$> i) blank
  where
    mkDescAttrs desc = "name" =: "description" <>
                       "content" =: desc

In this snippet, the fragility number is probably close to 31 characters--the number of characters in string literals. This is out of a total of 277 non-whitespace characters, so the software fragility number for this bit of code is 11%. This is half the fragility of the JS code we saw above! And as I've pointed out, larger real world JS examples are likely to have even higher fragility. I'm not sure how much we can conclude about the actual ratios of these fragility numbers, but at the very least it matches my experience that JS programs are significantly more buggy than Haskell programs.

The TDD people are probably thinking that my JS examples aren't very realistic because none of them have tests, and that tests would catch most of the identifier name mutations, bringing the fragility down closer to Haskell territory. It is true that tests will probably catch some of these things. But you have to write code to make that happen! It doesn't happen by default. Also, you need to take into account the fact that the tests themselves will have some fragility. Tests require time and effort to maintain. This is an area where this notion of the fragility number becomes less accurate. I suspect that since the metric only considers single character mutations it will underestimate the fragility of tests since mutating single characters in tests will automatically cause a build failure.

There seems to be a slightly paradoxical relationship between the fragility number and DRY. Imagine our above JS factorial functions had a test that completely reimplemented factorial and then tried a bunch of random values QuickCheck-style. This would yield a fragility number of zero! Any single character change in the code would cause a test failure. And any single character change in the tests would also cause a test failure. Single character changes can no longer be classified as fragile because we've violated DRY. You might say that the test suite shouldn't reimplement the algorithm--you should just test specific cases like f(5) == 120. But in an information theory sense this is still violating DRY.

Does this mean that the fragility number is not very useful? Maybe. I don't know. But I don't think it means that we should just throw away the idea. Maybe we should just keep in mind that this particular formulation doesn't have much to tell us about the fragility of more complex coordinated multi-character changes. I could see the usefulness of this metric going either way. It could simplify down to something not very profound. Or it could be that measurements of the fragility of real world software projects end up revealing some interesting insights that are not immediately obvious even from my analysis here.

Whatever the usefulness of this fragility metric, I think the concept gets us thinking about software defects in a different way than we might be used to. If it turns out that my single character mutation model isn't very useful, perhaps the extension to multi-character changes could be useful. Hopefully this will inspire more people to think about these issues and play with the ideas in a way that will help us progress towards more reliable software and tools to build it with.

EDIT: Unsurprisingly, I'm not the first person to have thought of this. It looks like it's commonly known as mutation testing. That Wikipedia article makes it sound like mutation testing is commonly thought of as a way to assess your project's test suite. I'm particularly interested in what it might tell us about programming languages...i.e. how much "testing" we get out of the box because of our choice of programming language and implementation.

Categories: Offsite Blogs

mightybyte: Why version bounds cannot be inferred retroactively (using dates)

Planet Haskell - Mon, 08/22/2016 - 9:35am

In past debates about Haskell's Package Versioning Policy (PVP), some have suggested that package developers don't need to put upper bounds on their version constraints because those bounds can be inferred by looking at what versions were available on the date the package was uploaded. This strategy cannot work in practice, and here's why.

Imagine someone creates a small new package called foo. It's a simple package, say something along the lines of the formattable package that I recently released. One of the dependencies for foo is errors, a popular package supplying frequently used error handling infrastructure. The developer happens to already have errors-1.4.7 installed on their system, so this new package gets built against that version. The author uploads it to hackage on August 16, 2015 with no upper bounds on its dependencies. Let's for simplicity imagine that errors is the only dependency, so the .cabal file looks like this:

name: foo
build-depends: errors

If we come back through at some point in the future and try to infer upper bounds by date, we'll see that on August 16, the most recent version of errors was 2.0.0. Here's an abbreviated illustration of the picture we can see from release dates:

If we look only at release dates, and assume that packages were building against the most recent version, we will try to build foo with errors-2.0.0. But that is incorrect! Building foo with errors-2.0.0 will fail because errors had a major breaking change in that version. Bottom line: dates are irrelevant--all that matters is what dependency versions the author happened to be building against! You cannot assume that package authors will always be building against the most recent versions of their dependencies. This is especially true if our developer was using the Haskell Platform or LTS Haskell because those package collections lag the bleeding edge even more. So this scenario is not at all unlikely.

It is also possible for packages to be maintaining multiple major versions simultaneously. Consider large projects like the linux kernel. Developers routinely do maintenance releases on 4.1 and 4.0 even though 4.2 is the latest version. This means that version numbers are not always monotonically increasing as a function of time.

I should also mention another point on the meaning of version bounds. When a package specifies version bounds like this...

name: foo
build-depends: errors >= 1.4 && < 1.5

...it is not saying "my package will not work with errors-1.5 and above". It is actually saying, "I warrant that my package does work with those versions of errors (provided errors complies with the PVP)". So the idea that "< 1.5" is a "preemptive upper bound" is wrong. The package author is not preempting anything. Bounds are simply information. The upper and lower bounds are important things that developers need to tell you about their packages to improve the overall health of the ecosystem. Build tools are free to do whatever they want with that information. Indeed, cabal-install has a flag --allow-newer that lets you ignore those upper bounds and step outside the version ranges that the package authors have verified to work.
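
For example, a user who consciously wants to step outside the ranges an author has verified can opt in at build time (an illustrative invocation; exact flag spellings vary between cabal-install versions):

$ cabal install --allow-newer foo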

In summary, the important point here is that you cannot use dates to infer version bounds. You cannot assume that package authors will always be building against the most recent versions of their dependencies. The only reliable thing to do is for the package maintainer to tell you explicitly what versions the package is expected to work with. And that means lower and upper bounds.

Update: Here is a situation that illustrates this point perfectly: cryptonite issue #96. cryptonite-0.19 was released on August 12, 2016. But cryptonite-0.15.1 was released on August 22, 2016. Any library published after August 22, 2016 that depends on cryptonite-0.15.1 would not be able to build if the solver used dates instead of explicit version bounds.

Categories: Offsite Blogs

Brent Yorgey: Academic integrity: context and concrete steps

Planet Haskell - Sun, 08/21/2016 - 5:06pm

Continuing from my previous post, I wanted to write a bit about why I have been thinking about academic integrity, and what, concretely, I plan to do about it.

So, why have I been thinking about this? For one thing, my department had its fair share of academic integrity violations last year. On the one hand, it is right for students to be held accountable for their actions. On the other, in the face of a spate of violations, it is also right for us to reevaluate what we are doing and why, what sort of environmental factors may be pushing students to violate academic integrity, and how we can create a better environment. Environment does not excuse behavior, but it can shape behavior in profound ways.

Another reason for thinking about academic integrity is that starting this fall, I will be a member of the committee that hears and makes a determination in formal academic integrity cases at my institution. It seems no one wants to be on this committee, and to a certain extent I can understand why. But I chose it, for several reasons. For one, I think it is important to have someone on the committee from the natural sciences (I will be the only one), who understands issues of plagiarism in the context of technical subjects. I also care a lot about ensuring that academic integrity violations are handled carefully and thoughtfully, so that students actually learn something from the experience, and more importantly, so that they come through with their sense of belonging intact. When a student (or anyone, really) does something that violates the standards of a community and is subject to consequences, it is all too easy for them to feel as though they are now a lesser member or even excluded from the community. It takes much more intentional communication to make clear to them that although they may have violated a community standard—which necessarily comes with a consequence—they are still a valued member. (Thanks to Leslie Zorwick for explaining about the power of belonging, and for relating recent research showing that communicating belonging can make a big difference for students on academic probation—which seems similar to students accused or convicted of academic integrity violations. I would cite it but I think it is not actually published yet.)

Thinking about all of this is well and good, but what will I do about it? How do I go about communicating all of this to my students, and creating the sort of environment I want? Here are the concrete things I plan to do starting this fall:

  • In all my courses where it makes sense, I plan to require students to have at least one citation (perhaps three, if I am bold) on every assignment turned in—whether they cite web pages, help from TAs or classmates, and so on. The point is to get them thinking regularly about the resources and help that they make use of on every single assignment, to foster a spirit of thankfulness. I hope it will also make it psychologically harder for students to plagiarize and lie about it. Finally, I hope it will lead to better outcomes in cases where a student makes inappropriate use of an online resource—i.e. when they “consult” a resource, perhaps even deceiving themselves into thinking that they are really doing the work, but end up essentially copying the resource. If they don’t cite the resource in such a case, I have a messy academic integrity violation case on my hands; if they do, there is no violation, even though the student didn’t engage with the assignment as I would have hoped, and I can have a simple conversation with them about my expectations and their learning (and perhaps lower their grade).

  • I will make sure to communicate to my students how easy it is for me to detect plagiarism, and how dire the consequences can be. A bit of healthy fear never hurt!

  • But beyond that, I want to make sure my students also understand that I care much more about them, as human beings, than I do about their grade or whether they turn in an assignment. I suspect that a lot of academic integrity violations happen at 2am, the night before a deadline, when the student hasn’t even started the assignment and they are riddled with anxiety and running on little sleep—but they feel as though they have to turn something in and this urge overrides whatever convictions they might have about plagiarism. To the extent their decision is based on anxiety about grades, there’s not much I can do about it. However, if their decision stems from a feeling of shame at not turning something in and disappointing their professor, I can make a difference: in that moment, I want my students to remember that their value in my eyes as human beings is not tied to their academic performance; that I will be much more impressed by their honesty than by whether they turn something in.

  • As a new member of the academic integrity committee, I plan to spend most of my time listening and learning from the continuing members of the committee; but I do hope to make sure our communication with both accused and convicted students emphasizes that they are still valued members of our community.

Other concrete suggestions, questions, experiences to relate, etc. are all most welcome!


Categories: Offsite Blogs

Douglas M. Auclair (geophf): 1Liners for July 2016

Planet Haskell - Sat, 08/20/2016 - 4:12pm

  • July 14th, 2016: So you have x :: [a] in the IO monad, and the function f :: a -> b. What is the expression that gets you IO [b]? (One possible answer is sketched below.)
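
A sketch of one possible answer (mine, not from the original post): map f over the list, and fmap that over the IO action. A concrete instantiation, just for illustration:

x :: IO [Int]
x = pure [1, 2, 3]

f :: Int -> String
f = show

ys :: IO [String]
ys = fmap (map f) x   -- or, equivalently: map f <$> x
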
Categories: Offsite Blogs

Philip Wadler: Eric Joyce: Why the Brexit vote pushed me to support Scottish independence

Planet Haskell - Fri, 08/19/2016 - 8:09am

Former Labour MP Eric Joyce explains his change of heart.
At the referendum, still an MP, I gave independence very serious thought right up to the close of the vote. I finally came down on the side of No because I thought big EU states with a potential secession issue, like Spain and France, would prevent an independent Scotland joining the EU. This is obviously no longer the case. And I was, like the great majority of the economists and other experts whose opinion I valued, convinced that being outside the EU would be bonkers – it would badly harm our economy and hurt Scots in all sorts of unforeseen ways too.
The Brexit vote reversed that overnight: all of the arguments we in the unionist camp had used were made invalid at worst, questionable at best. This doesn’t mean they were necessarily all wrong. But it does mean that open-minded, rational No voters should at the very least seriously re-consider things in the light of the staggering new context. They should have an open ear to the experts saying that with independence, jobs in Scotland’s financial and legal service sectors will expand as English and international firms look to keep a foothold in the EU. And to the reasonable prospect that an eventual £50+ oil price might realistically open the way to a final, generational, upswing in employment, and to security for Scotland’s extractive industries and their supply chain. And to the idea that preserving Scotland’s social democracy in the face of the Little Englander mentality of right-wing English Tories might be worth the fight.
Categories: Offsite Blogs

PowerShell is open sourced and is available on Linux

Lambda the Ultimate - Fri, 08/19/2016 - 3:23am

Long HN thread ensues. Many of the comments discuss the benefits/costs of basing pipes on typed objects rather than text streams. As someone who should be inclined in favor of the typed object approach, I have to say that I think the text-only folks have the upper hand at the moment. The primary reason is that text as a lingua franca between programs ensures interoperability (and insurance against future changes to underlying object models) and self-documenting code. Clearly the Achilles' heel is parsing/unparsing.
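
As a small illustration of that trade-off, consuming the output of a text-based tool means re-parsing structure out of strings. Here is a toy Haskell sketch (my example, assuming a POSIX du on the PATH) that has to pull a number back out of du -sk output; a typed pipeline would hand over a structured value directly, at the cost of both ends agreeing on an object model:

import System.Process (readProcess)

-- Extract the kilobyte count from the text output of `du -sk <dir>`.
dirSizeKB :: FilePath -> IO (Maybe Integer)
dirSizeKB dir = do
  out <- readProcess "du" ["-sk", dir] ""
  pure $ case words out of
    (n : _) -> case reads n of
                 [(k, "")] -> Just k
                 _         -> Nothing
    _       -> Nothing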

As happens often, one is reminded of the discussions of DSLs and pipelines in Jon Bentley's Programming Pearls...

Categories: Offsite Discussion