News aggregator

Jens Petersen: Stackage LTS 7 is released

Planet Haskell - Wed, 09/14/2016 - 12:01pm
The Stackage curator team is pleased to announce the initial release of Stackage LTS 7 for ghc-8.0.1. Arriving over 3.5 months after LTS 6.0, the biggest change introduced with LTS 7.0 is the move from ghc-7.10.3 to ghc-8.0.1. There is also the usual large number of (major) version bumps: to achieve this we used a new one-time pruning step in Stackage Nightly development, dropping all packages with constraining Stackage upper bounds. Nevertheless, LTS 7.0 has 1986 packages, which is almost as many as LTS 6.0 with 1994 packages (the latest release, 6.17, has 2002 packages at the time of writing), thanks to all the work of the community of Haskell maintainers. Now that LTS 7 is branched from Nightly, we are going to do the major pruning step earlier in the current Stackage Nightly, which will give package maintainers even more time to get ready for LTS 8.
Categories: Offsite Blogs

The haskell-lang.org team: Updates for September 14, 2016

Planet Haskell - Wed, 09/14/2016 - 10:00am

The biggest update to the site is the addition of three new targeted "next steps" tutorials on the get started page for using Stack, aimed at Play (using the REPL), Script (single-file programs), and Build (full projects). Hopefully this will help people with different goals all get started with Haskell quickly.

In addition, we have included a few new tutorials:

Plus a few other minor edits throughout the site.

The complete diff can be found here.

Categories: Offsite Blogs

Jan Stolarek: Moving to University of Edinburgh

Planet Haskell - Wed, 09/14/2016 - 6:45am

I wanted to let you all know that after working for 8 years as a Lecturer at the Institute of Information Technology (Lodz University of Technology, Poland), I have received a sabbatical leave to focus solely on research. Yesterday I began my work as a Research Associate at the Laboratory for Foundations of Computer Science, University of Edinburgh. This is a two-year post-doc position. I will be part of the team working on the Skye project under supervision of James Cheney. This means that from now on I will mostly focus on developing the Links programming language.

Categories: Offsite Blogs

FP Complete: Working with data in Haskell

Planet Haskell - Wed, 09/14/2016 - 1:00am

In data mining or general exploration, it's common to need to access data easily, efficiently, and without ceremony. Typically, a programming language will be designed for this use case specifically, like R, or a library will be written for it, like Python with the pandas library.

By implementing this in Haskell, we get all the benefits that come with using Haskell over Python or R, such as:

  • Safety - garbage collected memory, no need for pointers.
  • Performance - it's fast: if you write new algorithms in Haskell, they don't need to be added to the language (like R) or written in C (like Python).
  • Concurrency - make use of concurrent algorithms trivially, and take advantage of your other CPU cores.
  • Maintainability - static types ensure safety at the time you write the program, and when you come back later to make changes, it's harder to break things.

Let's look at an example of doing this in Haskell, and compare with how this is done in Python's pandas. The steps are:

  1. Download a zip file containing a CSV file.
  2. Unzip the file.
  3. Read through the CSV file.
  4. Do some manipulation of the data from the file.

In Haskell we have all the libraries needed (streaming HTTP, CSV parsing, etc.) to achieve this goal, so specifically for this post I've made a wrapper package that brings them together like pandas does. We have some goals:

  • Convenience - We don't want to have to write more code than necessary while exploring data.
  • Constant memory - Be able to process the file in constant memory space. If I have a 1GB file, I don't want to have to load all 1GB into memory in one go.
  • Type-safe - I would like that, once parsed from CSV, I have a statically-typed data structure with proper types (integers, dates, text, etc.).

Python example

This example code was taken from Modern Pandas. In Python we request the web URL in chunks, which we then write to a file. Next, we unzip the file, and then the data is available as df, with column names downcased.

import zipfile
import requests
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

r = requests.get('http://chrisdone.com/ontime.csv.zip', stream=True)
with open("flights.csv.zip", 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)

zf = zipfile.ZipFile("flights.csv.zip")
filename = zf.filelist[0].filename
fp = zf.extract(filename)
df = pd.read_csv(fp, parse_dates="FL_DATE").rename(columns=str.lower)

Finally, we can look at the 5 rows starting at row 10, for the columns fl_date and tail_num, like this:

df.ix[10:14, ['fl_date', 'tail_num']]

=>

      fl_date  tail_num
10 2014-01-01    N002AA
11 2014-01-01    N3FXAA
12 2014-01-01    N906EV
13 2014-01-01    N903EV
14 2014-01-01    N903EV

Python: good and bad

Good parts of the Python code:

  • Extracting from the Zip file was fairly easy.
  • We were able to specify for some fields to parse them as a different data type (parse_dates).
  • Referencing a range of the data was easy.

Bad parts of the Python code:

  • Reading the HTTP request was very verbose. We manually streamed the chunks to disk, which seems pointless.
  • It's not statically typed. Even though we parsed fl_date and tail_num, we can't be certain down the line if they still exist, or are of the right type.
  • We loaded the whole 100MB CSV into memory.

Let's compare with the solution I prepared in Haskell. While reading, you can also clone the repository that I put together:

$ git clone git@github.com:chrisdone/labels.git --recursive

The wrapper library created for this post is under labels-explore, and all the code samples are under labels-explore/app/Main.hs.

Haskell example

I prepared the module Labels.RunResourceT which provides us with some data manipulation functionality: web requests, unzipping, CSV parsing, etc.

{-# LANGUAGE TypeApplications, OverloadedStrings, OverloadedLabels,
             TypeOperators, DataKinds, FlexibleContexts #-}

import Labels.RunResourceT

main =
    runResourceT $
    httpSource "http://chrisdone.com/ontime.csv.zip" responseBody .|
    zipEntryConduit "ontime.csv" .|
    fromCsvConduit
        @("fl_date" := Day, "tail_num" := String)
        (set #downcase True csv) .|
    dropConduit 10 .|
    takeConduit 5 .>
    tableSink

Output:

fl_date     tail_num
2014-01-01  N002AA
2014-01-01  N3FXAA
2014-01-01  N906EV
2014-01-01  N903EV
2014-01-01  N903EV

Breaking this down, the src .| c .| c .> sink can be read like a UNIX pipe src | c | c > sink.

The steps are:

  • Make a web request for the zip file and yield a stream of bytes.
  • Unzip that stream of bytes and yield a stream of bytes of the CSV.
  • Parse from CSV into records of type ("fl_date" := Day, "tail_num" := String).
  • Specify the downcase option so we can deal with lower-case names.
  • Drop the first 10 results.
  • Take 5 of the remaining results.
  • Print the table out.

In this library, the naming convention for parts of the pipeline is as follows (a sketch of what such signatures might look like appears after the list):

  • fooSource -- something which is at the beginning of the pipeline, a source of streaming input.
  • fooConduit -- something which connects two parts of the pipeline together and perhaps does some transformations (such as parsing the Zip, CSV or other things).
  • fooSink -- something into which all streaming input is poured, and produces a final result.
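
To make the convention concrete, here is a rough sketch of the kind of conduit-style signatures these names suggest. These are illustrative guesses, not the actual labels-explore API:

-- Hypothetical signatures illustrating the naming convention;
-- the real labels-explore functions may differ in detail.
fileSource      :: MonadResource m => FilePath -> Source m ByteString
zipEntryConduit :: MonadThrow m => String -> Conduit ByteString m ByteString
tableSink       :: MonadIO m => Sink row m ()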
Haskell: good parts

What's good about the Haskell version:

  • Reading the HTTP request was trivial.
  • It's statically typed. If I try to multiply the fl_date as a number, for example, or mistakenly write fl_daet, I'll get a compile error before ever running the program.
  • It achieves it all in a streaming fashion, with constant memory usage.

How is it statically typed? Here:

fromCsvConduit @("fl_date" := Day, "tail_num" := String) csv

We've statically told fromCsvConduit the exact type of record to construct: a record of two fields fl_date and tail_num with types Day and String. Below, we'll look at accessing those fields in an algorithm and demonstrate the safety aspect of this.
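
As a quick taste of that access, here is a minimal sketch using get and OverloadedLabels from the labels library (describeFlight is a hypothetical helper, not from the post):

-- Hypothetical helper: render one statically-typed row as a String.
describeFlight :: ("fl_date" := Day, "tail_num" := String) -> String
describeFlight row = show (get #fl_date row) ++ " " ++ get #tail_num row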

Swapping out pipeline parts

We can also easily switch to reading from file. Let's write that URL to disk, uncompressed:

main =
    runResourceT
        (httpSource "http://chrisdone.com/ontime.csv.zip" responseBody .|
         zipEntryConduit "ontime.csv" .>
         fileSink "ontime.csv")

Now our reading becomes:

main =
    runResourceT $
    fileSource "ontime.csv" .|
    fromCsvConduit
        @("fl_date" := Day, "tail_num" := String)
        (set #downcase True csv) .|
    dropConduit 10 .|
    takeConduit 5 .>
    tableSink

Data crunching

It's easy to perform more detailed calculations. For example, to display the number of total flights, and the total distance that would be travelled, we can write:

main =
    runResourceT $
    fileSource "ontime.csv" .|
    fromCsvConduit @("distance" := Double) (set #downcase True csv) .|
    sinkConduit
        (foldSink
            (\table row ->
                modify #flights (+ 1)
                    (modify #distance (+ get #distance row) table))
            (#flights := (0 :: Int), #distance := 0)) .>
    tableSink

The output is:

flights  distance
471949   372072490.0

Above we made our own sink which consumes all the rows, and then yielded the result of that downstream to the table sink, so that we get the nice table display at the end.

Type correctness

Returning to our safety point, imagine above we made some mistakes.

First mistake, I wrote modify #flights twice by accident:

- modify #flights (+ 1) (modify #distance (+ get #distance row) table))
+ modify #flights (+ 1) (modify #flights (+ get #distance row) table))

Before running the program, the following message would be raised by the Haskell type checker:

• Couldn't match type ‘Int’ with ‘Double’
    arising from a functional dependency between:
      constraint ‘Has "flights" Double ("flights" := Int, "distance" := value0)’
        arising from a use of ‘modify’

See below for where this information comes from in the code:

main =
    runResourceT $
    fileSource "ontime.csv" .|
    --
    -- The distance field is actually a double
    --                  ↓
    fromCsvConduit @("distance" := Double) (set #downcase True csv) .|
    sinkConduit
        (foldSink
            (\table row ->
                modify #flights (+ 1)
                    (modify #flights (+ get #distance row) table))
            --
            -- But we're trying to modify `#flights`, which is an `Int`.
            --        ↓
            (#flights := (0 :: Int), #distance := 0)) .>
    tableSink

Likewise, if we misspelled #distance as #distant, in our algorithm:

- modify #flights (+ 1) (modify #distance (+ get #distance row) table))
+ modify #flights (+ 1) (modify #distance (+ get #distant row) table))

We would get this error message:

No instance for (Has "distant" value0 ("distance" := Double))
  arising from a use of ‘get’

Summarizing:

  • The correct values are parsed from the CSV.
  • Fields must exist if we're accessing them.
  • We can't mismatch types.

All this adds up to more maintainable software, and yet we didn't have to state any more than necessary!

Grouping

If instead we'd like to group by a field, in pandas it's like this:

first = df.groupby('airline_id')[['fl_date', 'unique_carrier']].first()
first.head()

We simply update the code with the type, putting the additional fields we want to parse:

csv :: Csv ("fl_date" := Day, "tail_num" := String,
            "airline_id" := Int, "unique_carrier" := String)

And then our pipeline instead becomes:

fromCsvConduit
    @("fl_date" := Day, "tail_num" := String,
      "airline_id" := Int, "unique_carrier" := String)
    (set #downcase True csv) .|
groupConduit #airline_id .|
explodeConduit .|
projectConduit @("fl_date" := _, "unique_carrier" := _) .|
takeConduit 5 .>
tableSink

  • We added the two new fields to be parsed.
  • We grouped by the #airline_id field into a stream of lists of rows. That groups the stream [x,y,z,a,b,c] into e.g. [[x,y],[z,a],[b,c]].
  • We explode those groups [[x,y],[z,a],[b,c],...] into a stream of each group's parts: [x,y,z,a,b,c,...].
  • We projected a new record type just for the table display to include fl_date and unique_carrier. The types are to be left as-is, so we use _ to mean "you know what I mean". This is like SELECT fl_date, unique_carrier in SQL.

Output:

unique_carrier  fl_date
AA              2014-01-01
AA              2014-01-01
EV              2014-01-01
EV              2014-01-01
EV              2014-01-01

The Python blog post states that a further query upon that result,

first.ix[10:15, ['fl_date', 'tail_num']]

yields an unexpectedly empty data frame, due to strange indexing behaviour in pandas. Ours works out fine: we just drop 10 elements from the input stream and project tail_num instead:

dropConduit 10 .|
projectConduit @("fl_date" := _, "tail_num" := _) .|
takeConduit 5 .>
tableSink

And we get

fl_date     tail_num
2014-01-01  N002AA
2014-01-01  N3FXAA
2014-01-01  N906EV
2014-01-01  N903EV
2014-01-01  N903EV

Conclusion

In this post we've demonstrated:

  1. Handling a chain of problems concisely and smoothly, like a bash script.
  2. Doing all of the above in constant memory.
  3. Doing so with a type-safe parser, specifying our types statically, but without having to declare or name any record type ahead of time.

This has been a demonstration, not a finished product. Haskell needs work in this area, and the examples in this post are not performant (though they could be made so); such work would be very fruitful.

Are the advantages of using Haskell something you're interested in? If so, contact us at FP Complete.

Categories: Offsite Blogs

FP Complete: Practical Haskell: Simple File Mirror (Part 1)

Planet Haskell - Wed, 09/14/2016 - 1:00am

The other day I threw together a quick program to solve an annoyance some people on our team were expressing. We sometimes do our development on remote machines, but would still like to use local file editors. There are plenty of solutions for this (SSHFS, inotify+rsync), but none of us ever found a prebaked solution that was low-latency and robust enough to use regularly.

The program I put together is a simple client/server combo, where the client watches for local file changes and sends the updates to the server, which writes them to disk. This approach reduces latency significantly by keeping an open TCP connection, and can be tunneled over SSH for security. It's simple, and the code is short and readable.

This program turned out to be a good demonstration of a few different common problem domains when writing practical, real world Haskell code:

  • Communication protocols
  • Streaming of data
  • Network communications
  • File monitoring (eh, probably not that common, but cool)
  • Command line option parsing

This blog post series will build up from basics to being able to implement a program like the simple file mirror. This first post of three planned posts will cover communication protocols and streaming of data. You can also read:

  • Part 2, on network communication and concurrency

Prereqs

This series will contain a number of self-contained code samples, presented as runnable scripts. In order to run these scripts, copy them into a file ending with .hs (e.g., practical.hs), and then run it with stack practical.hs. You will need to install the Stack build tool. Stack will take care of downloading and installing any compiler and libraries necessary. Because of that, your first time running a script will take a bit of time while everything downloads and installs.

If you'd like to test that you're ready to proceed, try running this program:

#!/usr/bin/env stack
-- stack --resolver nightly-2016-09-10 --install-ghc runghc
main :: IO ()
main = putStrLn "Hello world!"
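
Running it should look something like this, assuming you saved the script as practical.hs (the first run will also print download and install progress):

$ stack practical.hs
Hello world!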

Also, this series of posts will use the classy-prelude-conduit library, which provides a lot of built-in functionality to make it easier to follow. If you'd like to kick off a build of that library now so it's available when you want to use it, run:

$ stack --resolver nightly-2016-09-10 --install-ghc build classy-prelude-conduit

Textual versus binary data

We humans have an interesting tendency to want to communicate - with each other and our computers - in human languages. Computers don't care: they just see binary data. Oftentimes, this overlaps just fine, specifically when dealing with the ASCII subset of the character set. But generally speaking, we need some way to distinguish between textual and binary data.

In Haskell, we do this by using two different data types: Text and ByteString. In order to convert between these two, we need to choose a character encoding, and most often we'll choose UTF-8. We can perform this conversion with the encodeUtf8 and decodeUtf8 functions:

#!/usr/bin/env stack
-- stack --resolver nightly-2016-09-10 --install-ghc runghc --package classy-prelude-conduit
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit

main :: IO ()
main = do
    let someText :: Text
        someText = unlines
            [ "Hello, this is English."
            , "Hola, este es el español."
            , "שלום, זה עברית."
            ]
        binary :: ByteString
        binary = encodeUtf8 someText
        filePath :: FilePath
        filePath = "say-hello.txt"
    writeFile filePath binary
    binary2 <- readFile filePath :: IO ByteString
    let text2 = decodeUtf8 binary2
    putStrLn text2

If you're coming from a language that doesn't have this strong separation, it may feel clunky at first. But after years of experience with having a type-level distinction between textual and binary data, I can attest to the fact that the compiler has many times prevented me from making some mistakes that would have been terrible in practice. One example popped up when working on the simple-file-mirror tool itself: I tried to send the number of characters in a file path over the wire, instead of sending the number of bytes after encoding it.

The OverloadedStrings language extension lets us use string literals for all of these string-like things, including (as you saw above) file paths.

NOTE: For historical reasons which are being worked on, there is additionally a String type, which is redundant with Text but far less efficient. I mention it because you may see references to it when working with some Haskell documentation. Whenever possible, stick to Text or ByteString, depending on your actual data content.
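
If you do need to cross that boundary, say for a legacy API, the pack and unpack functions (re-exported by classy-prelude) convert between String and Text. A minimal sketch:

-- Converting in both directions; annotations pin down the types.
legacy :: String
legacy = unpack ("modern text value" :: Text)

modern :: Text
modern = pack ("legacy String value" :: String)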

Some simple data streaming

OK, that was a bit abstract. Let's do something more concrete. We're going to use the conduit library to represent streaming data. To start off simple, let's stream the numbers 1 to 10 and print each of them:

#!/usr/bin/env stack
-- stack --resolver nightly-2016-09-10 --install-ghc runghc --package classy-prelude-conduit
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit

main :: IO ()
main = yieldMany [1..10] $$ mapM_C print

Our yieldMany function will take a sequence of values - in this case a list from 1 to 10 - and yield them downstream. For those familiar with Python, this is similar to yielding with generators. The idea is that we will build a pipeline of multiple components, each awaiting values from upstream and yielding values downstream.

Our mapM_C print component will apply the function print to every value it receives from upstream. The C suffix is used for disambiguating conduit functions from non-conduit functions. Finally, the $$ operator in the middle is the "connect" function, which connects a source of data to a sink of data and runs it. As you might guess, the above prints the numbers 1 to 10.

We can also put other components into our pipeline, including functions that both await values from upstream and yield them downstream. For example:

#!/usr/bin/env stack
-- stack --resolver nightly-2016-09-10 --install-ghc runghc --package classy-prelude-conduit
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit

main :: IO ()
main = yieldMany [1..10] $$ mapC (* 10) =$ mapM_C print

The mapC function applies a function to each value in the stream and yields it downstream, while the =$ operator connects two components together. Take a guess at what the output from this will be, and then give it a try.

NOTE There's a subtle but important difference between $$ and =$. $$ will connect two components and then run the resulting pipeline to completion to get a result. =$ will connect two components without running them, so that the result can be further connected to other components. In a single pipeline, you'll end up with exactly one $$ usage.
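
To make the distinction concrete, here's a small script in the same style as the examples above, where =$ merely builds a larger sink and a single $$ runs the assembled pipeline:

#!/usr/bin/env stack
-- stack --resolver nightly-2016-09-10 --install-ghc runghc --package classy-prelude-conduit
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit

main :: IO ()
main = do
    -- =$ only fuses components: nothing runs yet, we just get a bigger sink.
    let bigSink = mapC (* 10) =$ mapM_C print
    -- $$ connects a source to the fused sink and runs the whole pipeline.
    yieldMany [1..3] $$ bigSink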

We can also do more interesting consumption of a stream, like summing:

#!/usr/bin/env stack
-- stack --resolver nightly-2016-09-10 --install-ghc runghc --package classy-prelude-conduit
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit

main :: IO ()
main = do
    total <- yieldMany [1..10] $$ mapC (* 10) =$ sumC
    print total

    -- or more explicitly...
    total <- yieldMany [1..10] $$ mapC (* 10) =$ foldlC (+) 0
    print total

Or limit how much of the stream we want to consume:

#!/usr/bin/env stack
-- stack --resolver nightly-2016-09-10 --install-ghc runghc --package classy-prelude-conduit
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit

main :: IO ()
main = do
    total <- yieldMany [1..10] $$ mapC (* 10) =$ takeC 5 =$ sumC
    print total

This only scratches the surface of what we can do with conduit, but hopefully it gives enough of a basic intuition for the library to get started. If you're interested in diving in deep on the conduit library, check out the previously linked tutorial.

Streaming chunked data

Having a stream of individual bytes turns out to be inefficient in practice. It's much better to chunk a series of bytes into an efficient data structure like a ByteString. Let's see what it looks like to stream data from a file to standard output:

#!/usr/bin/env stack
-- stack --resolver nightly-2016-09-10 --install-ghc runghc --package classy-prelude-conduit
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit
import qualified System.IO as IO

main :: IO ()
main = do
    writeFile "some-file.txt" ("This is just some text." :: ByteString)

    -- Yes, this is clunky. We'll do something much slicker in a bit.
    IO.withBinaryFile "some-file.txt" IO.ReadMode $ \fileHandle ->
        sourceHandle fileHandle $$ decodeUtf8C =$ stdoutC

This is good, but what if we want to deal with the individual bytes in the stream instead of the chunks? For example, let's say we want to get just the first 10 bytes of our file. The takeC function we used above would take the first five chunks of data. We instead need a function which will work on the elements of the bytestrings (the individual bytes). Fortunately, conduit provides for this with the E-suffixed element-specific functions:

#!/usr/bin/env stack
-- stack --resolver nightly-2016-09-10 --install-ghc runghc --package classy-prelude-conduit
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit
import qualified System.IO as IO

main :: IO ()
main = do
    writeFile "some-file.txt" ("This is just some text." :: ByteString)

    -- Yes, this is clunky. We'll do something much slicker in a bit.
    IO.withBinaryFile "some-file.txt" IO.ReadMode $ \fileHandle ->
        sourceHandle fileHandle $$ takeCE 10 =$ decodeUtf8C =$ stdoutC

In the simple-file-mirror program, we will be sending files over the network, and will need to limit some operations to the actual file sizes in question. Functions like takeCE will be vital for doing this.

Managing resources

While the withBinaryFile approach above worked, it felt kind of clunky. And for more complicated control flows, opening up the file in advance won't be an option (like when we'll only know which file to open after the network connection tells us the file path). To allow for these cases, we're going to introduce runResourceT, which allows us to acquire resources in an exception-safe manner. Let's rewrite the above example with runResourceT:

#!/usr/bin/env stack
-- stack --resolver nightly-2016-09-10 --install-ghc runghc --package classy-prelude-conduit
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit

main :: IO ()
main = do
    writeFile "some-file.txt" ("This is just some text." :: ByteString)
    runResourceT $ sourceFile "some-file.txt" $$ takeCE 10 =$ decodeUtf8C =$ stdoutC

Internally, sourceFile uses the bracketP function, which runs some initialization function (in our case, opening a file handle), registers some cleanup function (in our case, closing that file handle), and then performs an action with the resource. To demonstrate what that looks like more explicitly, let's write a modified sourceFile function which will return some default file contents if the requested file can't be read from.

#!/usr/bin/env stack
-- stack --resolver nightly-2016-09-10 --install-ghc runghc --package classy-prelude-conduit
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
import ClassyPrelude.Conduit
import qualified System.IO as IO

sourceFileDef :: MonadResource m => FilePath -> Source m ByteString
sourceFileDef fp = do
    let -- tryIO catches all IO exceptions and puts them in a Left
        -- value.
        open = tryIO $ IO.openBinaryFile fp IO.ReadMode

        -- Exception occurred, so nothing to close.
        close (Left e) = putStrLn $ "No file to close, got an exception: " ++ tshow e
        -- No exception, so close the file handle.
        close (Right h) = hClose h

    bracketP open close $ \eitherHandle ->
        case eitherHandle of
            Left ex -> do
                yield "I was unable to open the file in question:\n"
                yield $ encodeUtf8 $ tshow ex ++ "\n"
            Right fileHandle -> sourceHandle fileHandle

main :: IO ()
main = runResourceT $ sourceFileDef "some-file.txt" $$ decodeUtf8C =$ stdoutC

Implementing our protocol

Let's at least get started on our actual simple-file-mirror code. The wire protocol we're going to use is defined in the README, but we can describe it briefly as follows (a worked example appears after the list):

  • Client sends data to server, server never sends to client
  • A line like 9:hello.txt11:Hello World means "write the file hello.txt with the content Hello World"
  • We provide lengths of both the file paths and the file contents with a decimal-encoded integer followed by a colon (similar to netstrings, but without the trailing comma)
  • If a file has been deleted, we use a special length of -1, e.g. 9:hello.txt-1: means "hello.txt was deleted"
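
For a concrete worked example of the framing, here is what writing and then deleting a small file would put on the wire (the path a.txt is 5 bytes, the contents hi are 2 bytes):

5:a.txt2:hi    (write the contents "hi" to a.txt)
5:a.txt-1:     (a.txt was deleted)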

So let's get started with implementing the integer send/receive logic in conduit. There are many ways of doing this, some more efficient than others. For this program, I elected to go with the simplest approach possible (though you can see some historical more complicated/efficient versions). Let's start with sending, which is pretty trivial:

#!/usr/bin/env stack
-- stack --resolver nightly-2016-09-10 --install-ghc runghc --package classy-prelude-conduit
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RankNTypes #-}
import ClassyPrelude.Conduit

sendInt :: Monad m => Int -> Producer m ByteString
sendInt i = yield $ encodeUtf8 $ tshow i ++ ":"

main :: IO ()
main = yieldMany sampleInts $$ awaitForever sendInt =$ stdoutC
  where
    sampleInts =
        [ 1
        , 10
        , -5
        , 0
        , 60
        ]

Here we've introduced a new function, awaitForever, which repeatedly applies a function as long as data exists on the stream. Take a guess at what the output of this program will be, and then try it out.

Now let's try out the receiving side of this, which is slightly more complicated, but not too bad:

#!/usr/bin/env stack
{- stack --resolver nightly-2016-09-10 --install-ghc runghc
   --package classy-prelude-conduit
   --package word8
-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RankNTypes #-}
import ClassyPrelude.Conduit
import Data.Word8 (_colon)

sendInt :: Monad m => Int -> Producer m ByteString
sendInt i = yield $ encodeUtf8 $ tshow i ++ ":"

recvInt :: MonadThrow m => Consumer ByteString m Int
recvInt = do
    intAsText <- takeWhileCE (/= _colon) =$ decodeUtf8C =$ foldC
    dropCE 1 -- drop the colon from the stream
    case readMay $ unpack intAsText of
        Nothing -> error $ "Invalid int: " ++ show intAsText
        Just i -> return i

main :: IO ()
main = yieldMany sampleInts
    $$ awaitForever sendInt
    =$ peekForeverE (do i <- recvInt
                        print i)
  where
    sampleInts =
        [ 1
        , 10
        , -5
        , 0
        , 60
        ]

peekForeverE is similar to awaitForever, in that it repeatedly performs an action as long as there is data on the stream. However, it's different in that it doesn't grab the data off of the stream itself, leaving it to the action provided to do that, and it deals correctly with chunked data by ignoring empty chunks.

We've also introduced takeWhileCE, which is like takeCE, but instead of giving it a fixed size of the stream to consume, it continues consuming until it finds the given byte. In our case: we consume until we get to a colon. Then we decode into UTF-8 data, and use foldC to concatenate multiple chunks of Text into a single Text value. Then we use readMay to parse the textual value into an Int. (And yes, we could do a much better job at error handling, but using error is the simplest approach.)

Building on top of these two functions, it becomes a lot easier to send and receive complete file paths. Let's put that code in here, together with a test suite to prove it's all working as expected:

#!/usr/bin/env stack
{- stack --resolver nightly-2016-09-10 --install-ghc runghc
   --package classy-prelude-conduit
   --package word8
   --package hspec
-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RankNTypes #-}
import ClassyPrelude.Conduit
import Data.Word8 (_colon)
import Test.Hspec
import Test.Hspec.QuickCheck (prop)

sendInt :: Monad m => Int -> Producer m ByteString
sendInt i = yield $ encodeUtf8 $ tshow i ++ ":"

sendFilePath :: Monad m => FilePath -> Producer m ByteString
sendFilePath fp = do
    -- UTF-8 encode the filepath
    let bs = encodeUtf8 $ pack fp :: ByteString

    -- Send the number of bytes
    sendInt $ length bs

    -- Send the actual path
    yield bs

recvInt :: MonadThrow m => Consumer ByteString m Int
recvInt = do
    intAsText <- takeWhileCE (/= _colon) =$ decodeUtf8C =$ foldC
    dropCE 1 -- drop the colon from the stream
    case readMay $ unpack intAsText of
        Nothing -> error $ "Invalid int: " ++ show intAsText
        Just i -> return i

recvFilePath :: MonadThrow m => Consumer ByteString m FilePath
recvFilePath = do
    -- Get the byte count
    fpLen <- recvInt

    -- Read in the given number of bytes, decode as UTF-8 text, and
    -- then fold all of the chunks into a single Text value
    fpText <- takeCE fpLen =$= decodeUtf8C =$= foldC

    -- Unpack the text value into a FilePath
    return $ unpack fpText

main :: IO ()
main = hspec $ do
    prop "sendInt/recvInt are inverses" $ \i -> do
        res <- sendInt i $$ recvInt
        res `shouldBe` i
    prop "sendFilePath/recvFilePath are inverses" $ \fp -> do
        res <- sendFilePath fp $$ recvFilePath
        res `shouldBe` fp

We've used the hspec test framework library and its QuickCheck support to create a test suite which automatically generates test cases based on the types in our program. In this case, it will generate 100 random Ints and 100 random FilePaths each time it's run, and ensure that our properties hold. This is a great way to quickly get significant test coverage for a program.

Sending the files themselves

OK, finally time to put all of this together. We're going to add in some new functions for sending and receiving files themselves. This is a fairly simple composition of all of the work we've done until now. And this is the nice thing about Haskell in general, as well as the conduit library: purity and strong typing often make it possible to trivially combine smaller functions into more complicated beasts. Let's see this all in action:

#!/usr/bin/env stack
{- stack --resolver nightly-2016-09-10 --install-ghc runghc
   --package classy-prelude-conduit
   --package word8
   --package hspec
   --package temporary
-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RankNTypes #-}
import ClassyPrelude.Conduit
import Data.Word8 (_colon)
import System.Directory (createDirectoryIfMissing, doesFileExist, removeFile)
import System.FilePath (takeDirectory)
import System.IO (IOMode (ReadMode), hFileSize, openBinaryFile)
import System.IO.Temp (withSystemTempDirectory)
import Test.Hspec
import Test.Hspec.QuickCheck (prop)

sendInt :: Monad m => Int -> Producer m ByteString
sendInt i = yield $ encodeUtf8 $ tshow i ++ ":"

sendFilePath :: Monad m => FilePath -> Producer m ByteString
sendFilePath fp = do
    -- UTF-8 encode the filepath
    let bs = encodeUtf8 $ pack fp :: ByteString

    -- Send the number of bytes
    sendInt $ length bs

    -- Send the actual path
    yield bs

sendFile :: MonadResource m
         => FilePath -- ^ root directory
         -> FilePath -- ^ path relative to root
         -> Producer m ByteString
sendFile root fp = do
    -- Send the relative file path, so the other side knows where to
    -- put the contents
    sendFilePath fp

    let open = tryIO $ openBinaryFile fpFull ReadMode

        -- If the opening failed, we'll have an error message. So
        -- there's nothing to close, just do nothing!
        close (Left _err) = return ()

        -- Opening succeeded, so close the file handle.
        close (Right h) = hClose h

    -- Grab the file handle...
    bracketP open close $ \eh ->
        case eh of
            -- No file, send a -1 length to indicate file does not
            -- exist
            Left _ex -> sendInt (-1)

            -- File exists
            Right h -> do
                -- Send the size of the file
                size <- liftIO $ hFileSize h
                sendInt $ fromInteger size

                -- And stream the contents
                sourceHandle h
  where
    fpFull = root </> fp

recvInt :: MonadThrow m => Consumer ByteString m Int
recvInt = do
    intAsText <- takeWhileCE (/= _colon) =$ decodeUtf8C =$ foldC
    dropCE 1 -- drop the colon from the stream
    case readMay $ unpack intAsText of
        Nothing -> error $ "Invalid int: " ++ show intAsText
        Just i -> return i

recvFilePath :: MonadThrow m => Consumer ByteString m FilePath
recvFilePath = do
    -- Get the byte count
    fpLen <- recvInt

    -- Read in the given number of bytes, decode as UTF-8 text, and
    -- then fold all of the chunks into a single Text value
    fpText <- takeCE fpLen =$= decodeUtf8C =$= foldC

    -- Unpack the text value into a FilePath
    return $ unpack fpText

recvFile :: MonadResource m
         => FilePath -- ^ directory to store files in
         -> Sink ByteString m ()
recvFile root = do
    -- Get the relative path to store in
    fpRel <- recvFilePath

    -- Prepend with the root directory to get a complete path
    let fp = root </> fpRel

    -- Get the size of the file
    fileLen <- recvInt

    if fileLen == (-1)
        -- We use -1 to indicate the file should be removed. Go ahead
        -- and call removeFile, but ignore any IO exceptions that
        -- occur when doing so (in case the file doesn't exist
        -- locally, for example)
        then liftIO $ void $ tryIO $ removeFile fp
        else do
            -- Create the containing directory
            liftIO $ createDirectoryIfMissing True $ takeDirectory fp

            -- Stream out the specified number of bytes and write them
            -- into the file
            takeCE fileLen =$= sinkFile fp

main :: IO ()
main = hspec $ do
    prop "sendInt/recvInt are inverses" $ \i -> do
        res <- sendInt i $$ recvInt
        res `shouldBe` i
    prop "sendFilePath/recvFilePath are inverses" $ \fp -> do
        res <- sendFilePath fp $$ recvFilePath
        res `shouldBe` fp

    -- A more standard unit test, checking that sending and receiving
    -- a file does what is expected.
    it "create and delete files" $
        -- Get temporary source and destination directories
        withSystemTempDirectory "src" $ \srcDir ->
        withSystemTempDirectory "dst" $ \dstDir -> do
            let relPath = "somepath.txt"
                content = "This is the content of the file" :: ByteString

            -- Ensure that sending a file that exists makes it appear in
            -- the destination
            writeFile (srcDir </> relPath) content
            runResourceT $ sendFile srcDir relPath $$ recvFile dstDir
            content' <- readFile (dstDir </> relPath)
            content' `shouldBe` content

            -- Ensure that sending a file that doesn't exist makes it
            -- disappear in the destination
            removeFile (srcDir </> relPath)
            runResourceT $ sendFile srcDir relPath $$ recvFile dstDir
            exists <- doesFileExist (dstDir </> relPath)
            exists `shouldBe` False

Our sendFile function looks very similar to our sourceFileDef example at the beginning of the post. But instead of streaming a default value, we just send a length of -1, as our protocol dictates. The recvFile function relies heavily on recvFilePath and recvInt. In the case of a -1, it removes the file in question. Otherwise, it creates the containing directory if necessary, and then composes takeCE with sinkFile to stream the correct number of bytes into the file.

We also have a unit test covering the interaction of these two new functions. While some kind of property could perhaps be devised for testing this with QuickCheck, a more standard unit test seemed far more straightforward in this case.

Next time on Practical Haskell

This part of the tutorial covered quite a number of topics, so this is a good place to take a break. Next time, we'll dive into the network communication aspect of things, including:

  • Implementing a really simple HTTP client
  • Implementing an echo server
  • Using some basic concurrency in Haskell to have a client and server in the same process

If you have feedback on how to make this tutorial series more useful, please share it in the comments below, on Twitter (@snoyberg), or in any Reddit discussions about it. I'll try to review feedback and make changes to parts 2 and 3.

Categories: Offsite Blogs

Functional Jobs: Software Engineer/Researcher at Galois Inc (Full-time)

Planet Haskell - Tue, 09/13/2016 - 4:43pm

We are currently seeking software engineers/researchers to play a pivotal role in fulfilling our mission to make critical systems trustworthy.

Galois engineers participate in one or more projects concurrently, and specific roles vary greatly according to skills, interests, and company needs. Your role may include technology research and development, requirements gathering, implementation, testing, formal verification, infrastructure development, project leadership, and/or supporting new business development.

Skills & Requirements

Education – Minimum of a Bachelor’s degree in computer science or equivalent. MS or PhD in CS or a related field desirable but optional, depending on specific role.

Required Technical Expertise – Must have hands-on experience developing software and/or performing computer science research. Demonstrated expertise in aspects of software development mentioned above.

Desired Technical Expertise – Fluency in the use of formal or semi-formal methods such as Haskell or other functional programming languages. Direct experience in developing high assurance systems and/or security products. Experience with identity management, security risk analysis, systems software, or networking.

Required General Skills – Must work well with customers, including building rapport, identifying needs, and communicating with strong written, verbal, and presentation skills. Must be highly motivated and able to self-manage to deadlines and quality goals.

Our engineers develop in programming languages including functional languages, designing and developing advanced technologies for safety- and security-critical systems, networks, and applications. Engineers work in small team settings and must successfully interact with clients, partners, and other employees in a highly cooperative, collaborative, and intellectually challenging environment.

We’re looking for people who can invent, learn, think, and inspire. We reward creativity and thrive on collaboration.

Get information on how to apply for this position.

Categories: Offsite Blogs

Edward Z. Yang: Seize the Means of Production (of APIs)

Planet Haskell - Mon, 09/12/2016 - 11:09pm

There's a shitty API and it's ruining your day. What do you do?

Without making a moral judgment, I want to remark that there is something very different about these two approaches. In Dropbox's case, Dropbox has no (direct) influence on what APIs Apple provides for its operating system. So it has no choice but to work within the boundaries of the existing API. (When Apple says jump, you say, "How high?") But in Adam's case, POSIX is implemented by an open source project Linux, and with some good ideas, Adam could implement his new interface on top of Linux (avoiding the necessity of writing an operating system from scratch.)

APIs cross social boundaries: there is the proletariat that produces software using an API and the bourgeoisie that controls the API. When the Man(TM) is a big corporation, our only choices are to work around their shitty API or pay them sufficiently large amounts of money to fix their shitty APIs. But when the Man(TM) is an open source project, your choices change. True, you could work around their shitty API. Or you could seize the means of production, thereby enabling you to fix the shitty API.

What do I mean by seize the means of production? Indeed, what are the means of production? An open source API does not live in a vacuum; it is made useful by the software that provides the API, the developers who contribute their time and expertise to maintain it, and even the publicity platform which convinces people to use the API. To seize the means of production is to gain control of all of these aspects. If you can convince the establishment that you are a core contributor of a piece of open source software, you in turn gain the ability to fix the shitty APIs. If you are unwilling or unable to do so, you might still fork, vendor, or rewrite the project, but this is not so much seizing the means of production as it is recreating it from scratch. Another possibility is to build the abstraction you need on top of the existing APIs (as Adam did), though you are always at risk of the original project not being sensitive to your needs.

Time and time again, I see people working with open source projects who refuse to seize the means of production. Instead, they are willing to write increasingly convoluted workarounds to solve problems, all to stay within the boundaries. You might say, "This is just Getting Things Done(TM)", but at some point you will have done more work working around the problem, than you would have spent just fixing the damn thing.

So stop it. Stop putting up with shitty APIs. Stop working within the boundaries. Seize the means of production.

Counterpoint.

  1. What is advocated for in this post is nothing less than infinite yak shaving; if you take the advice seriously you will proceed to never get anything done ever again.
  2. It may be true that in aggregate, the cost of working around a bad API exceeds the cost to fix it, but for any individual the cost is generally less. Thus, even if you could perfectly predict the cost of a workaround versus a proper fix, individual incentives would prevent the proper fix.
  3. Users (including developers) don't know anything about the software they use and are ill-equipped to design better APIs, even if they know where it hurts.
  4. Rarely can you unilaterally seize the means of production. In an ideal world, to become a core contributor, it would merely be sufficient to demonstrate sustained, useful contributions to a project. We all know the real world is more messy.
Categories: Offsite Blogs

Sandy Maguire: Better Data Types a la Carte

Planet Haskell - Mon, 09/12/2016 - 6:00pm

September 13, 2016 · haskell, rpg, dsl, data types a la carte, semantics

To be honest with you, my approach to procedurally generating RPG stories has been stymied until very recently. Recall the command functor:

data StoryF a = Change Character ChangeType (ChangeResult -> a)
              | forall x y. Interrupt (Free StoryF x) (Free StoryF y) (y -> a)
              | -- whatever else

This recursively defined Interrupt command has caused more than its fair share of grief. The idea is that it should represent one potential line of action being interrupted by another. The semantics are rather hazy, but this should result in grafting the Free StoryF y monad somewhere inside of the Free StoryF x monad. Once we've done whatever analysis on the original story, we can then forget that the Interrupt bit was ever there in the first place.

In effect, we want this:

data StoryF' a = Change Character ChangeType (ChangeResult -> a)
               | -- whatever else

runInterrupt :: StoryF a -> StoryF' a
runInterrupt = -- ???

where runInterrupt’s job is to remove any instances of the Interrupt command from our story – replacing them with the “canonical” version of what actually happened.

Of course, we could just remove all of the Interrupt data constructors from our Free StoryF a object, and rely on convention to keep track of that for us. However, if you're like me, whenever the phrase "by convention" is used, big alarm bells should go off in your head. Convention isn't enforced by the compiler, and so anything maintained "by convention" is highly susceptible to bugs.

What would make our lives better is if we could define StoryF and StoryF' somehow in terms of one another, so that there’s no hassle to keep them in sync with one another. Even better, in the future, maybe we’ll want to remove or add other constructors as we interpret our story.

What we really want to be able to do is to mix and match individual constructors into one larger data structure, which we can then transform as we see fit.

Fortunately for us, the machinery for this has already been built. It’s Swierstra’s Data Types a la Carte (henceforth DTalC) – essentially a set of combinators capable of composing data types together, and tools for working with them in a manageable way.

Unfortunately for us, Data Types a la Carte isn’t as type-safe as we’d like it to be. Additionally, it’s missing (though not fundamentally) the primitives necessary to remove constructors.1

This post presents a variation of DTalC which is type-safe, and contains the missing machinery.

But first, we’ll discuss DTalC as it is described in the original paper, in order to get a feeling for the approach and where the problems might lie. If you know how DTalC works already, consider skipping to the next heading.

Data Types a la Carte

Data Types a la Carte presents a novel strategy for building data types out of other data types with kind2 * -> *. A code snippet is worth a thousand words, so let’s dive right in. Our StoryF command functor as described above would instead be represented like this:

data ChangeF a = Change Character ChangeType (ChangeResult -> a)

data InterruptF a = forall x y. Interrupt (Free StoryF x) (Free StoryF y) (y -> a)

type StoryF = ChangeF :+: InterruptF

Here, (:+:) is the type operator which composes data types together into a sum type (there is a corresponding (:*:) for products, but we’re less interested in it today.)

Because the kindedness of (:+:) lines up with that of the data types it combines, we can nest (:+:) arbitrarily deep:

type Something = Maybe :+: Either Int :+: (,) Bool :+: []

In this silly example, Something a might be any of the following:

  • Maybe a
  • Either Int a
  • (Bool, a)
  • [a]

but we can’t be sure which. We will arbitrary decide that (:+:) is right-associative – although it doesn’t matter in principle (sums are monoidal), part of our implementation will depend on this fact.

Given a moment, if you’re familiar with Haskell, you can probably figure out what the machinery must look like:

data (f :+: g) a = InL (f a)
                 | InR (g a)
                 deriving Functor
infixr 8 :+:

(:+:) essentially builds a tree of data types, and then you use some combination of InL and InR to find the right part of the tree to use.
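
For example, with the right-associated Something above, the Either Int branch sits one InR deep, followed by an InL. A small sketch (anEither is just an illustrative name):

-- Something = Maybe :+: (Either Int :+: ((,) Bool :+: []))
anEither :: Something Int
anEither = InR (InL (Left 5))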

However, in practice, this becomes annoyingly painful and tedious; adding new data types can completely shuffle around your internal tree structure, and unless you’re careful, things that used to compile will no longer.

But fear not! Swierstra has got us covered!

class (Functor sub, Functor sup) => sub :<: sup where
    inj :: sub a -> sup a

This class (and its instances) say that f :<: fs means that the data type f is nestled somewhere inside of the big sum type fs. Furthermore, it gives us a witness to this fact, inj, which lifts our small data type into our larger one. With some clever instances of this typeclass, inj will expand to exactly the right combination of InL and InRs.

These instances are:

instance Functor f => f :<: f where
    inj = id

instance (Functor f, Functor g) => f :<: (f :+: g) where
    inj = InL

instance {-# OVERLAPPABLE #-}
         (Functor f, Functor g, Functor h, f :<: g) => f :<: (h :+: g) where
    inj = InR . inj

The first one states “if there’s nowhere left to go, we’re here!”. The second: “if our desired functor is on the left, use InL”. The third is: “otherwise, slap a InR down and keep looking”.

And so, we can now write our smart constructors in the style of:

change :: (ChangeF :<: fs) => Character -> ChangeType -> Free fs ChangeResult
change c ct = liftF . inj $ Change c ct id

which will create a Change constructor in any data type which supports it (witnessed by ChangeF :<: fs).

Astute readers will notice immediately that the structural induction carried out by (:<:) won’t actually find the desired functor in any sum tree which isn’t right-associative, since it only ever recurses right. This unfortunate fact means that we must be very careful when defining DTalC in terms of type aliases.

In other words: we must adhere to a strict convention in order to ensure our induction will work correctly.
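
To see the failure mode concretely, consider a left-associated alias (an illustrative non-example):

-- The instances above will never find [] in here: the search recurses
-- only into the right branch (Either Int), so [] :<: Bad won't resolve.
type Bad = (Maybe :+: []) :+: Either Int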

Better Data Types a la Carte

The problem, of course, is caused by the fact that DTalC can be constructed in ways that the structural induction can’t handle. Let’s fix that by constraining how DTalCs are constructed.

At the same time, we’ll add the missing inverse of inj, namely outj :: (f :<: fs) => fs a -> Maybe (f a)3, which we’ll need later to remove constructors, but isn’t fundamentally restricted in Swiestra’s method.

On the surface, our structural induction problem seems to be that we can only find data types in right-associative trees. But since right-associative trees are isomorphic to lists, the real flaw is that we’re not just using lists in the first place.

With the help of {-# LANGUAGE DataKinds #-}, we can lift lists (among other term-level things) to the type level. Additionally, using {-# LANGUAGE TypeFamilies #-}, we’re able to write type-level functions – functions which operate on and return types!

We define a type class with an associated data family:

class Summable (fs :: [* -> *]) where
    data Summed fs :: * -> *

Here fs is a type, as is Summed fs. Take notice, however, of the explicit kind annotations: fs is a list of things that look like Functors, and Summed fs looks like one itself.

Even with all of our fancy language extensions, a type class is still just a type class. We need to provide instances of it for it to become useful. The obvious case is if fs is the empty list:

instance Summable '[] where
    data Summed '[] a = SummedNil Void
                      deriving Functor

The funny apostrophe in '[] indicates that what we’re talking about is an empty type-level list, rather than the type-constructor for lists. The distinction is at the kind level: '[] :: [k] for all kinds k, but [] :: * -> *.

What should happen if we try to join zero data types together? This is obviously crazy, but since we need to define it to be something, we make it wrap Void. Since Void doesn't have any inhabitants at the term level, it is unconstructible, and thus so too is SummedNil.

But what use case could an unconstructible type possibly have? By itself, nothing, but notice that Either a Void must be Right a, since the Left branch can never be constructed. Now consider that Either a (Either b Void) is isomorphic to Either a b, but has the nice property that its innermost data constructor is always Left (finding the a is Left, and finding b is Right . Left).
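
That observation can be made concrete with a couple of illustrative helper functions (the names here are ad hoc, not part of the post's library):

import Data.Void (Void, absurd)

-- Either a Void can only ever be Left, so it carries exactly an a.
stripVoid :: Either a Void -> a
stripVoid (Left a)  = a
stripVoid (Right v) = absurd v

-- Either a (Either b Void) is isomorphic to Either a b: the payload is
-- always reached through a final Left.
flatten :: Either a (Either b Void) -> Either a b
flatten (Left a)          = Left a
flatten (Right (Left b))  = Right b
flatten (Right (Right v)) = absurd v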

Let’s move to the other case for our Summable class – when fs isn’t empty:

instance Summable (f ': fs) where
    data Summed (f ': fs) a = Here (f a)
                            | Elsewhere (Summed fs a)

Summed for a non-empty list is either Here with the head of the list, or Elsewhere with the tail of the list. For annoying reasons, we need to specify that Summed (f ': fs) is a Functor in a rather obtuse way:

instance Summable (f ': fs) where
    data Summed (f ': fs) a = Functor f => Here (f a)
                            | Elsewhere (Summed fs a)

{-# LANGUAGE StandaloneDeriving #-}
deriving instance Functor (Summed fs) => Functor (Summed (f ': fs))

but this now gives us what we want. Summed fs builds a nested sum-type from a type-level list of data types, and enforces (crucially, not by convention) that they form a right-associative list. We now turn our attention to building the inj machinery a la Data Types a la Carte:

class Injectable (f :: * -> *) (fs :: [* -> *]) where
    inj :: f a -> Summed fs a

We need to write instances for Injectable. Note that there is no instance Injectable '[] fs, since Summable '[] is unconstructible.

instance Functor f => Injectable f (f ': fs) where
    inj = Here

instance {-# OVERLAPPABLE #-} Injectable f fs => Injectable f (g ': fs) where
    inj = Elsewhere . inj

These instances turn out to be very inspired by the original DTalC. This should come as no surprise, since the problem was with our construction of (:+:) – which we have now fixed – rather than our induction on (:<:).

At this point, we could define an alias between f :<: fs and Injectable f fs, and call it a day with guaranteed correct-by-construction data types a la carte, but we’re not quite done yet.

Remember, the original reason we dived into all of this mumbo jumbo was in order to remove data constructors from our DTalCs. We can’t do that yet, so we’ll need to set out on our own.

We want a function outj :: Summed fs a -> Maybe (f a) which acts as a prism into our a la carte sum types. If our Summed fs a is constructed by a f a, we should get back a Just – otherwise Nothing. We define the following type class:

class Outjectable (f :: * -> *) (fs :: [* -> *]) where
    outj :: Summed fs a -> Maybe (f a)

with instances that again strongly resemble DTalC:

instance Outjectable f (f ': fs) where
    outj (Here fa)     = Just fa
    outj (Elsewhere _) = Nothing

instance {-# OVERLAPPABLE #-} Outjectable f fs => Outjectable f (g ': fs) where
    outj (Here _)       = Nothing
    outj (Elsewhere fa) = outj fa

The first instance says, “if what I’m looking for is the head of the list, return that.” The other says, “otherwise, recurse on an Elsewhere, or stop on a Here.”

And all that’s left is to package all of these typeclasses into something more easily pushed around:

class ( Summable fs
      , Injectable f fs
      , Outjectable f fs
      , Functor (Summed fs)
      ) => (f :: * -> *) :<: (fs :: [* -> *])

instance ( Summable fs
         , Injectable f fs
         , Outjectable f fs
         , Functor (Summed fs)
         ) => (f :<: fs)

This is a trick I learned from Edward Kmett’s great talk on Monad Homomorphisms, in which you build a class that has all of the right constraints, and then list the same constraints for an instance of it. Adding the new class as a constraint automatically brings all of its dependent constraints into scope; f :<: fs thus implies Summable fs, Injectable f fs, Outjectable f fs, and Functor (Summed fs) in a much more terse manner.

For good measure, I wrote a test that outj is a left-inverse of inj:

injOutj_prop :: forall fs f a. (f :<: fs) => Proxy fs -> f a -> Bool
injOutj_prop _ fa = isJust $ (outj (inj fa :: Summed fs a) :: Maybe (f a))

{-# LANGUAGE TypeApplications #-}
main = quickCheck (injOutj_prop (Proxy @'[ []
                                         , Proxy
                                         , Maybe
                                         , (,) Int
                                         ]) :: Maybe Int -> Bool)

where we use the Proxy fs to drive type checking for the otherwise hidden fs from the type signature in our property.

And there you have it! Data types a la carte which are guaranteed correct-by-construction, which we can automatically get into and out of. In the next post we’ll look at how rewriting our command functor in terms of DTalC solves all of our Interrupt-related headaches.

A working version of all this code together can be found on my GitHub repository.

  1. EDIT 2016-09-14: After re-reading the paper, it turns out that it describes (though doesn’t implement) this functionality.

  2. For the uninitiated, kinds are to types as types are to values – a kind is the “type” of a type. For example, Functor has kind * -> * because it doesn’t become a real type until you apply a type to it (Functor Int is a type, but Functor isn’t).

  3. I couldn’t resist the fantastic naming opportunity.

Categories: Offsite Blogs

Ketil Malde: HP EliteBook Folio G1

Planet Haskell - Mon, 09/12/2016 - 4:00am

I got a new laptop the other day. My aging Thinkpad X220 was beginning to wear out, keys were falling off and the case was starting to crack here and there. Sometimes the system log would contain ominous messages about temperatures and such. So it was finally replaced with a new laptop, a HP EliteBook Folio G1.

EliteBook Folio G1 as presented by HP.

This is a Mac-a-like thin little thing, very shiny and sleek compared to the Thinkpad's rough and rugged exterior. They are both "business models", where the Thinkpad has a look that clearly means business, the HP has more the "executive" feel, and there is probably a matching tie pin and cuff links somewhere near the bottom of the very elaborate cardboard package.

I took care to order the model with the non-touch, full-HD display, and not the one with a 4K touch display. I don't really see the point of a touch display on a PC, and I'd rather have the lighter weight, less reflective screen, and longer battery life. And it's only 12 inches or so, how small do pixels need to get, really? Full HD is something like 180 dpi, I'm pretty sure it suffices.

Did I mention shiny? I was surprised, and somewhat disappointed to unpack it and discover that the display is indeed very reflective. It's nice and bright (especially compared to the totally mediocre display on the old Lenovo), but you still need to dress like Steve Jobs (i.e. black sweater) to see anything else than your own reflection. My suspicions were further aroused when I discovered the screen actually was a touch screen, after all. I then weighed the thing, and true enough, it is ten percent heavier than it should be.

After going back and checking the model number (which was correct) and the description at the seller (which was clearly wrong), it was made clear that - although they had given the wrong specifications - a return was not possible. So that's that, I guess: I'm stuck with a heavier laptop and a less usable screen than I thought I would get. So be warned: this model has a reflective touch display, even though the specs don't say so.

The rest of the hardware is okay, for the most part. It's still fairly light at just a bit over one kg, and apart from its two USB-C ports, the only other connector is an audio jack. The chicklet keyboard is...not very good, compared to the Lenovo's, but I guess I'll learn to accept it. At least it's better than Apple's very shallow version. Oh, and I really like the fanless design. It's not a quiet PC - it's a silent one, and the only moving parts are the keyboard keys.

Apple may have more fanatic users, but they too don't much care for the Mac built-in keyboard.

Installing Linux

As an old-time Linux user, of course I wanted to install my favorite distribution. My previous computer had Debian installed, but I thought I'd try the latest Ubuntu for a (very small) change. Since I am particular in what software I use, and have little use for complex "desktop environments", my usual modus operandi is to install a minimal system -- meaning a Ubuntu server install -- and then pull whatever software I need.

So I copied the server install image to a USB stick (it took me ages of meddling with specialized and buggy USB stick creator software before I realized you can simply dump an ISO directly with something like dd), and it happily booted, installed, and rebooted. Only to discover that networking is not supported. Not the built-in wifi, nor the USB-brick's ethernet. Really, Ubuntu?
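For the record, the direct dump is a one-liner (the device name below is a placeholder - check yours with lsblk before overwriting anything):

sudo dd if=ubuntu-16.04-server-amd64.iso of=/dev/sdX bs=4M && sync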

Microsoft Windows tried to "repair" my Linux install, but gave up after a long period of trying.

Disappointed, I decided to try Debian, which provides a 'netinst' image. Unfortunately, the computer's UEFI copy protection fascist control skynet terminator system kicked in, and Debian images are non grata. So booting into the BIOS, meddle around with UEFI settings and "secure boot", to get this to work. For some reason, I needed to do this several times before it would take, and although I'm pretty sure the BIOS initially identified my USB stick by manufacturer and model, it now gave some generic description. It also wouldn't boot it directly, but there was a "boot from file" option I could use. But of course, installation failed here as well, network configuration didn't work.

Back to Ubuntu, the desktop image this time. And although I still needed to "boot from file" and dig around for an EFI image (or some such), it now managed to configure the network properly, and the install went pretty smoothly from there. I am now running Ubuntu 16.04, and wondering why the server image doesn't come with the same drivers as the desktop image. As long as the network is up, everything else can be fixed - and unless you are installing from a stack of DVDs, something I vaguely remember doing back in a previous century, without networking, you can't get anywhere.

Software

Not being a fan of desktop environments, I prefer to run the Xmonad window manager. After installing it, I can select it at login, and things work the way they should.

That is: almost.

Since I don't have a graphical tool to inspect or configure networking, I previously used wpa_supplicant for this. Running it in the background (as root, of course), it reads a config file where all my networks are configured. When it connects and authenticates to a network, I can then run dhclient to obtain an IP and things like the DNS server.

My initial attempt at this config failed, and I tried NetworkManager instead. That didn't work too well last time I tried, which I think is the primary reason I used wpa_supplicant. The nmcli command connects with the NetworkManager daemon, and allows me to connect to networks with something like:

nmcli dev wifi connect my_ssid password my_psk_key

NetworkManager stores this information in /etc/NetworkManager/system-connections, not unlike the wpa_supplicant config file, and after adding things once, it switches automatically - and unlike wpa_supplicant, also between other network interfaces. And it uses dnsmasq to proxy DNS queries, also in a smart manner (as far as I can tell).
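For the curious, a stored profile is a plain ini-style keyfile, something like this (my sketch of the format, with placeholder values):

[connection]
id=my_ssid
type=wifi

[wifi]
ssid=my_ssid

[wifi-security]
key-mgmt=wpa-psk
psk=my_psk_key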

A new install is also a good time to revisit configurations. I had a setup where XMonad's terminal command would bring up an

uxterm +sb -font '-*-fixed-medium-r-*-*-14-*-*-*-*-*-*-*'

but that font is way too small, and doesn't look very good. After some googling, I ended up with XTerm and specifically:

xterm -fn 7x13 -fa 'Liberation Mono:size=12:antialias=false' +rv

which I think looks pretty good. (Not sure why I have the antialias option, changing it to 'true' doesn't appear to change things, and fonts look to be antialiased regardless)

Docking

I got a docking station as well. Unlike the old Lenovo, where you press the whole PC down into a specialized dock, the new one is just a smallish brick with a bunch of connectors. It hooks up to the PC with a single USB-C cable, which supplies both power, and...well, USB. Running lsusb shows some devices, including a virtual disk, an ethernet interface, and something called a DisplayLink.

Bus 004 Device 003: ID 17e9:4354 DisplayLink

Ethernet and USB appear to work out of the box, but the external monitor doesn't show up, and 'xrandr' only displays the built-in display. Googling "displaylink" quickly brings one to http://www.displaylink.com/ which also has downloadable Linux drivers for Ubuntu. Distributed as a self-extracting shell script - haven't seen one of those for a while. First I needed to install dkms, to get support for building kernel modules, and after that, the DisplayLink driver made the external monitor available to xrandr configuration - and rescaled my screen and cloned it to the external display. So far so good.

There is a downside, however. The external display introduces a noticeable lag, and the DisplayLink daemon often hogs the CPU quite a bit. I also think it incurs a cost to graphically intensive programs, but so far I only noticed that chromium appears more CPU-greedy, so I could be wrong. There's a bunch of "WARNING" dumps in syslog, in addition to messages like this:

[107073.458870] evdi: [W] collapse_dirty_rects:102 Not enough space for clip rects! Rects will be collapsed

so I find it a bit hard to entirely trust this system. But at least it works.

Battery

Battery was supposed to be reasonable. Much of this depends on the CPU being able to enter sleep states. The obvious thing to do is to use 'powertop', go to the rightmost tab, and enable all kinds of hardware and device power saving. But userspace is also important, and I find that my web browser - typically with tens of tabs open on sites peppered with javascript and flash and whatnot - tend to keep the system very busy. Chromium consists of a giant cluster of threads, but as I start it from a terminal, I just bring it to the foreground and hit ctrl-Z.

Having done that, I recheck powertop, and apparently the wifi interface and the bluetooth keep interrupting the system. I don't think the function button for disabling wireless works, but running rmmod iwlwifi brings the power use down from 6W (about five hours) to 5.8W (five and a half)1. A SIGSTOP on the DisplayLink process brought it down to 4.4 (over seven hours), but then it bounced back to 5.4 (6 hours). Dimming the backlight gets it down to 4.2W - but, okay, a dark screen and a PC doing nothing - how useful is that? In the end, what matters is how long a flight I can take and still expect to be able to use my computer, and only experience will tell.

And one nice thing about USB-C is that with a universal standard, chances are someone nearby will have a compatible charger - much like for mobile phones these days. And another nice thing is that it is already possible to buy external battery packs with USB-C that will keep your computer juiced up even for really long haul flights.

In the end?

Some things are still not working (e.g. special keys to adjust display brightness or switch off wireless - strangely, the key to dim keyboard backlight works), and other things are funky or unpredictable (e.g. that you must use the "fn" key to use function keys F1 to F12, or the direction in which two-finger scrolling works). But by and large, things work pretty well, really.

  1. And reinserting the iwlwifi module, the device pops back into existence, and after a few seconds, NetworkManager has me reconnected. Nice!

Categories: Offsite Blogs

Michael Snoyman: Monads are like Lannisters

Planet Haskell - Sun, 09/11/2016 - 6:00pm

As many people are likely aware, monads (incorrectly) have a bad rap in the programming community for being difficult to learn. A string of extremely flawed monad tutorials based on analogies eventually led to a blog post by Brent Yorgey about the flaws of this analogy-based approach. And we've seen great learning materials on Haskell and monads.

However, I'm disappointed to see the analogy-based route disappear. Based on a recent Twitter poll I ran, which is held to the highest levels of scientific and statistical scrutiny, it's obvious that there is very high demand for a rigorous analogy-based monad tutorial. I claim that the flaw in all previous analogy based tutorials is lack of strong pop culture references. Therefore, I'm happy to announce the definitive guide to monads: Monads are like Lannisters.

Spoiler alert: if you haven't completed book 5 or season 6 of Haskell yet, there will be spoilers.

Prereqs

The examples below will all be Haskell scripts that can be run with the Stack build tool. Please grab Stack to play along. Copy-paste the full example into a file like foo.hs and then run it with stack foo.hs. (This uses Stack's script interpreter support.)

Hear me roar

Many people believe that the Lannister house words are "A Lannister always pays his debts" (or her debts of course). We'll get to that line in a moment. This belief however is false: the true Lannister house words are Hear me roar. So let's hear a Lannister roar:

#!/usr/bin/env stack
-- stack --resolver lts-6.15 --install-ghc runghc

roar :: IO ()
roar = putStrLn "Roar"

main :: IO ()
main = do
  roar -- Tywin
  roar -- Cersei
  roar -- Jaime
  roar -- Tyrion

Roaring is clearly an output, and therefore it makes sense that our action is an IO action. But roaring doesn't really do much besides making sound, so its return is the empty value (). In our main function, we use do notation to roar multiple times. But we can just as easily use the replicateM_ function to replicate a monadic action multiple times and discard (that's what the _ means) the results:

#!/usr/bin/env stack
-- stack --resolver lts-6.15 --install-ghc runghc

import Control.Monad (replicateM_)

roar :: IO ()
roar = putStrLn "Roar"

main :: IO ()
main = replicateM_ 4 roar

Tyrion, the scholar

As we all know, Tyrion is a prolific scholar, consuming essentially any book he can get his hands on (we'll discuss some other consumption next). Fortunately, monads are there to back him up, with the Reader monad. Let's say that Tyrion is doing some late night research on wine production (epic foreshadowment) in various kingdoms, and wants to produce a total:

#!/usr/bin/env stack
-- stack --resolver lts-6.15 --install-ghc runghc

import Data.Map (Map)
import qualified Data.Map as Map
import Control.Monad.Trans.Reader
import Data.Maybe (fromMaybe)

type Kingdom = String
type WineCount = Int
type WineData = Map Kingdom WineCount

tyrionResearch :: Reader WineData Int
tyrionResearch = do
  mnorth <- asks $ Map.lookup "north"
  mriverlands <- asks $ Map.lookup "riverlands"
  return $ fromMaybe 0 mnorth + fromMaybe 0 mriverlands

main :: IO ()
main = print $ runReader tyrionResearch $ Map.fromList
  [ ("north", 5)
  , ("riverlands", 10)
  , ("reach", 2000)
  , ("dorne", 1000)
  ]

While Tyrion may have chosen inferior kingdoms for wine production, it does not take away from the fact that the Reader type has allowed him access to data without having to explicitly pass it around. For an example this small (no dwarf joke intended), the payoff isn't great. But Reader is one of the simplest monads, and can be quite useful for larger applications.

Tyrion, the drinker

Let's try another one. We all know that Tyrion also likes to drink wine, not just count it. So let's use our Reader monad to give him access to a bottle of wine.

However, unlike reading data from a book, drinking from a bottle actually changes the bottle. So we have to leave our pure Reader world and get into the ReaderT IO world instead. Also known as monad transformers. (Unfortunately I don't have time now for a full post on it, but consider: Transformers: Monads in Disguise.)

#!/usr/bin/env stack
-- stack --resolver lts-6.15 --install-ghc runghc

import Control.Monad (replicateM_)
import Control.Monad.Trans.Reader
import Control.Monad.IO.Class
import Control.Concurrent.Chan

data Wine = Wine
type Bottle = Chan Wine

drink :: MonadIO m => Bottle -> m ()
drink bottle = liftIO $ do
  Wine <- readChan bottle
  putStrLn "Now I'm slightly drunker"

tyrionDrinks :: ReaderT Bottle IO ()
tyrionDrinks = replicateM_ 10 $ ReaderT drink

main :: IO ()
main = do
  -- Get a nice new bottle
  bottle <- newChan
  -- Fill up the bottle
  replicateM_ 20 $ writeChan bottle Wine
  -- CHUG!
  runReaderT tyrionDrinks bottle

A Lannister always pays his debts

What is a debt, but receiving something from another and then returning it? Fortunately, there's a monad for that too: the State monad. It lets us take in some value, and then give it back - perhaps slightly modified.

#!/usr/bin/env stack
-- stack --resolver lts-6.15 --install-ghc runghc

import Control.Monad.Trans.State

type Sword = String

forge :: State Sword Sword
forge = do
  "Ice" <- get
  -- do our Valyrian magic, and...
  -- Repay our debt to Brienne
  put "Oathkeeper"
  -- And Tywin gets a sword too!
  return "Widows Wail"

killNedStark :: IO Sword
killNedStark = do
  putStrLn "Off with his head!"
  return "Ice"

main :: IO ()
main = do
  origSword <- killNedStark
  let (forTywin, forBrienne) = runState forge origSword
  putStrLn $ "Tywin received: " ++ forTywin
  putStrLn $ "Jaime gave Brienne: " ++ forBrienne

Not exactly justice, but the types have been satisfied! A monad always pays its debts.

Throwing

Monads are also useful for dealing with exceptional cases:

#!/usr/bin/env stack
-- stack --resolver lts-6.15 --install-ghc runghc

import Control.Exception
import Data.Typeable (Typeable)

data JaimeException = ThrowBranFromWindow
  deriving (Show, Typeable)
instance Exception JaimeException

jaime :: IO ()
jaime = do
  putStrLn "Did anyone see us?"
  answer <- getLine
  if answer == "no"
    then putStrLn "Good"
    else throwIO ThrowBranFromWindow

main :: IO ()
main = jaime

Killing

And thanks to asynchronous exceptions, you can also kill other threads with monads.

#!/usr/bin/env stack
-- stack --resolver lts-6.15 --install-ghc runghc

import Control.Concurrent
import Control.Exception
import Control.Monad
import Data.Typeable (Typeable)

data JaimeException = Defenestrate
  deriving (Show, Typeable)
instance Exception JaimeException

bran :: IO ()
bran = handle onErr $ forever $ do
    putStrLn "I'm climbing a wall!"
    threadDelay 100000
  where
    onErr :: SomeException -> IO ()
    onErr ex = putStrLn $ "Oh no! I've been killed by: " ++ show ex

jaime :: ThreadId -> IO ()
jaime thread = do
  threadDelay 500000
  putStrLn "Oh, he saw us"
  throwTo thread Defenestrate
  threadDelay 300000
  putStrLn "Problem solved"

main :: IO ()
main = do
  thread <- forkIO bran
  jaime thread

Exercise for the reader: modify bran so that he properly recovers from that exception and begins warging instead. (WARNING: don't recover from async exceptions in practice.)

Exiting

You can also exit your entire process with monads.

#!/usr/bin/env stack
-- stack --resolver lts-6.15 --install-ghc runghc

import System.Exit

tommen :: IO ()
tommen = do
  putStrLn "Oh, my dear wife!"
  exitFailure

main :: IO ()
main = tommen

Passing the baton of reign

We've seen Lannisters pass the baton of reign from family member to family member. We've also seem them ruthlessly destroy holy insitutions and have their private, internal affairs exposed. As it turns out, we can do all of that with some explicit state token manipulation!

#!/usr/bin/env stack
-- stack --resolver lts-6.15 --install-ghc runghc

{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import GHC.Prim
import GHC.Types

joffrey :: IO ()
joffrey = do
  putStrLn "Look at your dead father's head Sansa!"
  putStrLn "Oh no, poisoned!"

tommen :: IO ()
tommen = do
  putStrLn "I'm in love!"
  putStrLn "*Swan dive*"

cersei :: IO ()
cersei = undefined -- season 7 isn't out yet

unIO :: IO a -> State# RealWorld -> (# State# RealWorld, a #)
unIO (IO f) = f

main :: IO ()
main = IO $ \s0 ->
  case unIO joffrey s0 of
    (# s1, () #) ->
      case unIO tommen s1 of
        (# s2, () #) -> unIO cersei s2

Honorable mentions

There's much more to be done with this topic, and there were other ideas besides Lannisters for this post. To give some mention for other ideas:

  • There's plenty of other Lannister material to work with (skinning animals, Jaime and Cersei affairs, or Tyrion proclivities). But I had to draw the line somewhere (both for length of post and topics I felt like discussing...). Feel free to comment about other ideas.
  • One of my favorites: monads are like onions
  • Monads are just your opinion, man
  • "they're like Gargoyles, they decorate the public facing interface of important landmarks in Haskell-land." link

Personally, I was going to go with a really terrible "monads are like printers" based on them taking plain paper in and spitting out printed pages, but Lannisters was a lot more fun. Also, I'm sure there's something great to be done with presidential candidates, at the very least throwTo opponent Insult.

Disclaimer

Since I'm sure someone is going to get upset about this: YES, this post is meant entirely as parody, and should not be used for actually learning anything except some random details of Game of Thrones, and perhaps a bit of Haskell syntax. Please use the learning materials I linked at the beginning for actually learning about Haskell and monads. And my real advice: don't actually "learn monads," just start using Haskell and you'll pick them up naturally.

Categories: Offsite Blogs

Philip Wadler: Option A vs B: Kicked into the long grass

Planet Haskell - Sun, 09/11/2016 - 1:50pm


I've been putting off posting about the final outcome of the West-East Cycle Route confrontation at Edinburgh City Council's Transport and Environment committee, in part because there was no final outcome.

I sat through four hours of Edinburgh Council's Transport and Environment Committee. The end result was a fudge. The Council heard adamant depositions from both sides, and decided not to decide. They will form a stakeholder group, with representatives from all involved, and attempt to come to a compromise that satisfies all sides. But it's hard to see how that can happen: if we don't get a straight route it will be a disaster, but if we do folk in Roseburn will fear an impact on their businesses (studies show cycle routes don't harm business, but they seem to be taking the Gove line: they are tired of hearing from experts).

Though perhaps we were lucky to get a fudge. Had it gone to a straight vote, following the whip, Greens and Labour would have voted for Option A (7 votes) while the SNP, Conservatives, and Liberal Democrats would have voted for Option B (8 votes). As it was, the Greens gave a rousing defense of Option A, with fantastic speeches from Nigel Bagshaw and Chas Booth. Everyone else supported the 'compromise', and took the opportunity to excoriate the Greens for taking a principled stand. Even those who would have voted for Option B expressed support for cycling, and perhaps that is something we can build on.

Dave duFeu of Spokes posted an excellent summary, including links to the webcast of the meeting and a list of the steps you can take to support the West-East Cycle Route.

The Edinburgh Evening News offered coverage in the two days before: Cyclists take to street to support £6m cycle “superhighway” and D-Day for cycleway in Edinburgh amidst anger and division. The first of these features a couple of photos of me arguing with opponents of the cycle route in Roseburn. I don't look calm and collected. Oddly, I can't find any coverage the Evening News gave to the outcome of the Transport and Environment committee meeting. Was there any?

Not only are Roseburn residents up in arms. A similar cycleway in Bearsden, near Glasgow, is attracting comparable ire from its locals. What is our best way forward to fight bicycle bigots?

Previously:   Option A: Think about the children
  Roseburn to Leith Walk A vs B: time to act!
  Ride the Route in support of Option A
Categories: Offsite Blogs

Philip Wadler: What is it like to understand advanced mathematics?

Planet Haskell - Sun, 09/11/2016 - 11:31am

A correspondent on Quora explains the insider's view to an outsider. Some selections:

  • You can answer many seemingly difficult questions quickly. But you are not very impressed by what can look like magic, because you know the trick. The trick is that your brain can quickly decide if a question is answerable by one of a few powerful general purpose "machines" (e.g., continuity arguments, the correspondences between geometric and algebraic objects, linear algebra, ways to reduce the infinite to the finite through various forms of compactness) combined with specific facts you have learned about your area. The number of fundamental ideas and techniques that people use to solve problems is, perhaps surprisingly, pretty small -- see http://www.tricki.org/tricki/map for a partial list, maintained by Timothy Gowers.
  • You go up in abstraction, "higher and higher". The main object of study yesterday becomes just an example or a tiny part of what you are considering today. For example, in calculus classes you think about functions or curves. In functional analysis or algebraic geometry, you think of spaces whose points are functions or curves -- that is, you "zoom out" so that every function is just a point in a space, surrounded by many other "nearby" functions. Using this kind of zooming out technique, you can say very complex things in short sentences -- things that, if unpacked and said at the zoomed-in level, would take up pages. Abstracting and compressing in this way makes it possible to consider extremely complicated issues with one's limited memory and processing power.
  • You are easily annoyed by imprecision in talking about the quantitative or logical. This is mostly because you are trained to quickly think about counterexamples that make an imprecise claim seem obviously false.
Categories: Offsite Blogs

Philip Wadler: ISDS enables corruption

Planet Haskell - Sun, 09/11/2016 - 11:22am

I've long known that ISDS (Investor-State Dispute Settlement) is one of the worst aspects of TTIP, TTP, and CETA. But I've thought the main problem with ISDS was first, that it enabled businessmen to sue governments over laws enacted to save their citizenry (as in the cartoon below), and, second, its secrecy. What I did not understand was how it enables corruption. Kudos to Buzzfeed for their detailed investigation.
Say a nation tries to prosecute a corrupt CEO or ban dangerous pollution. Imagine that a company could turn to this super court and sue the whole country for daring to interfere with its profits, demanding hundreds of millions or even billions of dollars as retribution.

Imagine that this court is so powerful that nations often must heed its rulings as if they came from their own supreme courts, with no meaningful way to appeal. That it operates unconstrained by precedent or any significant public oversight, often keeping its proceedings and sometimes even its decisions secret. That the people who decide its cases are largely elite Western corporate attorneys who have a vested interest in expanding the court’s authority because they profit from it directly, arguing cases one day and then sitting in judgment another. That some of them half-jokingly refer to themselves as “The Club” or “The Mafia.”

And imagine that the penalties this court has imposed have been so crushing — and its decisions so unpredictable — that some nations dare not risk a trial, responding to the mere threat of a lawsuit by offering vast concessions, such as rolling back their own laws or even wiping away the punishments of convicted criminals.

This system is already in place, operating behind closed doors in office buildings and conference rooms in cities around the world. Known as investor-state dispute settlement, or ISDS, it is written into a vast network of treaties that govern international trade and investment, including NAFTA and the Trans-Pacific Partnership, which Congress must soon decide whether to ratify.

These trade pacts have become a flashpoint in the US presidential campaign. But an 18-month BuzzFeed News investigation, spanning three continents and involving more than 200 interviews and tens of thousands of documents, many of them previously confidential, has exposed an obscure but immensely consequential feature of these trade treaties, the secret operations of these tribunals, and the ways that business has co-opted them to bring sovereign nations to heel.

The series starts today with perhaps the least known and most jarring revelation: Companies and executives accused or even convicted of crimes have escaped punishment by turning to this special forum. Based on exclusive reporting from the Middle East, Central America, and Asia, BuzzFeed News has found the following:
  • A Dubai real estate mogul and former business partner of Donald Trump was sentenced to prison for collaborating on a deal that would swindle the Egyptian people out of millions of dollars — but then he turned to ISDS and got his prison sentence wiped away.
  • In El Salvador, a court found that a factory had poisoned a village — including dozens of children — with lead, failing for years to take government-ordered steps to prevent the toxic metal from seeping out. But the factory owners’ lawyers used ISDS to help the company dodge a criminal conviction and the responsibility for cleaning up the area and providing needed medical care.
  • Two financiers convicted of embezzling more than $300 million from an Indonesian bank used an ISDS finding to fend off Interpol, shield their assets, and effectively nullify their punishment.

Categories: Offsite Blogs

LHC Team: Haskell Suite: Type inference.

Planet Haskell - Sat, 09/10/2016 - 3:39am
Disclaimer: I am not an academic. The following post is akin to a plumber's guide to brain surgery.

Type inference is complex and has evolved over time. In this post, I will try to explain how I see the landscape and where LHC and other Haskell compilers fit into this landscape.

The beginning: The Hindley–Milner type system. 1982.
The typing rules here are quite simple and every Haskeller seems to learn them intuitively. They include things like: if 'f :: A → B' and 'a :: A' then 'f a :: B'.
In this system, types are always inferred and there must always be a single, most general type for every expression. This becomes a problem when we want higher ranked types because here a single, most general type cannot be inferred. There may be many equally valid type solutions and it has to be up to the programmer to select the appropriate one. But this cannot happen in plain HM as type signatures are only used to make inferred types less general (e.g. [a] was inferred but the programmer wanted [Int]).
Omitting the type signature in the following code can show us what plain HM would be like:
<script src="https://gist.github.com/Lemmih/3d9e64eb6b6dd4735e823e95094555e9.js"></script> In GHC, the snippet will run fine with the type signature but not without it.


Version two: Bidirectional type system. 2000.
People realised that checking the correctness of a given type is much easier than inferring a correct type. Armed with this knowledge, a new type checker was born that had two modes, usually called 'up' and 'down'. The 'up' mode lifts a new correct type up from an expression, and the 'down' mode checks the correctness of a given type. Because of these two modes, this kind of system was called bidirectional, and it deals with higher ranked types quite well.
LHC current implements this.

Version three: Boxy types. 2006.
At this point it had become apparent that higher ranked types didn't really play well with higher order functions. People often found themselves in situations where slight, seemingly innocent changes caused the type-checker to reject their programs. An example of this can be seen in this gist:
<script src="https://gist.github.com/Lemmih/52d35e65b915a722bc377165109715ab.js"></script>Impredicative polymorphism is required for the above code and boxy types is a stab in that direction. Bidirectional type checking was a big improvement over plain HM but it lacked granularity. Types are either 100% inferred or 100% checked with no middle ground. What if you wanted to check parts of a type and infer the rest? Well, boxy types solves exactly that problem. Boxes are added (internally, we're not making changes to Haskell here) to types and they signify an unknown that should be inferred. Now parts of types can be checked while the boxes are inferred and we're left with the best of both worlds. This is what JHC implements, btw. Boxy types was also implemented in GHC but was deemed to be too complicated.

Version four: FPH, First-class Polymorphism for Haskell. 2008.
Impredicative polymorphism, second attempt from the authors of boxy types. Improvements were made but the problem is still not solved.

Version five: OutsideIn(X). 2011.
GHC is a hotbed for experimentation in type checkers. GADTs, multi-parameter type classes, type families. These are just some of the features that makes the type-checker the largest and most complicated component of GHC. To deal with all of this, researchers came up with OutsideIn, described in a paper longer than all the previous papers put together. The algorithm is relatively simple, but, for practical reasons, implementations must reject some programs that are valid according to the specification.

Categories: Offsite Blogs

Roman Cheplyaka: A case for static linking in scientific computing

Planet Haskell - Fri, 09/09/2016 - 2:00pm

When researchers run scientific software on high-performance clusters, they often experience problems with shared libraries, such as this one:

bcftools: /lib64/libz.so.1: version `ZLIB_1.2.5.2' not found

Or this one:

eagle: error while loading shared libraries: libhts.so.1: cannot open shared object file: No such file or directory

Popular advice points them in the direction of LD_LIBRARY_PATH, but a simple and robust solution—static linking—is often overlooked.

In this article, I explain the background behind static and dynamic linking, demonstrate the advantages of static linking, address some of the objections against static linking, and give instructions on how to prepare static binaries.

What is static and dynamic linking?

The word linking itself refers to the process of assembling a complete program from libraries of subprograms1.

Static linking occurs as the last stage of the compilation process; the required libraries are embedded in the final binary file of your program.

Some (or even all) of the libraries may not be included in the binary at this stage. In this case, when we attempt to run the program, we need dynamic linking in order to find the libraries and make the missing subroutines accessible to the program. This second type of libraries is called dynamic, or shared, libraries. The files with these libraries are usually named libsomething.so on Linux and something.dll on Windows.

The rules that an operating system follows when it searches for dynamic libraries are complex. And simply having a library in the right place is not enough; it needs to be the same version of the library that was used during the compilation, or at least a different version with the same ABI. (So no, you shouldn’t ln -s /usr/lib/libsomething.so.2 /usr/lib/libsomething.so.1 when a program doesn’t work, contrary to another popular piece of advice one can find on the Internet.)

Linux distributions, most of which dynamically link the software they distribute, manage this by engaging qualified package maintainers and by having tools and centralized infrastructure to build and distribute packages.

But the world of scientific software is not there yet. And if you fail to take care of your dynamic libraries, the result can vary from a program refusing to start (as we saw earlier), to a program crash in the middle of operation, to a hard to detect and diagnose case of data corruption.

This is why I think that scientific software should be linked statically by default.

Advantages of static linking

Reproducibility. When a program is linked statically, it executes the same algorithm wherever it is run. A dynamic executable executes the code from the version of the dynamic library that happens to be installed on a particular computing node.

Note that the static executable doesn’t contain the metainformation about the library versions used to build it. You should record that information when you compile the software, and ideally use a binary repository manager for your builds.

But replacing dynamic linking with static linking by itself dramatically increases the probability that your program will run tomorrow in the same way as it runs today.

Ease of distribution. Suppose that you want your colleague to run the program you have compiled. If you link statically, you only need to distribute a single binary. If you link dynamically, you need to distribute the binary, all dynamic libraries that it depends on, and the instructions on how to make the binary find its libraries.

Portability. Static linking ensures that you can compile the program on your Ubuntu laptop and run it on a Debian or CentOS cluster. With dynamic linking, you’d have to compile it directly on a cluster or in an identical environment.

That said, you still need to ensure that both systems use the same or compatible architectures (the majority of scientific computing happens on x86-64 anyway), the same OS (probably Linux) and not too different kernel versions (or libc versions if you link libc dynamically).

No problems with finding libraries. Since no dynamic libraries are needed, the OS cannot fail to find them. No more cannot open shared object file messages. No more LD_LIBRARY_PATH tricks.

Isn’t static linking considered harmful?

Ulrich Drepper says it is:

There are still too many people out there who think (or even insist) that static linking has benefits. This has never been the case and never will be the case.

Ulrich certainly knows about this stuff much more than I ever hope to. But as you can tell from the above quote, he is sometimes a bit extreme in his judgment.

There is no shortage of knowledgeable people who disagree with him on this issue.

But more importantly, he looks at linking from a very different perspective. For many years, he was employed by Red Hat. He was one of those people who knew a lot about dealing with dynamic libraries and maintained a centralized repository of packages that worked well together in a controlled environment.

It is understandable that he would not care about any of the advantages I list above (though this is different from claiming that there has never been and never will be any benefits to static linking).

But what about the advantages of the dynamic linking that Ulrich describes in his article?

Centralized bug/security fixes.

  1. Security issues matter less for scientific software because it is not exposed to the outside world.

  2. HPC cluster users don’t benefit from centralized bug fixes because usually they don’t have the permissions to install software system-wide. Every user of the same cluster or node would still be responsible for their own updates.

  3. The scale is very different. If you are Red Hat, re-linking hundreds or thousands of binaries every time there is an update in a library is a significant burden. If you are a researcher, you deal maybe with a dozen or two programs, and you may not have to update them often.

  4. Even when centralized updates are possible (e.g. if you can request libraries to be installed centrally and then link against them), scientists would not want them because they are directly at odds with reproducibility.

More efficient use of physical memory through sharing the code.

  1. In high-performance computing, the size of the libraries is usually negligible compared to the size of the data being processed.

  2. When the number of running processes is small, and they don’t have many common dependencies, there’s not much opportunity for sharing.

  3. On the other hand, sometimes multiple copies of the same executable are run in parallel. This happens with software that is not capable of multithreading or cannot exploit it efficiently. Well, in this case, the OS actually can share the code across the processes because it is exactly the same.

  4. When there’s little sharing of code between processes, static linking can sometimes be more memory-efficient. This is because static linking only embeds the object files (i.e. parts of a library) that are actually used by the application, whereas dynamic linking has to load the entire library into memory.

Security measures like load address randomization—see above.

Some features of glibc require dynamic linking. Ulrich, by the way, was one of the core developers of glibc—just in case you were wondering why he considers this a problem of static linking and not a problem of glibc.

Fortunately, most scientific software doesn’t perform character conversions or go to the network. It just crunches numbers. You don’t need dynamic linking for that.

Licensing considerations. I am not a lawyer, but as far as I can tell, this should concern you only if the software is closed-source (or distributed under a license incompatible with GPL) and some of those dependencies are licensed under LGPL. In that case, those dependencies must be linked dynamically, although the other ones can still be linked statically.

I am not sure why Ulrich writes “(L)GPL”, since, to my knowledge, GPL itself does not make a distinction between static and dynamic linking, but I am happy to be corrected.

Tools and hacks like ltrace, LD_PRELOAD, LD_PROFILE, LD_AUDIT don’t work. Oh well.

OK, how do I do this?

Unfortunately, most of the scientific software I come across is linked dynamically by default. Otherwise, I wouldn’t be writing this article.

Convincing the build system

Read the installation instructions. They usually can be found in a file named README, INSTALL, or on the website. If they mention static linking, congratulations.

If not, also try looking inside the Makefile (or whatever build system the software uses). If there is a configure script, try ./configure --help. There could be a target for static linking that the author has not documented.

If the build system doesn’t support static linking out of the box, you will have to modify the linker flags. If a Makefile is well-written, this should be as simple as LDFLAGS=-static make or make LDFLAGS=-static (these are different; try them in this order).

If that doesn’t work, edit the Makefile. Locate the rule that does linking, i.e. produces the final executable.

It usually looks like this:

program: $(OBJS)
        gcc -o program $(OBJS)

Note that gcc could also be g++, or ld, or hidden behind a variable such as $(CC), $(CXX), or $(LD). The variable $(OBJS) could also be named differently, or be a literal list of .o, .c, or .cpp files. But the program is usually exactly the name of the final program (or, again, a variable that expands to one).

Once you located this rule, try adding a -static flag to the gcc command line:

program: $(OBJS)
        gcc -static -o program $(OBJS)

In many cases it will be enough to perform static linking.

Sometimes you need to get more creative. One tool, for instance, explicitly built itself as a shared library. This is incompatible with a (global) -static flag set as part of $(LDFLAGS). The way I solved this was to specify a target explicitly, i.e. make prog1 prog2, so that it wouldn’t attempt to build a dynamic library and fail.

Dependencies

In order to statically link a program, you need to have its dependencies available as static libraries. These are files named libsomething.a.

If this is a library available in your distribution:

  • For Debian and derived distros (e.g. Ubuntu), static libraries usually reside in the libsomething-dev package. If you’ve done dynamic linking of the program, you probably have this package installed already because it also contains the header files.

  • For Red Hat and derived distributions (Fedora, CentOS; not sure about other rpm-based distros), static libraries are often placed in separate packages named libsomething-static.

If this is a third-party library, you’ll need to get a static version of it or compile it from source, following the same instructions that you are reading right now.
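For example, for a static zlib (package names are distro-specific, so treat these as illustrations):

% sudo apt-get install zlib1g-dev   # Debian/Ubuntu: ships libz.a
% sudo dnf install zlib-static      # Fedora

% find /usr/lib* -name libz.a       # verify the static library is there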

Verifying the result

How do you check that you got a static binary? Try running file and ldd on it.

For a dynamic binary, you’ll get something like this:

% ldd ./eagle
        linux-vdso.so.1 (0x00007ffd47d87000)
        libhts.so.1 => not found
        libboost_program_options.so.1.49.0 => not found
        libboost_iostreams.so.1.49.0 => not found
        libz.so.1 => /lib64/libz.so.1 (0x00007fe77a445000)
        libopenblas.so.0 => /lib64/libopenblas.so.0 (0x00007fe778133000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe777f17000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe777b90000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fe777886000)
        libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fe777658000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe777441000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fe77707e000)
        libgfortran.so.3 => /lib64/libgfortran.so.3 (0x00007fe776d4d000)
        /lib64/ld-linux-x86-64.so.2 (0x000055f7b3885000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fe776b49000)
        libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007fe776908000)
% file ./eagle
./eagle: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.26, BuildID[sha1]=af18461c835d6f0209754b78c639581c67ed1443, stripped

For a static binary, you’ll see this instead:

% ldd ./Minimac3
        not a dynamic executable
% file ./Minimac3
./Minimac3: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.26, BuildID[sha1]=edf443bb3e695b3f421b0d162ca30bb0f422c2ad, stripped

By the way, if file says that your binary is “not stripped”, run the strip command on it:

% strip eagle

This will significantly reduce its on-disk size and make it faster to copy to the cluster.

<section class="footnotes">
  1. To avoid confusion, I should note that I am talking here about software written in compiled languages such as C, C++, or Fortran. I am not talking about interpreted languages (Perl, Python, Ruby) or bytecode-compiled languages (Java, Scala).

</section>
Categories: Offsite Blogs

LHC Team: Haskell Suite: Scoping.

Planet Haskell - Fri, 09/09/2016 - 5:14am
This post answers why I created 'haskell-scope' even though there's already another library that addresses the same problem.

There are two libraries for resolving references in Haskell source code on Hackage: haskell-names and haskell-scope. Of the two, haskell-names is the oldest, the most feature complete, and the most ambitious. It uses a very innovative scheme that allows the scope to be inspected at any point in the syntax tree. You can read more about it in the linked article. Unfortunately, all this innovation comes at a price of complexity.

Here's the complete list of extensions used by haskell-names: CPP, ConstraintKinds, DefaultSignatures, DeriveDataTypeable, DeriveFoldable, DeriveFunctor, DeriveTraversable, FlexibleContexts, FlexibleInstances, FunctionalDependencies, GADTs, GeneralizedNewtypeDeriving, ImplicitParams, KindSignatures, MultiParamTypeClasses, NamedFieldPuns, OverlappingInstances, OverloadedStrings, RankNTypes, ScopedTypeVariables, StandaloneDeriving, TemplateHaskell, TupleSections, TypeFamilies, TypeOperators, UndecidableInstances, and ViewPatterns.

A total of 27 extensions and many of them will never be implemented by LHC. If LHC is to compile itself one day, this obviously won't do. Enter haskell-scope: a library more plain than bread without butter. Give it an AST and it will annotate all the references. Nothing more, nothing less.
Categories: Offsite Blogs

Vincent Hanquez: Foundation

Planet Haskell - Thu, 09/08/2016 - 6:00pm

A new hope. Foundation is a new library that tries to define a new modern Haskell framework. It is also trying to be more than a library: a common place for the community to improve things and define new things.

It started as a thought experiment:

What would a modern Haskell base look like if I could start from scratch?

What would I need to complete my Haskell projects without falling into traditional pitfalls like inefficient String, all-in-one Num, unproductive packing and unpacking, etc.?

One of the constraints, that was set early on, was not depending on any external packages, instead depending only on what GHC provides (itself, base, and libraries like ghc-prim). While it may sound surprising, especially considering the usually high quality and precision of libraries available on hackage, there are many reasons for not depending on anything; I’ll motivate the reason later in the article, so hang on.

A very interesting article from Stephen Diehl on Haskell in production, which details some of the pitfalls of Haskell and the workarounds for less-than-ideal situations and choices, outlines pretty well some of the reasons for this effort.

Starting with the basic

One of the few basic things that you’ll find in any modern haskell project, is ByteArray, Array and packed strings. Usually in the form of the bytestring, vector and text packages.

We decided to start here. One of the common problem of those types is their lack of inter-operability. There’s usually a way to convert one into another, but it’s either exposed in an Unsafe or Internal module, or has a scary name like unsafeFromForeignPtr.

Then, if you're unlucky you will see some issues with unpinned and pinned memory (probably in production settings, to maximise fun); the common ByteString using pinned memory, ShortByteString and Text using unpinned memory, and Vector, well, it's complicated (there are 4 different kinds of Vector).

Note: pinned and unpinned represent whether the memory is allowed to move by the GC. Unpinned usually is better as it allows the memory system to reduce fragmentation, but pinned memory is crucial for dealing with Input/Output with the real world, large data (and some other uses).

Unboxed Array

Our corner stone is the unboxed array. The unboxed array is a native Haskell ByteArray (represented by the ByteArray# primitive type), and it is allowed to be unpinned or pinned (at allocation time). To also support further interesting stuff, we supplement it with another constructor to make it able to support natively a chunk of memory referenced by a pointer.

In simplified terms it looks like:

data UArray ty =
    UArrayBA   (Offset ty) (Size ty) PinnedStatus ByteArray#
  | UArrayAddr (Offset ty) (Size ty) (Ptr ty)

With this capability, we have the equivalent of ByteString, ShortByteString, Unboxed Vector and (Some) Storable Vector, implemented in one user friendly type. This is a really big win for users, as suddenly all those types play better together; they are all the same thing working the same way.

Instead of differentiating ByteString and Vector, now ByteString disappears completely in favor of just being a UArray Word8. This has been tried before with the current ecosystem with vector-bytestring.

String

String is a big pain point. Base represents it as a list of Char [Char], which as you can imagine is not efficient for most purposes. Text from the popular text package implements a packed version of this, using UTF-16 and unpinned native ByteArray#.

While text is almost a standard in haskell, it’s very likely you’ll need to pack and unpack this representation to interact with base functions, or switch representation often to interact with some libraries.

Note on Unicode: UTF-8 is an encoding format where unicode codepoints are encoded in sequences of 1 to 4 bytes (4 different cases). UTF-16 represents unicode sequences with either 2 or 4 bytes (2 different cases).

Foundation’s String is packed UTF-8 data backed by an unboxed vector of bytes. This means we can offer a lightweight type based on UArray Word8:

newtype String = String (UArray Word8)

So by doing this, we inherit directly all the advantages of our vector types; namely we have a String type that is unpinned or pinned (depending on needs), and supports native pointers. It is extremely lightweight to convert between the two: provided UTF8 binary data, we only validate the data, without re-allocating anything.

There’s no perfect representation of unicode; each representation has its own advantages and disadvantages, and it really depends on what types of data you’re actually processing. One easy rule of thumb is that the more cases your representation has, the slower it will be to process the higher unicode sequences.

By extension, it means that choosing a unique representation leads to compromise. In early benchmarks against text we are consistently outperforming Text when the data is predominantly ASCII (i.e. 1-byte encoding). With other types of data, it really depends; sometimes we’re faster still, sometimes slower, and sometimes par.

Caveat emptor: benchmarks are far from reliable, and only been run on 2 machines with similar characteristic so far.

Other Types

We also support already:

  • Boxed Array. This is an array of any other Haskell type. Think of it as an array of pointers to other Haskell values
  • Bitmap. 1 bit packed unboxed array

In the short term, we expect to add:

  • tree like structure.
  • hash based structure.
Unified Collection API

Many types of collections support the same kind of operations.

For example, commonly you have very similar functions defined with different types:

take :: Int -> UArray a -> UArray a
take :: Int -> [a]      -> [a]
take :: Int -> String   -> String

head :: UArray a -> a
head :: String   -> Char
head :: [a]      -> a

So we tried to avoid monomorphic versions of common Haskell functions and instead provide type family infused versions of those functions. In foundation we have:

take :: Int -> collection -> collection
head :: collection -> Element collection

Note: Element collection is a type family. this allow from a type “collection” to define another type. for example, the Element of a [a] is a, and the Element of String is Char.

Note2: head is not exactly defined this way in foundation: This was the simplest example that show Type families in action and the overloading. foundation’s head is not partial and defined: head :: NonEmpty collection -> Element collection

The consequence is that the same head or take (or other generic functions) works the same way for many different collection types, even when they are monomorphic (e.g. String).
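To make the mechanics concrete, here is a self-contained sketch of the pattern (the names are illustrative - foundation's real class hierarchy differs):

{-# LANGUAGE TypeFamilies #-}

type family Element c

class Collection c where
  chead :: c -> Element c      -- partial, like Prelude's head; fine for a sketch
  ctake :: Int -> c -> c

type instance Element [a] = a
instance Collection [a] where
  chead = head
  ctake = take

-- a packed string type, standing in for foundation's String
newtype Str = Str [Char]

type instance Element Str = Char
instance Collection Str where
  chead (Str s)   = head s
  ctake n (Str s) = Str (take n s)

-- chead [1, 2, 3 :: Int] :: Int
-- chead (Str "abc")      :: Char

The same class methods work over both the polymorphic list and the monomorphic Str, which is exactly the effect described above.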

For another good example of this approach being taken, have a look at the mono-traversable package

For other operations that are specific to a data structure, and hard to generalize, we still expose dedicated operations.

The question of dependencies

If you’re not convinced by how we provide a better foundation to the standard Haskell types, then it raises the question: why not depend on those high quality libraries doing the exact same thing ?

Consistency. I think it’s easier to set a common direction, and have a consistent approach when working in a central place together, than having N maintainers working on M packages independently.

Common place. An example speaks better than words sometime: I have this X thing, that depends on the A package, and the B package. Should I add it to A, to B, or create a new C package ?

Atomic development. We don’t have to jump through hoops to improve our types or functions against other parts of foundation. Having more things defined in the same place means we can be more aggressive about improving things faster, while retaining an overall package that makes sense.

Versions, and Releases. Far easier to depend on a small set of libraries than on hundreds of different versions. Particularly in an industrial setting, I will be much more confident tracking 1 package, watching 1 issue tracker and dealing with a set of known people, than having to deal with N packages, N issue trackers (possibly in different places), and N maintainers.

Some final notes

A fast iterative release schedule. Planning to release early, release often, and with a predictable release schedule.

We’re still in the early stage. While we’re at an exciting place, don’t expect a finished product right now.

You don’t need to be an expert to help. Anyone can help us shape foundation.

Join us. If you want to get involved: all Foundation work takes place in the open, on the haskell-foundation organisation, with code, proposals, issues, voting, and questions.

Categories: Offsite Blogs

Theory Lunch (Institute of Cybernetics, Tallinn): Nonuniversality in computation: A proof by semantic shift?

Planet Haskell - Thu, 09/08/2016 - 8:59am

Today, the 8th of September 2016, we had a very interesting discussion about a theorem, due to Selim G. Akl, pointed to me in a tweet by Andy Adamatzky. According to Akl, this theorem has the consequence that the Church-Turing thesis, a basic tenet of theoretical computer science, is false. Of course, surprising statements require solid arguments: is Akl’s solid enough?

First of all, let us recall what the Church-Turing thesis is, and what it is not. Its statement, as reported by the Stanford Encyclopedia of Philosophy, goes as follows:

A function of positive integers is effectively calculable only if recursive.

Here, for a calculation procedure to be “effective” means the following:

  1. it has a finite description;
  2. it always returns the correct output, given any valid input;
  3. it can be “carried on by pencil and paper” by a human being; and
  4. it requires no insight or ingenuity on the human’s behalf.

One model of effective procedures is given by the recursive functions; another one, by the functions computable by Turing machines; a third one, by the functions which are representable in Church’s λ-calculus. Alan Turing and Stephen Cole Kleene proved that the three classes coincide: thus, in ordinary practice, the Church-Turing thesis is often stated with “Turing-computable” in place of “recursive”.

The class of Turing machines has the advantage of containing a universal element: a special Turing machine and an encoding from the set of Turing machines to the set of natural numbers exists such that, when the special Turing machine is provided the encoding of an arbitrary Turing machine and a valid input for the latter, it will return the value of the encoded Turing machine on the provided input.

Now that we have written down what the Church-Turing thesis is, we can examine Akl’s theorem.

In his 2005 paper, Akl defines a universal computer as a system having the following features:

  1. It has means of communicating with the outside world, so as to receive its input and to send its output.
  2. It can perform all elementary arithmetic and logic operations.
  3. It can be programmed, according to the two previous rules.
  4. It has unlimited memory to use for input, output, and temporary values.
  5. It can only execute finitely many operations (evaluating input, producing output, performing an elementary operation, etc.) at each time step.
  6. It can simulate any computation performed by any other model of computation.

The statement of the theorem, which does not appear explicitly in the original paper but is written down in the 2015 paper that clarifies the idea and addresses criticism, is reported here verbatim:

Nonuniversality in Computation Theorem (NCT): No computer is universal if it is capable of exactly V(t) operations during time unit t of computation, where t is a positive integer, and V(t) is finite and fixed once and for all.

The main argument is that no such computer can perform a computation which requires more than V(t) operations at some time t. Explicit examples happen in parallel computation, a field Akl is a master of, where the number of operations that can be performed in a time unit grows linearly with the number of processors: for instance, reading n values in input can be done in time O(1) by a parallel machine with n processors, but not by any machine with fewer than n processors.

Such a requirement, however, does not appear in the notion of universality at the base of the original, and actual, Church-Turing thesis. There, to "simulate" a machine or algorithm means to be able to always reproduce the same output as the algorithm, given any valid input for it, up to an encoding of the input and the output. But no hypothesis on how the output is achieved from the input is made: a simulation in linear time, such that each step of the simulated algorithm is reproduced by exactly k operations of the Turing machine, is as good as one where the simulation of the nth step takes, say, 2^n operations of the Turing machine, or where no such regularity appears.

Among the (counter)examples provided by Akl are:

  1. Computations with time-varying variables.
  2. Computations with time-varying computational complexity.
  3. Computations whose complexity depends on their placement on a schedule.
  4. Computations with interacting variables, e.g., states of entangled electrons.
  5. Computations with uncertain time constraints.

None of these, however, respects the definition of computation from the model of recursive functions, where the values of the variables are given once and for all, and can possibly change for recursive calls, but not for the original call. They can be seen as instances of unconventional models of computation: but by doing this, one changes the very notion of computation, which ceases to be the one at the basis of the Church-Turing thesis.

So my guess is that Akl’s statement about the falsity of the Church-Turing thesis actually falls in the following category, as reported in the humorous list by Dana Angluin:

Proof by semantic shift: Some standard but inconvenient definitions are changed for the statement of the result.

Actually, if we go back to Akl's definition of a universal computer, it appears to be fine until the very last point: the first two points agree with the definition of effective computation at the basis of the actual Church-Turing thesis, and the next three are features of any universal Turing machine. The problem comes from the last point, which has at least two weak spots. The first is that it does not define precisely what a model of computation is; this can be accepted, as Akl is talking of unconventional computation, and it is wiser to be open to other possibilities. But there is a more serious one, in that it is not clear

what the expression "to simulate" means.

Note that the Stanford Encyclopedia of Philosophy reports the following variant of the Church-Turing thesis, attributed to David Deutsch:

Every finitely realizable physical system can be perfectly simulated by a universal model computing machine operating by finite means.

Deutsch’s thesis, however, does not coincide with the Church-Turing thesis! (This, notwithstanding Deutsch’s statement that “[t]his formulation is both better defined and more physical than Turing’s own way of expressing it”.) Plus, there is another serious ambiguity, which is of the same kind as the one in Akl’s definition:

what is “perfectly simulated” supposed to mean?

Does it mean that every single step performed by the system can be reproduced in real time? In this case, Akl is perfectly right in disproving it under the constraint of boundedly many operations at each time unit. Or does it mean that the simulation of each elementary step of the process (e.g., one performed in a quantum of time) ends with the correct result if the correct initial conditions are given? In this case, the requirement to reproduce exactly what happens between the reading of the input and the writing of the output is null and void.

Worse still, there is a vulgarized form of the Church-Turing thesis, which is reported by Akl himself on page 172 of his 2005 paper (!), and goes as follows:

Any computable function can be computed on a Turing machine.

If one calls that "the Church-Turing thesis", then Akl's NCT is absolutely correct in disproving it. But that is not the actual Church-Turing thesis! It is actually a rewording of what the Stanford Encyclopedia of Philosophy calls "Thesis M", which is explicitly stated not to be equivalent to the original Church-Turing thesis (and also to be false). Again, the careful reader will have noticed that, in the statement above, being "computable by a Turing machine" is a well-defined property, while "computable" tout court is definitely not.

At the end of this discussion, my thesis is that Akl’s proof is correct, but NCT’s consequences and interpretation might not be what Akl means, or (inclusive disjunction) what his critics understand. As for my personal interpretation of NCT, here it goes:

No computer which is able to perform a predefined, finite number of operations at each finite time step is universal across all the different models of computation, where the word "computation" may be taken in a meaning different from that of the Church-Turing thesis.

Is mine an interpretation by semantic shift? Discussion is welcome.

References:

  1. Selim G. Akl. The Myth of Universal Computation. Parallel Numerics ’05, 167–192.
  2. Selim G. Akl. Nonuniversality explained. International Journal of Parallel, Emergent and Distributed Systems 31:3, 201–219. doi:10.1080/17445760.2015.1079321
  3. The Church-Turing Thesis. Stanford Encyclopedia of Philosophy. First published January 8, 1997; substantive revision August 19, 2002. http://plato.stanford.edu/entries/church-turing/
  4. Dana Angluin’s List of Proof Techniques. http://www.cs.northwestern.edu/~riesbeck/proofs.html

Categories: Offsite Blogs

NOOL 2016

Lambda the Ultimate - Wed, 09/07/2016 - 2:54am

At this year's SPLASH there will be a follow-up to last year's NOOL 2015 workshop (discussed here on LtU). Objects may not be the trendiest thing in PL circles, but I think they have a bright future. (Example: session types are essentially concurrent objects in disguise, and they are quite trendy.)

Please consider submitting! (Disclaimer: I'm one of the organisers :)

Categories: Offsite Discussion

Edward Z. Yang: The Base of a String Theory for Haskell

Planet Haskell - Wed, 09/07/2016 - 12:22am

One of the early posts from this blog, from 2010, was on the subject of how to pick your string library in Haskell. Half a decade later, the Haskell ecosystem is still largely in the same situation, where most of the boot libraries shipped with GHC (e.g., base) still use the String type, despite the existence of superior string types. The problem is twofold:

  1. No one wants to break all of the existing code, which means libraries like base have to keep String versions of all their code. You can't just search-replace every occurrence of String with Text.
  2. No one wants to be in the business of maintaining two copies of any piece of code, which are copy-pastes of each other but subtly different. In practice, we must: e.g., unix has ByteString variants of all of its functions (done by copy-paste); text provides some core IO functionality (also done by copy-paste). But it is terrible and scales poorly: every downstream library that wants to support two string types (or more) now has to publish two copies of itself, and any new string implementation has the unenviable task of reimplementing the world to make itself useful.

Backpack solves these problems, by allowing you to parametrize over a signature rather than a concrete implementation of a string type, and instantiate such an indefinite library whenever you want. This solves both problems:

  1. Because you are allowed to instantiate an indefinite library whenever you want, we can eagerly instantiate a posix-indef using String and ship it as posix, keeping backwards compatibility with all packages which are Backpack ignorant.
  2. At the same time, if packages depend directly on posix-indef, they themselves are parametrizable over a string type. Entire library ecosystems can defer the choice of string type to the end user, which on a sufficiently new version of GHC offers a backwards-compatible way of adding support for new string types to a library. (I don't want to say "support multiple string types", because this is not necessarily a virtue in and of itself.)

To this end, I would like to propose a string theory, for the base of GHC Haskell: namely the core boot libraries that are distributed with GHC today. These packages will set the tone for the eventual Backpackification of the rest of the ecosystem.

But first, what is it that we are parametrizing over? A string is not so simple...

A digression on file paths (and OS strings)

File paths (FilePath) are an important form of String which aren't really Unicode strings at all. POSIX specifies that file paths can be arbitrary C strings, thus, code which decodes a file path as Unicode must be cognizant of the fact that the underlying ByteString could contain arbitrary, undecodable nonsense. To make matters worse, even the encoding can vary: on Windows file paths are encoded in UTF-16 (with unpaired surrogates, eek!), while in modern Linux environments the encoding is dictated by the locale (base uses locale_charset to determine how to interpret file paths; the locale is often UTF-8, but not always).

Thus, the definition type FilePath = String is very questionable indeed. There is an existing proposal, the Abstract FilePath Proposal, to turn FilePath into an abstract type, and not just a type synonym for String. Unfortunately, a change like this is a BC-breaking one, so it will take some time to implement, since GHC must first be taught to warn when FilePath is used as if it were a String, to help people find out that they are using it incorrectly.

Backpack offers a more decentralized way to move into the future: just define an abstract signature for FilePath to depend upon. The low level signature might look like this:

    signature FilePath where

    -- | File and directory names, whose precise
    -- meaning is operating system dependent. Files can be opened, yielding a
    -- handle which can then be used to operate on the contents of that file.
    data FilePath

    -- | A C string (pointer to an array of C characters terminated by NUL)
    -- representing a file path, suitable for use with the operating system
    -- C interface for file manipulation. This exact type is architecture
    -- dependent.
    type CFilePath =
    #ifdef mingw32_HOST_OS
            CWString
    #else
            CString
    #endif

    withFilePath :: FilePath -> (CFilePath -> IO a) -> IO a
    newFilePath  :: FilePath -> IO CFilePath
    peekFilePath :: CFilePath -> IO FilePath

    -- peekFilePath >=> newFilePath should be identity
    -- (this is tricky to achieve if FilePath is a
    -- Unicode-based type, like String)

And of course, you would want all of the FilePath manipulation functions that people use.
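For instance (a speculative sketch of mine, with names modelled on the filepath package rather than taken from the post), the signature could go on to specify:

    -- hypothetical additions to the FilePath signature
    (</>)         :: FilePath -> FilePath -> FilePath
    takeFileName  :: FilePath -> FilePath
    takeExtension :: FilePath -> FilePath
    splitPath     :: FilePath -> [FilePath]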

To maintain compatibility with the existing ecosystem, you would likely instantiate your library with type FilePath = String. But there is nothing stopping you from picking your own abstract FilePath type and using it instead.
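As a concrete (and deliberately naive) illustration of the String instantiation, an implementing module might look like the following sketch. The module name and the use of the plain C-string marshallers are my own assumptions; base itself marshals file paths through the file system encoding rather than the byte-truncating withCString used here.

    -- Hypothetical String-backed implementation of the signature above.
    module FilePath.String
      ( FilePath, CFilePath
      , withFilePath, newFilePath, peekFilePath
      ) where

    import Prelude hiding (FilePath)
    import Foreign.C.String (CString, withCString, newCString, peekCString)

    -- The classic representation: a file path is just a String.
    type FilePath  = String
    type CFilePath = CString  -- POSIX case; Windows would use CWString

    withFilePath :: FilePath -> (CFilePath -> IO a) -> IO a
    withFilePath = withCString

    newFilePath :: FilePath -> IO CFilePath
    newFilePath = newCString

    peekFilePath :: CFilePath -> IO FilePath
    peekFilePath = peekCString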

File paths are not unique in this sense; there are other strings (such as the values of environment variables) which have similar properties: I've taken to calling these OSStrings (as they are called in Rust).

Axes of parametrization

With this in mind, there are three "string variants" over which any given library can be parametrized:

  1. They can be parametrized over FilePath, for modules which deal with the file system (e.g., System.Posix.Directory)
  2. They can be parametrized over an OSString, because they deal with various operating system specific APIs (e.g., System.Posix.Env)
  3. They can be parametrized over a String, because, well, sometimes a string is just a string. (e.g., Text.ParserCombinators.ReadP)

Some libraries may be parametrized in multiple ways: for example, readFile needs to be parametrized over both FilePath and String.

Split base (and friends) for Backpack

For technical reasons, Backpack cannot be used to parametrize specific modules; you have to parametrize over an entire library. So a side-effect of Backpack-ing the core libraries is that they will be split into a number of smaller libraries. Using module reexports, you can still keep the old libraries around as shims.
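Such a shim can be as thin as a module that reexports the moved code. A hypothetical sketch (the PackageImports extension disambiguates the two same-named modules):

    {-# LANGUAGE PackageImports #-}
    -- hypothetical shim left behind in base: the old module name,
    -- with the contents now living in the split-out base-io package
    module System.IO (module Reexport) where

    import "base-io" System.IO as Reexport

In practice, cabal's reexported-modules field can achieve the same effect without any source file at all.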

There are four GHC boot libraries which would most benefit from modularization on strings:

  • base
    • base-io (System.IO and submodules; parametrized over FilePath and String)
    • There are a few other modules which could be stringified, but the marginal benefit may not justify making a new package for each (Data.String, System.Console.GetOpt, Text.ParserCombinators.ReadP, Text.Printf). Each of these only needs to be parametrized over String.
    • Control.Exception, Text.Read and Text.Show are explicit non-goals, they are too deeply wired into GHC at present to muck about with.
  • unix
    • unix-env (System.Posix.Env, parametrized over OSString)
    • unix-fs (System.Posix.Directory, System.Posix.Files, System.Posix.Temp parametrized over FilePath)
    • unix-process (System.Posix.Process, parametrized over FilePath and OSString)
  • pretty (parametrized over String; then GHC could use it rather than roll its own copy!)
  • process (parametrized over String, OSString and FilePath)

The naming scheme I propose is that, e.g., the package unix continues to be the package instantiated with old-fashioned Strings. Then unix-indef is a package which is uninstantiated (the user can instantiate it to what they want, or pass on the decision to their users). Some packages may choose to also provide shims of their package instantiated with specific types, e.g., base-io-bytestring, which would be base-io instantiated with ByteString rather than String, though these names could get to be quite long, so it's uncertain how useful this would be.
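To give a flavour of how a user might pick an instantiation, here is a speculative cabal sketch; the package and module names are hypothetical, and the mixins syntax is the one that eventually shipped with cabal-install 2.0:

    library
      build-depends: base, unix-indef, filepath-string
      mixins:        unix-indef requires (FilePath as FilePath.String)

Here filepath-string would be a package providing a FilePath.String module like the sketch above, and the requires renaming fills unix-indef's FilePath signature with it.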

Closing remarks

Of all the packages mentioned here, only base could make the bold step of using Backpack in the next GHC release (although it won't; at least, not for GHC 8.2); the rest need to maintain backwards compatibility with old versions of GHC and so would have to be forked to use Backpack.

The real test for Backpack will be whether or not string-using packages in the ecosystem decide to sign on and parametrize themselves over signatures. I hope that eventually you will be able to use any library with ByteString or Text with the same ease that you can use libraries with String (and maybe even use your own, home-grown type). The problem with module systems is that you rarely see the benefits until you use them for big systems, but that makes it difficult to evaluate them beforehand. But the benefits seem tantalizing: let's boldly backpack forth to a brighter future!

Categories: Offsite Blogs