News aggregator

Algorithm W Step by Step [PDF]

Haskell on Reddit - Tue, 05/19/2015 - 10:08am
Categories: Incoming News

Gabriel Gonzalez: Morte: an intermediate language for super-optimizing functional programs

Planet Haskell - Tue, 05/19/2015 - 8:06am

The Haskell language provides the following guarantee (with caveats): if two programs are equal according to equational reasoning then they will behave the same. On the other hand, Haskell does not guarantee that equal programs will generate identical performance. Consequently, Haskell library writers must employ rewrite rules to ensure that their abstractions do not interfere with performance.

Now suppose there were a hypothetical language with a stronger guarantee: if two programs are equal then they generate identical executables. Such a language would be immune to abstraction: no matter how many layers of indirection you might add the binary size and runtime performance would be unaffected.

Here I will introduce such an intermediate language named Morte that obeys this stronger guarantee. I have not yet implemented a back-end code generator for Morte, but I wanted to pause to share what I have completed so far because Morte uses several tricks from computer science that I believe deserve more attention.

Morte is nothing more than a bare-bones implementation of the calculus of constructions, which is a specific type of lambda calculus. The only novelty is how I intend to use this lambda calculus: as a super-optimizer.

Normalization

The typed lambda calculus possesses a useful property: every term in the lambda calculus has a unique normal form if you beta-reduce everything. If you're new to lambda calculus, normalizing an expression equates to indiscriminately inlining every function call.
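For intuition, here is a tiny made-up example (the names twice and addOne are mine, not from Morte) of what "indiscriminately inlining every function call" looks like:

twice f x = f (f x)
addOne x = x + 1

twice addOne 3

-- Inline: twice f x = f (f x)
= addOne (addOne 3)

-- Inline: addOne x = x + 1
= addOne 3 + 1

-- Inline: addOne x = x + 1
= (3 + 1) + 1

Every call site disappears; only the normal form remains.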

What if we built a programming language whose intermediate language was lambda calculus? What if optimization was just normalization of lambda terms (i.e. indiscriminate inlining)? If so, then we could abstract freely, knowing that while compile times might increase, our final executable would never change.

Recursion

Normally you would not want to inline everything because infinitely recursive functions would become infinitely large expressions. Fortunately, we can often translate recursive code to non-recursive code!

I'll demonstrate this trick first in Haskell and then in Morte. Let's begin from the following recursive List type along with a recursive map function over lists:

import Prelude hiding (map, foldr)

data List a = Cons a (List a) | Nil

example :: List Int
example = Cons 1 (Cons 2 (Cons 3 Nil))

map :: (a -> b) -> List a -> List b
map f Nil = Nil
map f (Cons a l) = Cons (f a) (map f l)

-- Argument order intentionally switched
foldr :: List a -> (a -> x -> x) -> x -> x
foldr Nil c n = n
foldr (Cons a l) c n = c a (foldr l c n)

result :: Int
result = foldr (map (+1) example) (+) 0

-- result = 9

Now imagine that we disable all recursion in Haskell: no more recursive types and no more recursive functions. We must then reject the above program because:

  • the List data type definition recursively refers to itself

  • the map and foldr functions recursively refer to themselves

Can we still encode lists in a non-recursive dialect of Haskell?

Yes, we can!

-- This is a valid Haskell program

{-# LANGUAGE RankNTypes #-}

import Prelude hiding (map, foldr)

type List a = forall x . (a -> x -> x) -> x -> x

example :: List Int
example = \cons nil -> cons 1 (cons 2 (cons 3 nil))

map :: (a -> b) -> List a -> List b
map f l = \cons nil -> l (\a x -> cons (f a) x) nil

foldr :: List a -> (a -> x -> x) -> x -> x
foldr l = l

result :: Int
result = foldr (map (+ 1) example) (+) 0

-- result = 9

Carefully note that:

  • List is no longer defined recursively in terms of itself

  • map and foldr are no longer defined recursively in terms of themselves

Yet, we somehow managed to build a list, map a function over the list, and fold the list, all without ever using recursion! We do this by encoding the list as a fold, which is why foldr became the identity function.

This trick works for more than just lists. You can take any recursive data type and mechanically transform the type into a fold and transform functions on the type into functions on folds. If you want to learn more about this trick, the specific name for it is "Boehm-Berarducci encoding". If you are curious, this in turn is equivalent to an even more general concept from category theory known as "F-algebras", which let you encode inductive things in a non-inductive way.
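As a quick illustration (my own sketch, not from the original post's code), here is the same trick applied to natural numbers; this mirrors the Nat encoding that shows up later in nine.mt:

{-# LANGUAGE RankNTypes #-}

type Nat = forall x . (x -> x) -> x -> x

zero :: Nat
zero = \s z -> z

suc :: Nat -> Nat
suc n = \s z -> s (n s z)

plus :: Nat -> Nat -> Nat
plus m n = \s z -> m s (n s z)

toInt :: Nat -> Int
toInt n = n (+ 1) 0  -- fold the encoded number back into an Int

-- toInt (plus (suc zero) (suc (suc zero))) = 3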

Non-recursive code greatly simplifies equational reasoning. For example, we can easily prove that we can optimize map id l to l:

map id l

-- Inline: map f l = \cons nil -> l (\a x -> cons (f a) x) nil
= \cons nil -> l (\a x -> cons (id a) x) nil

-- Inline: id x = x
= \cons nil -> l (\a x -> cons a x) nil

-- Eta-reduce
= \cons nil -> l cons nil

-- Eta-reduce
= l

Note that we did not need to use induction to prove this optimization because map is no longer recursive. The optimization became downright trivial, so trivial that we can automate it!

Morte optimizes programs using this same simple scheme:

  • Beta-reduce everything (equivalent to inlining)
  • Eta-reduce everything

To illustrate this, I will desugar our high-level Haskell code to the calculus of constructions. This desugaring process is currently manual (and tedious), but I plan to automate this, too, by providing a front-end high-level language similar to Haskell that compiles to Morte:

-- mapid.mt

( \(List : * -> *)
-> \( map
: forall (a : *)
-> forall (b : *)
-> (a -> b) -> List a -> List b
)
-> \(id : forall (a : *) -> a -> a)

-> \(a : *) -> map a a (id a)
)

-- List
(\(a : *) -> forall (x : *) -> (a -> x -> x) -> x -> x)

-- map
( \(a : *)
-> \(b : *)
-> \(f : a -> b)
-> \(l : forall (x : *) -> (a -> x -> x) -> x -> x)
-> \(x : *)
-> \(Cons : b -> x -> x)
-> \(Nil: x)
-> l x (\(va : a) -> \(vx : x) -> Cons (f va) vx) Nil
)

-- id
(\(a : *) -> \(va : a) -> va)

This line of code is the "business end" of the program:

\(a : *) -> map a a (id a)

The extra 'a' business is because in any polymorphic lambda calculus you explicitly accept polymorphic types as arguments and specialize functions by applying them to types. Higher-level functional languages like Haskell or ML use type inference to automatically infer and supply type arguments when possible.

We can compile this program using the morte executable, which accepts a Morte program on stdin, writes the program's type to stderr, and writes the optimized program to stdout:

$ morte < mapid.mt
∀(a : *) → (∀(x : *) → (a → x → x) → x → x) → ∀(x : *) → (a
→ x → x) → x → x

λ(a : *) → λ(l : ∀(x : *) → (a → x → x) → x → x) → l

The first line is the type, which is a desugared form of:

forall a . List a -> List a

The second line is the program, which is the identity function on lists. Morte optimized away the map completely, the same way we did by hand.

Morte also optimized away the rest of the code, too. Dead-code elimination is just an emergent property of Morte's simple optimization scheme.

Equality

We could double-check our answer by asking Morte to optimize the identity function on lists:

-- idlist.mt

( \(List : * -> *)
-> \(id : forall (a : *) -> a -> a)

-> \(a : *) -> id (List a)
)

-- List
(\(a : *) -> forall (x : *) -> (a -> x -> x) -> x -> x)

-- id
(\(a : *) -> \(va : a) -> va)

Sure enough, Morte outputs an alpha-equivalent result (meaning the same up to variable renaming):

$ ~/.cabal/bin/morte < idlist.mt
∀(a : *) → (∀(x : *) → (a → x → x) → x → x) → ∀(x : *) → (a
→ x → x) → x → x

λ(a : *) → λ(va : ∀(x : *) → (a → x → x) → x → x) → va

We can even use the morte library to mechanically check if two Morte expressions are alpha-, beta-, and eta- equivalent. We can parse our two Morte files into Morte's Expr type and then use the Eq instance for Expr to test for equivalence:

$ ghci
Prelude> import qualified Data.Text.Lazy.IO as Text
Prelude Text> txt1 <- Text.readFile "mapid.mt"
Prelude Text> txt2 <- Text.readFile "idlist.mt"
Prelude Text> import Morte.Parser (exprFromText)
Prelude Text Morte.Parser> let e1 = exprFromText txt1
Prelude Text Morte.Parser> let e2 = exprFromText txt2
Prelude Text Morte.Parser> import Control.Applicative (liftA2)
Prelude Text Morte.Parser Control.Applicative> liftA2 (==) e1 e2
Right True
$ -- `Right` means both expressions parsed successfully
$ -- `True` means they are alpha-, beta-, and eta-equivalent

We can use this to mechanically verify that two Morte programs optimize to the same result.
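For reference, a minimal standalone version of that GHCi session might look like this. This is only a sketch; it assumes nothing beyond the names already shown above (Morte.Parser.exprFromText and the Eq instance on Morte's Expr):

import qualified Data.Text.Lazy.IO as Text
import Morte.Parser (exprFromText)
import Control.Applicative (liftA2)

main :: IO ()
main = do
    txt1 <- Text.readFile "mapid.mt"
    txt2 <- Text.readFile "idlist.mt"
    case liftA2 (==) (exprFromText txt1) (exprFromText txt2) of
        Left _      -> putStrLn "one of the files failed to parse"
        Right True  -> putStrLn "the two programs are equivalent"
        Right False -> putStrLn "the two programs differ"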

Compile-time computation

Morte can compute as much (or as little) at compile time as you want. The more information you encode directly within lambda calculus, the more compile-time computation Morte will perform for you. For example, if we translate our Haskell List code entirely to lambda calculus, then Morte will statically compute the result at compile time.

-- nine.mt

( \(Nat : *)
-> \(zero : Nat)
-> \(one : Nat)
-> \((+) : Nat -> Nat -> Nat)
-> \((*) : Nat -> Nat -> Nat)
-> \(List : * -> *)
-> \(Cons : forall (a : *) -> a -> List a -> List a)
-> \(Nil : forall (a : *) -> List a)
-> \( map
: forall (a : *) -> forall (b : *)
-> (a -> b) -> List a -> List b
)
-> \( foldr
: forall (a : *)
-> List a
-> forall (r : *)
-> (a -> r -> r) -> r -> r
)
-> ( \(two : Nat)
-> \(three : Nat)
-> ( \(example : List Nat)

-> foldr Nat (map Nat Nat ((+) one) example) Nat (+) zero
)

-- example
(Cons Nat one (Cons Nat two (Cons Nat three (Nil Nat))))
)

-- two
((+) one one)

-- three
((+) one ((+) one one))
)

-- Nat
( forall (a : *)
-> (a -> a)
-> a
-> a
)

-- zero
( \(a : *)
-> \(Succ : a -> a)
-> \(Zero : a)
-> Zero
)

-- one
( \(a : *)
-> \(Succ : a -> a)
-> \(Zero : a)
-> Succ Zero
)

-- (+)
( \(m : forall (a : *) -> (a -> a) -> a -> a)
-> \(n : forall (a : *) -> (a -> a) -> a -> a)
-> \(a : *)
-> \(Succ : a -> a)
-> \(Zero : a)
-> m a Succ (n a Succ Zero)
)

-- (*)
( \(m : forall (a : *) -> (a -> a) -> a -> a)
-> \(n : forall (a : *) -> (a -> a) -> a -> a)
-> \(a : *)
-> \(Succ : a -> a)
-> \(Zero : a)
-> m a (n a Succ) Zero
)

-- List
( \(a : *)
-> forall (x : *)
-> (a -> x -> x) -- Cons
-> x -- Nil
-> x
)

-- Cons
( \(a : *)
-> \(va : a)
-> \(vas : forall (x : *) -> (a -> x -> x) -> x -> x)
-> \(x : *)
-> \(Cons : a -> x -> x)
-> \(Nil : x)
-> Cons va (vas x Cons Nil)
)

-- Nil
( \(a : *)
-> \(x : *)
-> \(Cons : a -> x -> x)
-> \(Nil : x)
-> Nil
)

-- map
( \(a : *)
-> \(b : *)
-> \(f : a -> b)
-> \(l : forall (x : *) -> (a -> x -> x) -> x -> x)
-> \(x : *)
-> \(Cons : b -> x -> x)
-> \(Nil: x)
-> l x (\(va : a) -> \(vx : x) -> Cons (f va) vx) Nil
)

-- foldr
( \(a : *)
-> \(vas : forall (x : *) -> (a -> x -> x) -> x -> x)
-> vas
)

The relevant line is:

foldr Nat (map Nat Nat ((+) one) example) Nat (+) zero

If you remove the type-applications to Nat, this parallels our original Haskell example. We can then evaluate this expression at compile time:

$ morte < nine.mt
∀(a : *) → (a → a) → a → a

λ(a : *) → λ(Succ : a → a) → λ(Zero : a) → Succ (Succ (Succ
(Succ (Succ (Succ (Succ (Succ (Succ Zero))))))))

Morte reduces our program to a church-encoded nine.

Run-time computation

Morte does not force you to compute everything using lambda calculus at compile time. Suppose that we wanted to use machine arithmetic at run-time instead. We can do this by parametrizing our program on:

  • the Int type,
  • operations on Ints, and
  • any integer literals we use

We accept these "foreign imports" as ordinary arguments to our program:

-- foreign.mt

-- Foreign imports
\(Int : *) -- Foreign type
-> \((+) : Int -> Int -> Int) -- Foreign function
-> \((*) : Int -> Int -> Int) -- Foreign function
-> \(lit@0 : Int) -- Literal "1" -- Foreign data
-> \(lit@1 : Int) -- Literal "2" -- Foreign data
-> \(lit@2 : Int) -- Literal "3" -- Foreign data
-> \(lit@3 : Int) -- Literal "1" -- Foreign data
-> \(lit@4 : Int) -- Literal "0" -- Foreign data

-- The rest is compile-time lambda calculus
-> ( \(List : * -> *)
-> \(Cons : forall (a : *) -> a -> List a -> List a)
-> \(Nil : forall (a : *) -> List a)
-> \( map
: forall (a : *)
-> forall (b : *)
-> (a -> b) -> List a -> List b
)
-> \( foldr
: forall (a : *)
-> List a
-> forall (r : *)
-> (a -> r -> r) -> r -> r
)
-> ( \(example : List Int)

-> foldr Int (map Int Int ((+) lit@3) example) Int (+) lit@4
)

-- example
(Cons Int lit@0 (Cons Int lit@1 (Cons Int lit@2 (Nil Int))))
)

-- List
( \(a : *)
-> forall (x : *)
-> (a -> x -> x) -- Cons
-> x -- Nil
-> x
)

-- Cons
( \(a : *)
-> \(va : a)
-> \(vas : forall (x : *) -> (a -> x -> x) -> x -> x)
-> \(x : *)
-> \(Cons : a -> x -> x)
-> \(Nil : x)
-> Cons va (vas x Cons Nil)
)

-- Nil
( \(a : *)
-> \(x : *)
-> \(Cons : a -> x -> x)
-> \(Nil : x)
-> Nil
)

-- map
( \(a : *)
-> \(b : *)
-> \(f : a -> b)
-> \(l : forall (x : *) -> (a -> x -> x) -> x -> x)
-> \(x : *)
-> \(Cons : b -> x -> x)
-> \(Nil: x)
-> l x (\(va : a) -> \(vx : x) -> Cons (f va) vx) Nil
)

-- foldr
( \(a : *)
-> \(vas : forall (x : *) -> (a -> x -> x) -> x -> x)
-> vas
)

We can use Morte to optimize the above program and Morte will reduce the program to nothing but foreign types, operations, and values:

$ morte < foreign.mt
∀(Int : *) → (Int → Int → Int) → (Int → Int → Int) → Int →
Int → Int → Int → Int → Int

λ(Int : *) → λ((+) : Int → Int → Int) → λ((*) : Int → Int →
Int) → λ(lit : Int) → λ(lit@1 : Int) → λ(lit@2 : Int) →
λ(lit@3 : Int) → λ(lit@4 : Int) → (+) ((+) lit@3 lit) ((+)
((+) lit@3 lit@1) ((+) ((+) lit@3 lit@2) lit@4))

If you study that closely, you can see that Morte adds lit@3 (the "1" literal) to each element of the list and then sums the results. We can then hand this foreign syntax tree to our machine arithmetic backend, which transforms those foreign operations into efficient machine operations.
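To make that concrete, here is a sketch (entirely my own, with hypothetical names) of what a trivial Haskell "backend" could do with that output: treat the optimized lambda term as a function and feed it real machine arithmetic for the foreign arguments. The lit, lit1, ... parameters stand for lit, lit@1, ... in Morte's output.

-- The body of `optimized` transcribes the lambda term Morte printed above,
-- with the Int type argument already instantiated.
optimized :: (Int -> Int -> Int) -> (Int -> Int -> Int)
          -> Int -> Int -> Int -> Int -> Int -> Int
optimized (+) (*) lit lit1 lit2 lit3 lit4 =
    (+) ((+) lit3 lit) ((+) ((+) lit3 lit1) ((+) ((+) lit3 lit2) lit4))

result :: Int
result = optimized (+) (*) 1 2 3 1 0
-- result = 9, matching the compile-time example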

Morte lets you choose how much information you want to encode within lambda calculus. The more information you encode in lambda calculus the more Morte can optimize your program, but the slower your compile times will get, so it's a tradeoff.

Corecursion

Corecursion is the dual of recursion. Where recursion works on finite data types, corecursion works on potentially infinite data types. An example would be the following infinite Stream in Haskell:

data Stream a = Cons a (Stream a)

numbers :: Stream Int
numbers = go 0
  where
    go n = Cons n (go (n + 1))

-- numbers = Cons 0 (Cons 1 (Cons 2 (...

map :: (a -> b) -> Stream a -> Stream b
map f (Cons a l) = Cons (f a) (map f l)

example :: Stream Int
example = map (+ 1) numbers

-- example = Cons 1 (Cons 2 (Cons 3 (...

Again, pretend that we disable any function from referencing itself so that the above code becomes invalid. This time we cannot reuse the trick from the previous sections because we cannot encode numbers as a fold without it referencing itself. Try this if you don't believe me.

However, we can still encode corecursive things in a non-corecursive way. This time, we encode our Stream type as an unfold instead of a fold:

-- This is also valid Haskell code

{-# LANGUAGE ExistentialQuantification #-}

data Stream a = forall s . MkStream
{ seed :: s
, step :: s -> (a, s)
}

numbers :: Stream Int
numbers = MkStream 0 (\n -> (n, n + 1))

map :: (a -> b) -> Stream a -> Stream b
map f (MkStream s0 k) = MkStream s0 k'
  where
    k' s = (f a, s')
      where (a, s') = k s

In other words, we store an initial seed of some type s and a step function of type s -> (a, s) that emits one element of our Stream. The type of our seed s can be anything and in our numbers example, the type of the internal state is Int. Another stream could use a completely different internal state of type (), like this:

-- ones = Cons 1 ones

ones :: Stream Int
ones = MkStream () (\_ -> (1, ()))

The general name for this trick is an "F-coalgebra" encoding of a corecursive type.
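Continuing the Haskell snippet above, here is a small helper (my own, not part of the original post) that shows how such an unfold-encoded stream is consumed: run the step function from the seed a finite number of times.

takeS :: Int -> Stream a -> [a]
takeS n (MkStream s0 k) = go n s0
  where
    go 0 _ = []
    go m s = let (a, s') = k s in a : go (m - 1) s'

-- takeS 3 (map (+ 1) numbers) = [1, 2, 3]
-- takeS 3 ones                = [1, 1, 1]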

Once we encode our infinite stream non-recursively, we can safely optimize the stream by inlining and eta reduction:

map id l

-- l = MkStream s0 k
= map id (MkStream s0 k)

-- Inline definition of `map`
= MkStream s0 k'
  where
    k' = \s -> (id a, s')
      where
        (a, s') = k s

-- Inline definition of `id`
= MkStream s0 k'
  where
    k' = \s -> (a, s')
      where
        (a, s') = k s

-- Inline: (a, s') = k s
= MkStream s0 k'
  where
    k' = \s -> k s

-- Eta reduce
= MkStream s0 k'
  where
    k' = k

-- Inline: k' = k
= MkStream s0 k

-- l = MkStream s0 k
= l

Now let's encode Stream and map in Morte and compile the following four expressions:

map id

id

map f . map g

map (f . g)

Save the following Morte file to stream.mt and then uncomment the expression you want to test:

( \(id : forall (a : *) -> a -> a)
-> \( (.)
: forall (a : *)
-> forall (b : *)
-> forall (c : *)
-> (b -> c)
-> (a -> b)
-> (a -> c)
)
-> \(Pair : * -> * -> *)
-> \(P : forall (a : *) -> forall (b : *) -> a -> b -> Pair a b)
-> \( first
: forall (a : *)
-> forall (b : *)
-> forall (c : *)
-> (a -> b)
-> Pair a c
-> Pair b c
)

-> ( \(Stream : * -> *)
-> \( map
: forall (a : *)
-> forall (b : *)
-> (a -> b)
-> Stream a
-> Stream b
)

-- example@1 = example@2
-> ( \(example@1 : forall (a : *) -> Stream a -> Stream a)
-> \(example@2 : forall (a : *) -> Stream a -> Stream a)

-- example@3 = example@4
-> \( example@3
: forall (a : *)
-> forall (b : *)
-> forall (c : *)
-> (b -> c)
-> (a -> b)
-> Stream a
-> Stream c
)

-> \( example@4
: forall (a : *)
-> forall (b : *)
-> forall (c : *)
-> (b -> c)
-> (a -> b)
-> Stream a
-> Stream c
)

-- Uncomment the example you want to test
-> example@1
-- -> example@2
-- -> example@3
-- -> example@4
)

-- example@1
(\(a : *) -> map a a (id a))

-- example@2
(\(a : *) -> id (Stream a))

-- example@3
( \(a : *)
-> \(b : *)
-> \(c : *)
-> \(f : b -> c)
-> \(g : a -> b)
-> map a c ((.) a b c f g)
)

-- example@4
( \(a : *)
-> \(b : *)
-> \(c : *)
-> \(f : b -> c)
-> \(g : a -> b)
-> (.) (Stream a) (Stream b) (Stream c) (map b c f) (map a b g)
)
)

-- Stream
( \(a : *)
-> forall (x : *)
-> (forall (s : *) -> s -> (s -> Pair a s) -> x)
-> x
)

-- map
( \(a : *)
-> \(b : *)
-> \(f : a -> b)
-> \( st
: forall (x : *)
-> (forall (s : *) -> s -> (s -> Pair a s) -> x)
-> x
)
-> \(x : *)
-> \(S : forall (s : *) -> s -> (s -> Pair b s) -> x)
-> st
x
( \(s : *)
-> \(seed : s)
-> \(step : s -> Pair a s)
-> S
s
seed
(\(seed@1 : s) -> first a b s f (step seed@1))
)
)
)

-- id
(\(a : *) -> \(va : a) -> va)

-- (.)
( \(a : *)
-> \(b : *)
-> \(c : *)
-> \(f : b -> c)
-> \(g : a -> b)
-> \(va : a)
-> f (g va)
)

-- Pair
(\(a : *) -> \(b : *) -> forall (x : *) -> (a -> b -> x) -> x)

-- P
( \(a : *)
-> \(b : *)
-> \(va : a)
-> \(vb : b)
-> \(x : *)
-> \(P : a -> b -> x)
-> P va vb
)

-- first
( \(a : *)
-> \(b : *)
-> \(c : *)
-> \(f : a -> b)
-> \(p : forall (x : *) -> (a -> c -> x) -> x)
-> \(x : *)
-> \(Pair : b -> c -> x)
-> p x (\(va : a) -> \(vc : c) -> Pair (f va) vc)
)

Both example@1 and example@2 will generate alpha-equivalent code:

$ morte < example1.mt
∀(a : *) → (∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) → (a →
s → x) → x) → x) → x) → ∀(x : *) → (∀(s : *) → s → (s → ∀(x
: *) → (a → s → x) → x) → x) → x

λ(a : *) → λ(st : ∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) →
(a → s → x) → x) → x) → x) → st

$ morte < example2.mt
∀(a : *) → (∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) → (a →
s → x) → x) → x) → x) → ∀(x : *) → (∀(s : *) → s → (s → ∀(x
: *) → (a → s → x) → x) → x) → x

λ(a : *) → λ(va : ∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) →
(a → s → x) → x) → x) → x) → va

Similarly, example@3 and example@4 will generate alpha-equivalent code:

$ morte < example3.mt
∀(a : *) → ∀(b : *) → ∀(c : *) → (b → c) → (a → b) → (∀(x :
*) → (∀(s : *) → s → (s → ∀(x : *) → (a → s → x) → x) → x) →
x) → ∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) → (c → s → x)
→ x) → x) → x

λ(a : *) → λ(b : *) → λ(c : *) → λ(f : b → c) → λ(g : a → b)
→ λ(st : ∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) → (a → s
→ x) → x) → x) → x) → λ(x : *) → λ(S : ∀(s : *) → s → (s → ∀
(x : *) → (c → s → x) → x) → x) → st x (λ(s : *) → λ(seed :
s) → λ(step : s → ∀(x : *) → (a → s → x) → x) → S s seed (λ(
seed@1 : s) → λ(x : *) → λ(Pair : c → s → x) → step seed@1 x
(λ(va : a) → Pair (f (g va)))))

$ morte < example4.mt
∀(a : *) → ∀(b : *) → ∀(c : *) → (b → c) → (a → b) → (∀(x :
*) → (∀(s : *) → s → (s → ∀(x : *) → (a → s → x) → x) → x) →
x) → ∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) → (c → s → x)
→ x) → x) → x

λ(a : *) → λ(b : *) → λ(c : *) → λ(f : b → c) → λ(g : a → b)
→ λ(va : ∀(x : *) → (∀(s : *) → s → (s → ∀(x : *) → (a → s
→ x) → x) → x) → x) → λ(x : *) → λ(S : ∀(s : *) → s → (s → ∀
(x : *) → (c → s → x) → x) → x) → va x (λ(s : *) → λ(seed :
s) → λ(step : s → ∀(x : *) → (a → s → x) → x) → S s seed (λ(
seed@1 : s) → λ(x : *) → λ(Pair : c → s → x) → step seed@1 x
(λ(va : a) → Pair (f (g va)))))

We inadvertently proved stream fusion for free, but we're not done yet! Everything we learn about recursive and corecursive sequences can be applied to model recursive and corecursive effects!

Effects

I will conclude this post by showing how to model both recursive and corecursive programs that have side effects. The recursive program will echo ninety-nine lines from stdin to stdout. The equivalent Haskell program is in the comment header:

-- recursive.mt

-- The Haskell code we will translate to Morte:
--
-- import Prelude hiding (
-- (+), (*), IO, putStrLn, getLine, (>>=), (>>), return )
--
-- -- Simple prelude
--
-- data Nat = Succ Nat | Zero
--
-- zero :: Nat
-- zero = Zero
--
-- one :: Nat
-- one = Succ Zero
--
-- (+) :: Nat -> Nat -> Nat
-- Zero + n = n
-- Succ m + n = m + Succ n
--
-- (*) :: Nat -> Nat -> Nat
-- Zero * n = Zero
-- Succ m * n = n + (m * n)
--
-- foldNat :: Nat -> (a -> a) -> a -> a
-- foldNat Zero f x = x
-- foldNat (Succ m) f x = f (foldNat m f x)
--
-- data IO r
-- = PutStrLn String (IO r)
-- | GetLine (String -> IO r)
-- | Return r
--
-- putStrLn :: String -> IO U
-- putStrLn str = PutStrLn str (Return Unit)
--
-- getLine :: IO String
-- getLine = GetLine Return
--
-- return :: a -> IO a
-- return = Return
--
-- (>>=) :: IO a -> (a -> IO b) -> IO b
-- PutStrLn str io >>= f = PutStrLn str (io >>= f)
-- GetLine k >>= f = GetLine (\str -> k str >>= f)
-- Return r >>= f = f r
--
-- -- Derived functions
--
-- (>>) :: IO U -> IO U -> IO U
-- m >> n = m >>= \_ -> n
--
-- two :: Nat
-- two = one + one
--
-- three :: Nat
-- three = one + one + one
--
-- four :: Nat
-- four = one + one + one + one
--
-- five :: Nat
-- five = one + one + one + one + one
--
-- six :: Nat
-- six = one + one + one + one + one + one
--
-- seven :: Nat
-- seven = one + one + one + one + one + one + one
--
-- eight :: Nat
-- eight = one + one + one + one + one + one + one + one
--
-- nine :: Nat
-- nine = one + one + one + one + one + one + one + one + one
--
-- ten :: Nat
-- ten = one + one + one + one + one + one + one + one + one + one
--
-- replicateM_ :: Nat -> IO U -> IO U
-- replicateM_ n io = foldNat n (io >>) (return Unit)
--
-- ninetynine :: Nat
-- ninetynine = nine * ten + nine
--
-- main_ :: IO U
-- main_ = replicateM_ ninetynine (getLine >>= putStrLn)

-- "Free" variables
( \(String : * )
-> \(U : *)
-> \(Unit : U)

-- Simple prelude
-> ( \(Nat : *)
-> \(zero : Nat)
-> \(one : Nat)
-> \((+) : Nat -> Nat -> Nat)
-> \((*) : Nat -> Nat -> Nat)
-> \(foldNat : Nat -> forall (a : *) -> (a -> a) -> a -> a)
-> \(IO : * -> *)
-> \(return : forall (a : *) -> a -> IO a)
-> \((>>=)
: forall (a : *)
-> forall (b : *)
-> IO a
-> (a -> IO b)
-> IO b
)
-> \(putStrLn : String -> IO U)
-> \(getLine : IO String)

-- Derived functions
-> ( \((>>) : IO U -> IO U -> IO U)
-> \(two : Nat)
-> \(three : Nat)
-> \(four : Nat)
-> \(five : Nat)
-> \(six : Nat)
-> \(seven : Nat)
-> \(eight : Nat)
-> \(nine : Nat)
-> \(ten : Nat)
-> ( \(replicateM_ : Nat -> IO U -> IO U)
-> \(ninetynine : Nat)

-> replicateM_ ninetynine ((>>=) String U getLine putStrLn)
)

-- replicateM_
( \(n : Nat)
-> \(io : IO U)
-> foldNat n (IO U) ((>>) io) (return U Unit)
)

-- ninetynine
((+) ((*) nine ten) nine)
)

-- (>>)
( \(m : IO U)
-> \(n : IO U)
-> (>>=) U U m (\(_ : U) -> n)
)

-- two
((+) one one)

-- three
((+) one ((+) one one))

-- four
((+) one ((+) one ((+) one one)))

-- five
((+) one ((+) one ((+) one ((+) one one))))

-- six
((+) one ((+) one ((+) one ((+) one ((+) one one)))))

-- seven
((+) one ((+) one ((+) one ((+) one ((+) one ((+) one one))))))

-- eight
((+) one ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one one)))))))
-- nine
((+) one ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one one))))))))

-- ten
((+) one ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one ((+) one one)))))))))
)

-- Nat
( forall (a : *)
-> (a -> a)
-> a
-> a
)

-- zero
( \(a : *)
-> \(Succ : a -> a)
-> \(Zero : a)
-> Zero
)

-- one
( \(a : *)
-> \(Succ : a -> a)
-> \(Zero : a)
-> Succ Zero
)

-- (+)
( \(m : forall (a : *) -> (a -> a) -> a -> a)
-> \(n : forall (a : *) -> (a -> a) -> a -> a)
-> \(a : *)
-> \(Succ : a -> a)
-> \(Zero : a)
-> m a Succ (n a Succ Zero)
)

-- (*)
( \(m : forall (a : *) -> (a -> a) -> a -> a)
-> \(n : forall (a : *) -> (a -> a) -> a -> a)
-> \(a : *)
-> \(Succ : a -> a)
-> \(Zero : a)
-> m a (n a Succ) Zero
)

-- foldNat
( \(n : forall (a : *) -> (a -> a) -> a -> a)
-> n
)

-- IO
( \(r : *)
-> forall (x : *)
-> (String -> x -> x)
-> ((String -> x) -> x)
-> (r -> x)
-> x
)

-- return
( \(a : *)
-> \(va : a)
-> \(x : *)
-> \(PutStrLn : String -> x -> x)
-> \(GetLine : (String -> x) -> x)
-> \(Return : a -> x)
-> Return va
)

-- (>>=)
( \(a : *)
-> \(b : *)
-> \(m : forall (x : *)
-> (String -> x -> x)
-> ((String -> x) -> x)
-> (a -> x)
-> x
)
-> \(f : a
-> forall (x : *)
-> (String -> x -> x)
-> ((String -> x) -> x)
-> (b -> x)
-> x
)
-> \(x : *)
-> \(PutStrLn : String -> x -> x)
-> \(GetLine : (String -> x) -> x)
-> \(Return : b -> x)
-> m x PutStrLn GetLine (\(va : a) -> f va x PutStrLn GetLine Return)
)

-- putStrLn
( \(str : String)
-> \(x : *)
-> \(PutStrLn : String -> x -> x )
-> \(GetLine : (String -> x) -> x)
-> \(Return : U -> x)
-> PutStrLn str (Return Unit)
)

-- getLine
( \(x : *)
-> \(PutStrLn : String -> x -> x )
-> \(GetLine : (String -> x) -> x)
-> \(Return : String -> x)
-> GetLine Return
)
)

This program will compile to a completely unrolled read-write loop, as most recursive programs will:

$ morte < recursive.mt
∀(String : *) → ∀(U : *) → U → ∀(x : *) → (String → x → x) →
((String → x) → x) → (U → x) → x

λ(String : *) → λ(U : *) → λ(Unit : U) → λ(x : *) → λ(PutStr
Ln : String → x → x) → λ(GetLine : (String → x) → x) → λ(Ret
urn : U → x) → GetLine (λ(va : String) → PutStrLn va (GetLin
e (λ(va@1 : String) → PutStrLn va@1 (GetLine (λ(va@2 : Strin
g) → PutStrLn va@2 (GetLine (λ(va@3 : String) → PutStrLn ...
<snip>
... GetLine (λ(va@92 : String) → PutStrLn va@92 (GetLine (λ(
va@93 : String) → PutStrLn va@93 (GetLine (λ(va@94 : String)
→ PutStrLn va@94 (GetLine (λ(va@95 : String) → PutStrLn va@
95 (GetLine (λ(va@96 : String) → PutStrLn va@96 (GetLine (λ(
va@97 : String) → PutStrLn va@97 (GetLine (λ(va@98 : String)
→ PutStrLn va@98 (Return Unit))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))))))))))

In contrast, if we encode the effects corecursively we can express a program that echoes indefinitely from stdin to stdout:

-- corecursive.mt

-- data IOF r s
-- = PutStrLn String s
-- | GetLine (String -> s)
-- | Return r
--
-- data IO r = forall s . MkIO s (s -> IOF r s)
--
-- main = MkIO
-- Nothing
-- (maybe (\str -> PutStrLn str Nothing) (GetLine Just))

( \(String : *)
-> ( \(Maybe : * -> *)
-> \(Just : forall (a : *) -> a -> Maybe a)
-> \(Nothing : forall (a : *) -> Maybe a)
-> \( maybe
: forall (a : *)
-> Maybe a
-> forall (x : *)
-> (a -> x)
-> x
-> x
)
-> \(IOF : * -> * -> *)
-> \( PutStrLn
: forall (r : *)
-> forall (s : *)
-> String
-> s
-> IOF r s
)
-> \( GetLine
: forall (r : *)
-> forall (s : *)
-> (String -> s)
-> IOF r s
)
-> \( Return
: forall (r : *)
-> forall (s : *)
-> r
-> IOF r s
)
-> ( \(IO : * -> *)
-> \( MkIO
: forall (r : *)
-> forall (s : *)
-> s
-> (s -> IOF r s)
-> IO r
)
-> ( \(main : forall (r : *) -> IO r)
-> main
)

-- main
( \(r : *)
-> MkIO
r
(Maybe String)
(Nothing String)
( \(m : Maybe String)
-> maybe
String
m
(IOF r (Maybe String))
(\(str : String) ->
PutStrLn
r
(Maybe String)
str
(Nothing String)
)
(GetLine r (Maybe String) (Just String))
)
)
)

-- IO
( \(r : *)
-> forall (x : *)
-> (forall (s : *) -> s -> (s -> IOF r s) -> x)
-> x
)

-- MkIO
( \(r : *)
-> \(s : *)
-> \(seed : s)
-> \(step : s -> IOF r s)
-> \(x : *)
-> \(k : forall (s : *) -> s -> (s -> IOF r s) -> x)
-> k s seed step
)
)

-- Maybe
(\(a : *) -> forall (x : *) -> (a -> x) -> x -> x)

-- Just
( \(a : *)
-> \(va : a)
-> \(x : *)
-> \(Just : a -> x)
-> \(Nothing : x)
-> Just va
)

-- Nothing
( \(a : *)
-> \(x : *)
-> \(Just : a -> x)
-> \(Nothing : x)
-> Nothing
)

-- maybe
( \(a : *)
-> \(m : forall (x : *) -> (a -> x) -> x-> x)
-> m
)

-- IOF
( \(r : *)
-> \(s : *)
-> forall (x : *)
-> (String -> s -> x)
-> ((String -> s) -> x)
-> (r -> x)
-> x
)

-- PutStrLn
( \(r : *)
-> \(s : *)
-> \(str : String)
-> \(vs : s)
-> \(x : *)
-> \(PutStrLn : String -> s -> x)
-> \(GetLine : (String -> s) -> x)
-> \(Return : r -> x)
-> PutStrLn str vs
)

-- GetLine
( \(r : *)
-> \(s : *)
-> \(k : String -> s)
-> \(x : *)
-> \(PutStrLn : String -> s -> x)
-> \(GetLine : (String -> s) -> x)
-> \(Return : r -> x)
-> GetLine k
)

-- Return
( \(r : *)
-> \(s : *)
-> \(vr : r)
-> \(x : *)
-> \(PutStrLn : String -> s -> x)
-> \(GetLine : (String -> s) -> x)
-> \(Return : r -> x)
-> Return vr
)

)

This compiles to a state machine that we can unfold one step at a time:

$ morte < corecursive.mt
∀(String : *) → ∀(r : *) → ∀(x : *) → (∀(s : *) → s → (s → ∀
(x : *) → (String → s → x) → ((String → s) → x) → (r → x) →
x) → x) → x

λ(String : *) → λ(r : *) → λ(x : *) → λ(k : ∀(s : *) → s → (
s → ∀(x : *) → (String → s → x) → ((String → s) → x) → (r →
x) → x) → x) → k (∀(x : *) → (String → x) → x → x) (λ(x : *)
→ λ(Just : String → x) → λ(Nothing : x) → Nothing) (λ(m : ∀
(x : *) → (String → x) → x → x) → m (∀(x : *) → (String → (∀
(x : *) → (String → x) → x → x) → x) → ((String → ∀(x : *) →
(String → x) → x → x) → x) → (r → x) → x) (λ(str : String)
→ λ(x : *) → λ(PutStrLn : String → (∀(x : *) → (String → x)
→ x → x) → x) → λ(GetLine : (String → ∀(x : *) → (String → x
) → x → x) → x) → λ(Return : r → x) → PutStrLn str (λ(x : *)
→ λ(Just : String → x) → λ(Nothing : x) → Nothing)) (λ(x :
*) → λ(PutStrLn : String → (∀(x : *) → (String → x) → x → x)
→ x) → λ(GetLine : (String → ∀(x : *) → (String → x) → x →
x) → x) → λ(Return : r → x) → GetLine (λ(va : String) → λ(x
: *) → λ(Just : String → x) → λ(Nothing : x) → Just va))

I don't expect you to understand that output other than to know that we can translate it to any backend that provides functions and primitive read/write operations.

Conclusion

If you would like to use Morte, you can find the library on both Github and Hackage. I also provide a Morte tutorial that you can use to learn more about the library.

Morte is dependently typed in theory, but in practice I have not exercised this feature, so I don't understand its implications. If this turns out to be a mistake then I will downgrade Morte to System Fω, which has higher kinds and polymorphism but no dependent types.

Additionally, Morte might be usable to transmit code in a secure and typed way in a distributed environment, or to share code between diverse functional languages by providing a common intermediate language. However, both of those scenarios require additional work, such as establishing a shared set of foreign primitives and creating Morte encoders/decoders for each target language.

Also, there are additional optimizations which Morte might implement in the future. For example, Morte could use free theorems (equalities you deduce from the types) to simplify some code fragments even further, but Morte currently does not do this.

My next goals are:

  • Add a back-end to compile Morte to LLVM
  • Add a front-end to desugar a medium-level Haskell-like language to Morte

Once those steps are complete then Morte will be a usable intermediate language for writing super-optimizable programs.

Also, if you're wondering, the name Morte is a tribute to a talking skull from the game Planescape: Torment, since the Morte library is a "bare-bones" calculus of constructions.

Literature

If this topic interests you more, you may find the following links helpful, in roughly increasing order of difficulty:

Categories: Offsite Blogs

The Notes of GHC

Haskell on Reddit - Tue, 05/19/2015 - 7:25am
Categories: Incoming News

FP Complete: PSA: GHC 7.10, cabal, and Windows

Planet Haskell - Tue, 05/19/2015 - 2:20am

Since we've received multiple bug reports on this, and there are many people suffering from it reporting on the cabal issue, Neil and I decided a more public announcement was warranted.

There is an as-yet undiagnosed bug in cabal which causes some packages to fail to install. Packages known to be affected are blaze-builder-enumerator, data-default-instances-old-locale, vector-binary-instances, and data-default-instances-containers. The output looks something like:

Resolving dependencies...
Configuring data-default-instances-old-locale-0.0.1...
Building data-default-instances-old-locale-0.0.1...
Failed to install data-default-instances-old-locale-0.0.1
Build log ( C:\Users\gl67\AppData\Roaming\cabal\logs\data-default-instances-old-locale-0.0.1.log ):
Building data-default-instances-old-locale-0.0.1...
Preprocessing library data-default-instances-old-locale-0.0.1...
[1 of 1] Compiling Data.Default.Instances.OldLocale ( Data\Default\Instances\OldLocale.hs, dist\build\Data\Default\Instances\OldLocale.o )
C:\Users\gl67\repos\MinGHC\7.10.1\ghc-7.10.1\mingw\bin\ar.exe: dist\build\libHSdata-default-instances-old-locale-0.0.1-6jcjjaR25tK4x3nJhHHjFM.a-11336\libHSdata-default-instances-old-locale-0.0.1-6jcjjaR25tK4x3nJhHHjFM.a: No such file or directory
cabal.exe: Error: some packages failed to install:
data-default-instances-old-locale-0.0.1 failed during the building phase. The exception was: ExitFailure 1

There are two workarounds I know of at this time:

  • You can manually unpack and install the package which seems to work, e.g.:

    cabal unpack data-default-instances-old-locale-0.0.1
    cabal install .\data-default-instances-old-locale-0.0.1
  • Drop down to GHC 7.8.4 until the cabal bug is fixed

For normal users, you can stop reading here. If you're interested in more details and may be able to help fix it, here's a summary of the research I've done so far:

As far as I can tell, this is a bug in cabal-install, not the Cabal library. Despite reports to the contrary, the parallelization level (the -j option) does not seem to have any impact. The only thing that seems to affect the behavior is whether cabal-install unpacks and installs in one step, or does it in two steps. That's why unpacking and then installing works around the bug.

I've stared at cabal logs on this quite a bit, but don't see a rhyme or reason to what's happening here. The bug is easily reproducible, so hopefully someone with more cabal expertise will be able to look at this soon, as this bug has high severity and has been affecting Windows users for almost two months.

Categories: Offsite Blogs

Is it efficient to use foldMap for the List monoid?

Haskell on Reddit - Tue, 05/19/2015 - 2:15am

Let t be an instance of Foldable and suppose we are given a value of type t a for some a. Suppose also we intend to map every element of type a in this t a to a list and concatenate the results.

Since List is a Monoid (with (++) as the operation and [] as the identity), a very clean approach to this is to simply call foldMap. However, since foldMap exploits (or may exploit) the associativity of the monoidal operation, we could end up with an inefficient sequencing of concatenations, unlike the use of foldr, which would allow us to keep the accumulating parameter on the right.

Does GHC apply some optimization (such as a rewrite rule) to render this matter moot, or should the somewhat more cumbersome usage of foldr be preferred?
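To make the two candidates concrete, here is a small sketch (the names are mine) of the two formulations being compared:

import qualified Data.Foldable as F

-- The clean version: rely on the list Monoid.
concatMapA :: F.Foldable t => (a -> [b]) -> t a -> [b]
concatMapA = F.foldMap

-- The explicit version: keep the accumulating parameter on the right.
concatMapB :: F.Foldable t => (a -> [b]) -> t a -> [b]
concatMapB f = F.foldr (\x acc -> f x ++ acc) []

For what it's worth, if I remember right the default definition in Data.Foldable is foldMap f = foldr (mappend . f) mempty, so for instances that use the default the chain stays right-associated; an instance that overrides foldMap could behave differently.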

submitted by n-simplex
Categories: Incoming News

Automatically generate sum type members -> [String]

haskell-cafe - Tue, 05/19/2015 - 12:20am
Can I use either Template Haskell, GHC Generics, or something else to get the list of constructors in Codes? Then I could do `map show codeSumTypeMembers` and get `["A0100A","A0500A"]`? Here's an example of solving this problem manually:

data Codes = A0100A | A0500A deriving Show

codeExists "A0100A" = True
codeExists "A0500A" = True
codeExists _ = False

main = print $ codeExists xmlElementName
  where xmlElementName = "A0500A"
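One possible direction (a sketch, not a tested answer): the Data.Data machinery from base already exposes constructor names, so a Data-deriving variant of the manual example could look like this:

{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data

data Codes = A0100A | A0500A deriving (Show, Data, Typeable)

codeNames :: [String]
codeNames = map showConstr (dataTypeConstrs (dataTypeOf (undefined :: Codes)))
-- codeNames = ["A0100A","A0500A"]

codeExists :: String -> Bool
codeExists = (`elem` codeNames)

main :: IO ()
main = print (codeExists "A0500A")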
Categories: Offsite Discussion

Abandoning String = [Char]?

haskell-cafe - Mon, 05/18/2015 - 11:44pm
Hey all, In the earlier haskell-cafe discussion of IsString, someone mentioned that it would be nice to abandon [Char] as the blessed string type in Haskell. I've thought about this on and off for a while now, and think that the fact that [Char] is the default string type is a really big issue (for example, it gives beginners the idea that Haskell is incredibly slow, because everything that involves string processing is using linked lists). I am not proposing anything, but am curious as to what already has been discussed: 1. Has the possibility of migrating away from [Char] been investigated before? 2. What gains could we see in ease of use, performance, etc, if [Char] was deprecated? 3. What could replace [Char], while retaining the same ease of use for writing string manipulation functions (pattern matching, etc)? 4. Is there any sort of migration path that would make this change feasible in mainline Haskell in the medium term (2-5 years)? Thanks! I would welcome any references or links that my cursory
Categories: Offsite Discussion

Jasper Van der Jeugt: Can we write a Monoidal Either?

Planet Haskell - Mon, 05/18/2015 - 6:00pm
> {-# LANGUAGE GeneralizedNewtypeDeriving #-}
> import Control.Applicative (Applicative (..))
> import Data.Monoid (Monoid, (<>))

> newtype Expr = Expr String
> instance Show Expr where show (Expr x) = x
> type Type = String
> type TypeScope = [Type]
> type Scope = String
> type Program = String

> expr1, expr2 :: Expr
> expr1 = Expr "False == 3"
> expr2 = Expr "[1, 'a']"

> program1 :: Program
> program1 = undefined

Introduction

For the last month or so, I have been working as a contractor for Luminal. I am helping them implement Fugue, and more specifically Ludwig – a compiler for a statically typed, declarative configuration language. This is one of the most interesting projects I have worked on so far – writing a compiler is really fun. While implementing some parts of this compiler, I came across an interesting problem.

In particular, a typeclass instance seemed to adhere to both the Monad and Applicative laws, but with differing behaviour – which felt a bit fishy. I started a discussion on Twitter to understand it better, and these are my thoughts on the matter.

The problem

Suppose we’re writing a typechecker. We have to do a number of things:

  • Parse the program to get a list of user-defined types.
  • Typecheck each expression.
  • A bunch of other things, but for the purpose of this post we’re going to leave it at that.

Now, any of these steps could fail, and we’d like to log the reason for failure. Clearly this is a case for something like Either! Let’s define ourselves an appropriate datatype.

> data Check e a
>     = Failed e
>     | Ok a
>     deriving (Eq, Show)

We can write a straightforward Functor instance:

> instance Functor (Check e) where
>     fmap _ (Failed e) = Failed e
>     fmap f (Ok x)     = Ok (f x)

The Monad instance is also very obvious:

> instance Monad (Check e) where
>     return x = Ok x
>
>     Failed e >>= _ = Failed e
>     Ok x     >>= f = f x

However, the Applicative instance is not that obvious – we seem to have a choice.

But first, lets take a step back and stub out our compiler a bit more, so that we have some more context. Imagine we have the following types in our compiler:

data Type = ...
data Expr = ...
data Program = ...

type TypeScope = [Type]

And our code looks like this:

> findDefinedTypes1 :: Program -> Check String TypeScope
> findDefinedTypes1 _ = Ok []  -- Assume we can't define types for now.

> typeCheck1 :: TypeScope -> Expr -> Check String Type
> typeCheck1 _ e = Failed $ "Could not typecheck: " ++ show e

> compiler1 :: Check String ()
> compiler1 = do
>     scope <- findDefinedTypes1 program1
>     typeCheck1 scope expr1  -- False == 3
>     typeCheck1 scope expr2  -- [1, 'a']
>     return ()

On executing compiler1, we get the following error:

*Main> compiler1
Failed "Could not typecheck: False == 3"

Which is correct, but using a compiler entirely written in this fashion would be annoying. Check, like Either, short-circuits on the first error it encounters. This means we would compile our program, fix one error, compile, fix the next error, compile, and so on.

It would be much nicer if users were to see multiple error messages at once.

Of course, this is not always possible. On one hand, if findDefinedTypes1 throws an error, we cannot possibly call typeCheck1, since we do not have a TypeScope.

On the other hand, if findDefinedTypes1 succeeds, shouldn’t it be possible to collect error messages from both typeCheck1 scope expr1 and typeCheck1 scope expr2?

It turns out this is possible, precisely because the second call to typeCheck1 does not depend on the result of the first call – so we can execute them in parallel, if you will. And that is precisely the difference in expressive power between Monad and Applicative: Monadic >>= provides access to previously computed results, where Applicative <*> does not. Let’s (ab?)use this to our advantage.

The solution?

Cleverly, we put together following instance:

> instance Monoid e => Applicative (Check e) where
>     pure x = Ok x
>
>     Ok f      <*> Ok x      = Ok (f x)
>     Ok _      <*> Failed e  = Failed e
>     Failed e  <*> Ok _      = Failed e
>     Failed e1 <*> Failed e2 = Failed (e1 <> e2)

Using this instance we can effectively collect error messages. We need to change our code a bit to support a collection of error messages, so let’s use [String] instead of String since a list is a Monoid.

> findDefinedTypes2 :: Program -> Check [String] TypeScope
> findDefinedTypes2 _ = Ok []  -- Assume we can't define types for now.

> typeCheck2 :: TypeScope -> Expr -> Check [String] Type
> typeCheck2 _ e = Failed ["Could not typecheck: " ++ show e]

> compiler2 :: Check [String] ()
> compiler2 = do
>     scope <- findDefinedTypes2 program1
>     typeCheck2 scope expr1 *> typeCheck2 scope expr2
>     return ()

Note that *> is the Applicative equivalent of the Monadic >>.

Now, every error is represented by a list of error messages (typically a singleton such as in typeCheck2), and the Applicative <*> combines error messages. If we execute compiler2, we get:

*Main> compiler2
Failed ["Could not typecheck: False == 3", "Could not typecheck: [1, 'a']"]

Success! But is that all there is to it?

The problem with the solution

The problem is that we have created a situation where <*> is not equal to ap [1]. After researching this for a while, it seems that <*> = ap is not a verbatim rule. However, most arguments suggest it should be the case – even the name.

This is important for refactoring, for example. Quite a few Haskell programmers (including myself) would refactor:

do b <- bar
   q <- qux
   return (Foo b q)

Into:

Foo <$> bar <*> qux

Without putting too much thought in it, just assuming it does the same thing.

In our case, they are clearly similar, but not equal – we would get only one error instead of collecting error messages. One could argue that this is close enough, but when one uses that argument too frequently, you might just end up with something like PHP.

The problem becomes more clear in the following fragment:

checkForCyclicImports modules >> compileAll modules

Which has completely different behaviour from this fragment:

checkForCyclicImports modules *> compileAll modules

The latter will get stuck in some sort of infinite recursion, while the former will not. This is not a subtle difference anymore. While the problem is easy to spot here (>> vs. *>), this is not always the case:

forEveryImport_ :: Monad m => Module -> (Import -> m ()) -> m ()

Ever since AMP, it is impossible to tell whether this will do a forM_ or a for_-like traversal without looking at the implementation – this makes making mistakes easy.

The solution to the problem with the solution

As we discussed in the previous section, it should be possible for a programmer to tell exactly how a Monad or Applicative will behave, without having to dig into implementations. Having a structure where <*> and ap behave slightly differently makes this hard.

When a Haskell programmer wants to make a clear distinction between two similar types, the first thing that comes to mind is probably newtypes. This problem is no different.

Let’s introduce a newtype for error-collecting Applicative. Since the Functor instance is exactly the same, we might as well generate it using GeneralizedNewtypeDeriving.

> newtype MonoidCheck e a = MonoidCheck {unMonoidCheck :: Check e a}
>     deriving (Functor, Show)

Now, we provide our Applicative instance for MonoidCheck:

> instance Monoid e => Applicative (MonoidCheck e) where
>     pure x = MonoidCheck (Ok x)
>
>     MonoidCheck l <*> MonoidCheck r = MonoidCheck $ case (l, r) of
>         (Ok f     , Ok x     ) -> Ok (f x)
>         (Ok _     , Failed e ) -> Failed e
>         (Failed e , Ok _     ) -> Failed e
>         (Failed e1, Failed e2) -> Failed (e1 <> e2)

Finally, we avoid writing a Monad instance for MonoidCheck. This approach makes the code cleaner:

  • This ensures that when people use MonoidCheck, they are forced to use the Applicative combinators, and they cannot accidentally reduce the number of error messages.

  • For other programmers reading the code, it is very clear whether we are dealing with short-circuiting behaviour or that we are collecting multiple error messages: it is explicit in the types.

Usage and conversion

Our fragment now becomes:

> findDefinedTypes3 :: Program -> Check [String] TypeScope
> findDefinedTypes3 _ = Ok []  -- Assume we can't define types for now.

> typeCheck3 :: TypeScope -> Expr -> MonoidCheck [String] Type
> typeCheck3 _ e = MonoidCheck $ Failed ["Could not typecheck: " ++ show e]

> compiler3 :: Check [String] ()
> compiler3 = do
>     scope <- findDefinedTypes3 program1
>     unMonoidCheck $ typeCheck3 scope expr1 *> typeCheck3 scope expr2
>     return ()

We can see that while it is not more concise, it is definitely more clear: we can see exactly which functions will collect error messages. Furthermore, if we now try to write:

typeCheck3 scope expr1 >> typeCheck3 scope expr2

We will get a compiler warning telling us we should use *> instead.

Explicitly, we now convert between Check and MonoidCheck by simply calling MonoidCheck and unMonoidCheck. We can do this inside other transformers if necessary, using e.g. mapReaderT.

Data.Either.Validation

The MonoidCheck discussed in this blogpost is available as Data.Either.Validation on hackage. The main difference is that instead of using a newtype, the package authors provide a full-blown datatype.

> data Validation e a
>     = Failure e
>     | Success a

And two straightforward conversion functions:

> validationToEither :: Validation e a -> Either e a
> validationToEither (Failure e) = Left e
> validationToEither (Success x) = Right x

> eitherToValidation :: Either e a -> Validation e a
> eitherToValidation (Left e)  = Failure e
> eitherToValidation (Right x) = Success x

This makes constructing values a bit easier:

Failure ["Can't go mucking with a 'void*'"]

Instead of:

MonoidCheck $ Failed ["Can't go mucking with a 'void*'"]

At this point, it shouldn’t surprise you that Validation intentionally does not provide a Monad instance.

Conclusion

This, of course, is all my opinion – there doesn’t seem to be any definite consensus on whether or not ap should be the same as <*>, since differing behaviour occurs in prominent libraries. While the Monad and Applicative laws are relatively well known, there is no canonical law saying that ap = <*>.

Update: there actually is a canonical law that ap should be <*>, and it was right under my nose in the Monad documentation since AMP. Before that, it was mentioned in the Applicative documentation. Thanks to quchen for pointing that out to me!

A key point here is that the AMP actually related the two typeclasses. Before that, arguing that the two classes were in a way “unrelated” was still a (dubious) option, but that is no longer the case.

Furthermore, considering this as a law might reveal opportunities for optimisation [2].

Lastly, I am definitely a fan of implementing these differing behaviours using different types and then converting between them: the fact that types explicitly tell me about the behaviour of code is one of the reasons I like Haskell.

Thanks to Alex Sayers for proofreading and suggestions.

  1. ap is the Monadic sibling of <*> (which explains why <*> is commonly pronounced ap). It can be implemented on top of >>=/return:

    > ap :: Monad m => m (a -> b) -> m a -> m b
    > ap mf mx = do
    >     f <- mf
    >     x <- mx
    >     return (f x)
  2. Take this with a grain of salt – Currently, GHC does not use any of the Monad laws to perform any optimisation. However, some Monad instances use them in RULES pragmas.

Categories: Offsite Blogs

Christopher Done: How Haskellers are seen and see themselves

Planet Haskell - Mon, 05/18/2015 - 6:00pm
How Haskellers are seen

The type system and separated IO is an awkward, restricting space suit:

Spending most of their time gazing longingly at the next abstraction to yoink from mathematics:

Looking at anything outside the Haskell language and the type system:

Using unsafePerformIO:

How Haskellers see themselves

No, it’s not a space suit. It’s Iron Man’s suit!

The suit enables him to do impressive feats with confidence and safety:

Look at the immense freedom and power enabled by wearing the suit:

Reality

Categories: Offsite Blogs

Danny Gratzer: Compiling a Lazy Language in 1,000 words

Planet Haskell - Mon, 05/18/2015 - 6:00pm
Posted on May 19, 2015
Tags: compilers, haskell

I’m a fan of articles like this one which set out to explain a really complicated subject in 600 words or less. I wanted to write one with a similar goal for compiling a language like Haskell. To help with this I’ve broken down what most compilers for a lazy language do into 5 different phases and spent 200 words explaining how they work. This isn’t really intended to be a tutorial on how to implement a compiler, I just want to make it less magical.

I assume that you know how a lazy functional language looks (this isn’t a tutorial on Haskell) and a little about how your machine works since I make a few references to how some lower level details are compiled. These will make more sense if you know such things, but they’re not necessary.

And the word-count-clock starts… now.

Parsing

Our interactions with compilers usually involve treating them as a huge function from string to string. We give them a string (our program) and it gives us back a string (the compiled code). However, on the inside the compiler does all sorts of stuff to that string we gave it and most of those operations are inconvenient to do as string operations. In the first part of the compiler, we convert the string into an abstract syntax tree. This is a data structure in the compiler which represents the string, but in

  1. A more abstract way, it doesn’t have details such as whitespace or comments
  2. A more convenient way, it lets the compiler perform the operations it wants efficiently

The process of going String -> AST is called “parsing”. It has a lot of (kinda stuffy IMO) theory behind it. This is the only part of the compiler where the syntax actually matters and is usually the smallest part of the compiler.

Examples:

Type Checking

Now that we’ve constructed an abstract syntax tree we want to make sure that the program “makes sense”. Here “make sense” just means that the program’s types are correct. The process for checking that a program type checks involves following a bunch of rules of the form “A has type T if B has type T1 and C has type…”. All of these rules together constitute the type system for our language. As an example, in Haskell f a has the type T2 if f has the type T1 -> T2 and a has the type T1.

There’s a small wrinkle in this story though: most languages require some type inference. This makes things 10x harder because we have to figure the types of everything as we go! Type inference isn’t even possible in a lot of languages and some clever contortions are often needed to be inferrable.

However, once we’ve done all of this the program is correct enough to compile. Past type checking, if the compiler raises an error it’s a compiler bug.

Examples:

Optimizations/Simplifications

Now that we’re free of the constraints of having to report errors to the user things really get fun in the compiler. Now we start simplifying the language by converting a language feature into a mess of other, simpler language features. Sometimes we convert several features into specific instances of one more general feature. For example, we might convert our big fancy pattern language into a simpler one by elaborating each case into a bunch of nested cases.

Each time we remove a feature we end up with a slightly different language. This progression of languages in the compiler are called the “intermediate languages” (ILs). Each of these ILs have their own AST as well! In a good compiler we’ll have a lot of ILs as it makes the compiler much more maintainable.

An important part of choosing an IL is making it amenable to various optimizations. When the compiler is working with each IL it applies a set of optimizations to the program. For example

  1. Constant folding, converting 1 + 1 to 2 during compile time (see the short sketch just after this list)
  2. Inlining, copy-pasting the body of smaller functions where they’re called
  3. Fusion, turning multiple passes over a datastructure into a single one
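Here is the constant-folding sketch promised above (a toy IL of my own, not from any real compiler):

data IL = Lit Int | Add IL IL | Mul IL IL

constFold :: IL -> IL
constFold (Add a b) =
    case (constFold a, constFold b) of
        (Lit x, Lit y) -> Lit (x + y)   -- both sides are known: fold them
        (a'   , b'   ) -> Add a' b'
constFold (Mul a b) =
    case (constFold a, constFold b) of
        (Lit x, Lit y) -> Lit (x * y)
        (a'   , b'   ) -> Mul a' b'
constFold e = e

-- constFold (Add (Lit 1) (Lit 1)) = Lit 2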

Examples:

Spineless, Tagless, and Generally Wimpy IL

At some point in the compiler, we have to deal with the fact that we're compiling a lazy language. One nice way is to use a spineless tagless G-machine (STG machine).

How an STG machine works is a little complicated but here’s the gist

  • An expression becomes a closure/thunk, a bundling of code to compute the expression and the data it needs. These closures may depend on several arguments being supplied
  • We have a stack for arguments and another for continuations. A continuation is some code which takes the value returned from an expression and does something with it, like pattern match on it
  • To evaluate an expression we push the arguments it needs onto the stack and “enter” the corresponding closure, running the code in it
  • When the expression has evaluated itself it will pop the next continuation off the stack and give it the resulting value

During this portion of the compiler, we'd transform our last IL into a C-like language which actually works in terms of pushing, popping, and entering closures.

The key idea here that makes laziness work is that a closure defers work! It’s not a value, it’s a recipe for how to compute a value when we need it. Also note, all calls are tail calls since function calls are just a special case of entering a closure.

Another really beautiful idea in the STG machine is that closures evaluate themselves. This means closures present a uniform interface no matter what; all the details are hidden in that bundled-up code. (I'm totally out of words to say this, but screw it, it's really cool).
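To make "closures evaluate themselves" a bit more tangible, here is a toy model in Haskell (my own sketch, nothing like GHC's actual runtime): a thunk is just a mutable cell holding code that, when entered, computes a value and overwrites itself with it.

import Data.IORef

data Value = IntV Int | ConV String [Thunk]   -- toy "heap values"

newtype Thunk = Thunk (IORef (IO Value))

mkThunk :: IO Value -> IO Thunk
mkThunk code = do
    ref <- newIORef (return (IntV 0))         -- placeholder, replaced below
    writeIORef ref $ do
        v <- code                             -- run the suspended work once
        writeIORef ref (return v)             -- update: the next entry is free
        return v
    return (Thunk ref)

enter :: Thunk -> IO Value                    -- "entering" runs the bundled code
enter (Thunk ref) = do
    compute <- readIORef ref
    compute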

Examples:

Code Generation

Finally, after converting our program to the STG machine language we're ready to output the target code. This bit is very dependent on what exactly we're targeting.

If we’re targeting assembly, we have a few things to do. First, we have to switch from using variables to registers. This process is called register allocation and we basically slot each variable into an available register. If we run out, we store variables in memory and load it in as we need it.

In addition to register allocation, we have to compile those C-like language constructs to assembly. This means converting procedures into a label and some instructions, pattern matches into something like a jump table and so on. This is also where we’d apply low-level, bit-twiddling optimizations.

Examples:

Conclusion

Okay, clock off.

Hopefully that was helpful even if you don’t care that much about lazy languages (most of these ideas apply in any compiler). In particular, I hope that you now believe me when I say that lazy languages aren’t magical. In fact, the worry of how to implement laziness only really came up in one section of the compiler!

Now I have a question for you dear reader, what should I elaborate on? With summer ahead, I’ll have some free time soon. Is there anything else that you would like to see written about? (Just not parsing please)

Categories: Offsite Blogs

New Functional Programming Job Opportunities

haskell-cafe - Mon, 05/18/2015 - 5:00pm
Here are some functional programming job opportunities that were posted recently:

Senior Software Engineer (Functional Programming/Scala) at AdAgility
http://functionaljobs.com/jobs/8826-senior-software-engineer-functional-programming-scala-at-adagility

Cheers,
Sean Murphy
FunctionalJobs.com
Categories: Offsite Discussion

Philip Wadler: Royal Navy whistleblower says Trident is "a disaster waiting to happen"

Planet Haskell - Mon, 05/18/2015 - 4:14pm
A Royal Navy weapons expert who served on HMS Victorious from January to April this year has released via WikiLeaks an eighteen-page report claiming Trident is "a disaster waiting to happen".

McNeilly's report on WikiLeaks.

Original report in The Sunday Herald.
McNeilly's report alleges 30 safety and security flaws on Trident submarines, based at Faslane on the Clyde. They include failures in testing whether missiles could be safely launched, burning toilet rolls starting a fire in a missile compartment, and security passes and bags going unchecked.
He also reports alarms being muted because they went off so often, missile safety procedures being ignored, and top secret information left unguarded. The independent nuclear submarine expert, John Large, concluded McNeilly was credible, though he may have misunderstood some of the things he saw.
Large said: "Even if he is right about the disorganisation, lack of morale, and sheer foolhardiness of the personnel around him - and the unreliability of the engineered systems - it is likely that the Trident system as a whole will tolerate the misdemeanours, as it's designed to do." (Regarding the quote from Large, I'm less sanguine. Ignoring alarms is a standard prelude to disaster. See Normal Accidents.)

Second report in The National.
“We are so close to a nuclear disaster it is shocking, and yet everybody is accepting the risk to the public,” he warned. “It’s just a matter of time before we’re infiltrated by a psychopath or a terrorist.”

Coverage in CommonSpace.


Categories: Offsite Blogs

I came up with a different kind of Prime Sieve

Haskell on Reddit - Mon, 05/18/2015 - 3:35pm

I had an idea for a prime sieve that finds prime numbers by looking for Twin Primes. I wrote an article about how it works on my site: A different kind of prime sieve. I put together a quick Haskell implementation and chucked it on BitBucket.

I wrote two versions of the algorithm, a purely functional infinite list and an imperative version using a pair of STUArrays. This version comes with a handicap, in that to find the nth prime I need to set a high upper bound on the array, without knowing how many composites will need to be crossed off. That upper bound is based on (n * log n) + (n * ((log (log n)) - 0.9385)) as the likely max value of the nth prime.
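As a sanity check, here is my own transcription of that bound as a Haskell function (using the natural log; this heuristic is only reasonable for moderately large n):

nthPrimeBound :: Int -> Int
nthPrimeBound n = ceiling (x * log x + x * (log (log x) - 0.9385))
  where
    x = fromIntegral n :: Double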

In my not very scientific tests, the ST version slightly outperforms the prime sieve found in the primes module. I made a Main module which runs primes, the pure twin sieve, or the ST twin sieve if you want to compare it yourself.

I've only been using Haskell for a little over a year, and 6 months ago I started learning maths (namely abstract algebra) for the first time since high school. Haskell ignited an interest in math that school had long since extinguished. My day-to-day job is as a UX consultant, and if I write any code at all it's only ever JavaScript. So this is my first attempt at writing an "algorithm" of any kind.

I was hoping for some feedback from /r/haskell about the algorithm itself, as well as the pure/ST implementations.

I also wanted to have a discussion about using efficient imperative algorithms in pure code. The ST sieve is by necessity strict, so it has some memoization drawbacks, but it is much more efficient. I think that sometimes having an optimized strict algorithm inside otherwise pure/lazy code can be a real advantage, though it's something I haven't noticed very much of in Haskell packages. I wonder why we don't see more of this in favour of the "List all of the things" approach?

submitted by rick_dz
Categories: Incoming News

Docker Haskell dev environment?

Haskell on Reddit - Mon, 05/18/2015 - 2:58pm

Has anyone built a lightweight haskell dev environment in Docker? Would anyone else find this useful?

submitted by dstcruz
Categories: Incoming News

data EitherF m1 m2 a = LeftF (m1 a) | RightF (m2 a)

haskell-cafe - Mon, 05/18/2015 - 1:32pm
Hi,

I am wondering if there is a library with a datatype like the following defined:

data EitherF m1 m2 a = LeftF (m1 a) | RightF (m2 a)

I could not find anything with Hoogle. Does anybody know such a thing?

Thank you & best,
Daniel
Categories: Offsite Discussion