News aggregator

wren gayle romano: ANN: containers 0.5.9.1

Planet Haskell - Sun, 01/08/2017 - 3:25pm
containers 0.5.9.1

The containers package contains efficient general-purpose implementations of various basic immutable container types. The declared cost of each operation is either worst-case or amortized, but remains valid even if structures are shared.

Changes since 0.5.8.1 (2016-08-31)

The headline change is adding merge and mergeA for Data.IntMap. The versions for Data.Map were introduced in 0.5.8.1, so this change restores parity between the interfaces. With this in place we hope this version will make it into GHC 8.2.

Other changes include:

  • Add instances for Data.Graph.SCC: Foldable, Traversable, Data, Generic, Generic1, Eq, Eq1, Show, Show1, Read, and Read1.
  • Add lifted instances (from Data.Functor.Classes) for Data.Sequence, Data.Map, Data.Set, Data.IntMap, and Data.Tree. (Thanks to Oleg Grenrus for doing a lot of this work.)
  • Properly deprecate functions in Data.IntMap long documented as deprecated.
  • Rename several internal modules for clarity. Thanks to esoeylemez for starting this process.
  • Make Data.Map.fromDistinctAscList and Data.Map.fromDistinctDescList more eager, improving performance.
  • Plug space leaks in Data.Map.Lazy.fromAscList and Data.Map.Lazy.fromDescList by manually inlining constant functions.
  • Add lookupMin and lookupMax to Data.Set and Data.Map as total alternatives to findMin and findMax.
  • Add (!?) to Data.Map as a total alternative to (!). (See the sketch just after this list.)
  • Avoid using deleteFindMin and deleteFindMax internally, preferring total functions instead. New implementations of said functions lead to slight performance improvements overall.
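
To make the new total operations concrete, here is a minimal sketch (the map contents are made up for illustration; it assumes containers 0.5.9.1 or later):

import qualified Data.Map.Strict as Map

main :: IO ()
main = do
  let m = Map.fromList [(1, "one"), (2, "two")] :: Map.Map Int String
  -- findMin and (!) are partial; lookupMin and (!?) return Maybe instead.
  print (Map.lookupMin m)                                   -- Just (1,"one")
  print (Map.lookupMin (Map.empty :: Map.Map Int String))   -- Nothing
  print (m Map.!? 2)                                        -- Just "two"
  print (m Map.!? 3)                                        -- Nothing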

Categories: Offsite Blogs

Chris Smith: Call for interest: Haskell in middle school math education

Planet Haskell - Sun, 01/08/2017 - 12:52am

Just a pointer to this post in haskell-cafe: Call for interest: Haskell in middle school math education

The TL;DR version is that a few people have put together a sizable budget to make the next big push to get CodeWorld and Haskell into middle school mathematics.  We’re looking to produce high-quality resources like video, study materials, etc. to enable teachers to easily use Haskell to make mathematics more tangible and creative in their classrooms for students ages about 11 to 14.  If this interests you, read the announcement!


Categories: Offsite Blogs

Dan Piponi (sigfpe): Addressing Pieces of State with Profunctors

Planet Haskell - Sat, 01/07/2017 - 3:46pm

Attempted segue

Since I first wrote about profunctors there has been quite a bit of activity in the area so I think it's about time I revisited them. I could just carry on from where I left off 5 years ago but there have been so many tutorials on the subject that I think I'll have to assume you've looked at them. My favourite is probably Phil Freeman's Fun with Profunctors. What I intend to do here is solve a practical problem with profunctors.



The problem

Arrows are a nice mechanism for building circuit-like entities in code. In fact, they're quite good for simulating electronic circuits. Many circuits are very much like pieces of functional code. For example an AND gate like this



can be nicely modelled using a pure function: c = a && b. But some components, like flip-flops, have internal state. What comes out of the outputs isn't a simple function of the inputs right now, but depends on what has happened in the past. (Alternatively you can take the view that the inputs and outputs aren't the current values but the complete history of the values.)


We'll use (Hughes) arrows rather than simple functions. For example, one kind of arrow is the Kleisli arrow. For the case of Kleisli arrows built from the state monad, these are essentially functions of type a -> s -> (b, s) where s is our state. We can write these more symmetrically as functions of type (a, s) -> (b, s). We can think of these as "functions" from a to b where the output is allowed to depend on some internal state s. I'll just go ahead and define arrows like this right now.


First the extensions and imports:



> {-# OPTIONS -W #-}
> {-# LANGUAGE Arrows #-}
> {-# LANGUAGE RankNTypes #-}
> {-# LANGUAGE FlexibleInstances #-}



> import Prelude hiding ((.), id)
> import Control.Arrow
> import Control.Category
> import Data.Profunctor
> import Data.Tuple



And now I'll define our stateful circuits. I'm going to make these slightly more general than I described allowing circuits to change the type of their state:



> newtype Circuit s t a b = C { runC :: (a, s) -> (b, t) }



> instance Category (Circuit s s) where
>   id = C id
>   C f . C g = C (f . g)



> instance Arrow (Circuit s s) where
>   arr f = C $ \(a, s) -> (f a, s)
>   first (C g) = C $ \((a, x), s) -> let (b, t) = g (a, s)
>                                     in ((b, x), t)



This is just a more symmetrical rewrite of the state monad as an arrow. The first method allows us to pass through some extra state, x, untouched.


Now for some circuit components. First the "pure" operations, a multiplier and a negater:



> mul :: Circuit s s (Int, Int) Int
> mul = C $ \((x, y), s) -> (x*y, s)



> neg :: Circuit s s Int Int
> neg = C $ \(x, s) -> (-x, s)



And now some "impure" ones that read and write some registers as well as an accumulator:



> store :: Circuit Int Int Int ()
> store = C $ \(x, _) -> ((), x)



> load :: Circuit Int Int () Int
> load = C $ \((), s) -> (s, s)



> accumulate :: Circuit Int Int Int Int
> accumulate = C $ \(a, s) -> (a, s+a)
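
For intuition, here is how these stateful components behave when run directly with runC; the results follow immediately from the definitions above:

runC accumulate (3, 10)   -- (3, 13)  : the input passes through, the state grows
runC store      (7, 99)   -- ((), 7)  : the register now holds 7
runC load       ((), 42)  -- (42, 42) : the register is read and left unchanged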



I'd like to make a circuit that has lots of these components, each with its own state. I'd like to store all of these bits of state in a larger container. But that means that each of these components needs to have a way to address its own particular substate. That's the problem I'd like to solve.



Practical profunctor optics

In an alternative universe lenses were defined using profunctors. To find out more I recommend Phil Freeman's talk that I linked to above. Most of the next paragraph is just a reminder of what he says in that talk and I'm going to use the bare minimum to do the job I want.


Remember that one of the things lenses allow you to do is this: suppose we have a record s containing a field of type a and another similar enough kind of record t with a field of type b. Among other things, a lens gives a way to take a rule for modifying the a field to a b field and extend it to a way to modify the s record into a t record. So we can think of lenses as giving us functions of type (a -> b) -> (s -> t). Now if p is a profunctor then you can think of p a b as being a bit function-like. Like functions, profunctors typically (kinda, sorta) get used to consume (zero or more) objects of type a and output (zero or more) objects of type b. So it makes sense to ask our lenses to work with these more general objects too, i.e. we'd like to be able to get something of type p a b -> p s t out of a lens. A strong profunctor is one that comes pre-packed with a lens that can do this for the special case where the types s and t are 2-tuples. But you can think of simple records as being syntactic sugar for tuples of fields, so strong profunctors also automatically give us lenses for records. Again, watch Phil's talk for details.
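
For reference, the relevant class from the profunctors package is Strong, whose core methods look (roughly) like this:

class Profunctor p => Strong p where
  first'  :: p a b -> p (a, c) (b, c)
  second' :: p a b -> p (c, a) (c, b)

So first' is exactly the "pre-packed lens" for the first component of a 2-tuple.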


So here is our lens type:



> type Lens s t a b = forall p. Strong p => p a b -> p s t



Here are lenses that mimic the well known ones from Control.Lens:



> _1 :: Lens (a, x) (b, x) a b
> _1 = first'



> _2 :: Lens (x, a) (x, b) a b
> _2 = dimap swap swap . first'



(Remember that dimap is a function to pre- and post- compose a function with two others.)
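
dimap :: Profunctor p => (a -> b) -> (c -> d) -> p b c -> p a d

So dimap f g pre-composes with f on the input side and post-composes with g on the output side.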


Arrows are profunctors. So Circuit s s, when wrapped in WrappedArrow, is a profunctor. So now we can directly use the Circuit type with profunctor lenses. This is cool, but it doesn't directly solve our problem. So we're not going to use this fact. We're interested in addressing the state of type s, not the values of type a and b passed through our circuits. In other words, we're interested in the fact that Circuit s t a b is a profunctor in s and t, not a and b. To make this explicit we need a suitable way to permute the arguments to Circuit:



> newtype Flipped p s t a b = F { unF :: p a b s t }



(It was tempting to call that ComedyDoubleAct.)


And now we can define:



> instance Profunctor (Flipped Circuit a b) where
>   lmap f (F (C g)) = F $ C $ \(a, s) -> g (a, f s)
>   rmap f (F (C g)) = F $ C $ \(a, s) -> let (b, t) = g (a, s)
>                                         in (b, f t)



> instance Strong (Flipped Circuit a b) where
>   first' (F (C g)) = F $ C $ \(a, (s, x)) -> let (b, t) = g (a, s)
>                                              in (b, (t, x))



Any time we want to use this instance of Profunctor with a Circuit we have to wrap everything with F and unF. The function dimap gives us a convenient way to implement such wrappings.


Let's implement an imaginary circuit with four bits of state in it.



Here is the state:



> data CPU = CPU { _x :: Int, _y :: Int, _z :: Int, _t :: Int } deriving Show



As I don't have a complete profunctor version of a library like Control.Lens with its template Haskell magic I'll set things up by hand. Here's a strong-profunctor-friendly version of the CPU and a useful isomorphism to go with it:



> type ExplodedCPU = (Int, (Int, (Int, Int)))



> explode :: CPU -> ExplodedCPU
> explode (CPU u v w t) = (u, (v, (w, t)))



> implode :: ExplodedCPU -> CPU
> implode (u, (v, (w, t))) = CPU u v w t



And now we need adapters that take lenses for an ExplodedCPU and (1) apply them to a CPU the way Control.Lens would...



> upgrade :: Profunctor p =>
>            (p a a -> p ExplodedCPU ExplodedCPU) ->
>            (p a a -> p CPU CPU)
> upgrade f = dimap explode implode . f



> x, y, z, t :: Flipped Circuit a b Int Int -> Flipped Circuit a b CPU CPU
> x = upgrade _1
> y = upgrade $ _2 . _1
> z = upgrade $ _2 . _2 . _1
> t = upgrade $ _2 . _2 . _2



...and (2) wrap them so they can be used on the flipped profunctor instance of Circuit:



> (!) :: p s t a b -> (Flipped p a b s t -> Flipped p a b s' t') ->
>        p s' t' a b
> x ! f = dimap F unF f x



After all that we can now write a short piece of code that represents our circuit. Notice how we can apply the lenses x, ..., t directly to our components to get them to use the right pieces of state:



> test :: Circuit CPU CPU () ()
> test = proc () -> do
>   a <- load ! x -< ()
>   b <- load ! y -< ()
>   c <- mul -< (a, b)
>   d <- neg -< c
>   e <- accumulate ! t -< d
>   () <- store ! z -< e



>   returnA -< ()



> main :: IO ()
> main = do
>   print $ runC test ((), CPU 2 30 400 5000)



Of course with a suitable profunctor lens library you can do a lot more, like work with traversable containers of components.


Note that we could also write a version of all this code using monads instead of arrows. But it's easier to see the symmetry in Flipped Circuit when using arrows, and it also sets the scene for the next thing I want to write about...

Categories: Offsite Blogs

FP Complete: Green Threads are like Garbage Collection

Planet Haskell - Fri, 01/06/2017 - 10:00am

Many common programming languages today eschew manual memory management in preference to garbage collection. While the former certainly has its place in certain use cases, I believe the majority of application development today occurs in garbage collected languages, and for good reason. Garbage collection moves responsibility for many memory issues from the developer into the language's runtime, mostly removing the possibility of entire classes of bugs while reducing cognitive overhead. In other words, it separates out a concern. That's not to say that garbage collection is perfect or appropriate in all cases, but in many common cases it greatly simplifies code.

Languages like Haskell, Erlang, and Go provide a similar separation-of-concern in the form of green threads. Instead of requiring the developer to manually deal with asynchronous (or non-blocking) I/O calls, the runtime system takes responsibility for this. Like garbage collection, it's not appropriate for all use cases, but for many common cases it's a great fit and reduces code complexity. This post will discuss what green threads are, the problems they solve, some cases where they aren't the best fit, and how to get started with green thread based concurrency easily.

If you want to jump to that last point, you can download the Stack build tool and start with the async library tutorial.

Blocking and non-blocking I/O

Suppose you're writing a web server. A naive approach would be to write something like the following pseudo-code:

function webServer(port) {
    var listenSocket = bindSocket(port);
    while(true) {
        var socket = accept(listenSocket);
        forkThread(handleConnection(socket));
    }
}

function handleConnection(socket) {
    while(true) {
        var bytes = read(socket);
        var request = parseRequest(bytes);
        var response = getResponse(request);
        write(socket, renderResponse(response));
    }
}

The read and write calls appear to perform blocking I/O, which means that the entire system thread running them will be blocked on the kernel until data is available. Depending on what our forkThread call did, this could mean one of two things:

  • If forkThread forks a new system thread, then performing blocking I/O isn't a problem: that thread has nothing to do until read and write complete. However, forking a new system thread for each connection is expensive, and does not scale well to hundreds of thousands of concurrent requests.
  • If forkThread forks a new thread within your runtime (sometimes called a fiber), then multiple fibers will all be running on a single system thread, and each time you make a blocking I/O call, the entire thread will be blocked, preventing any progress on other connections. You've essentially reduced your application to handling one connection at a time.

Neither of these approaches is very attractive for writing robust concurrent applications (though the former is certainly better than the latter). Another approach is to use non-blocking I/O. In this case, instead of making a call to read or write which blocks until data is available, you make a call and provide a callback function or continuation to handle what to do with the result data. Let's see what our web server above will look like:

function webServer(port) {
    var listenSocket = bindSocket(port);
    listenLoop(listenSocket);
}

function listenLoop(listenSocket) {
    acceptAsync(listenSocket, function(socket) {
        handleConnection(socket);
        listenLoop(listenSocket);
    });
}

function handleConnection(socket) {
    readAsync(socket, function(bytes) {
        var request = parseRequest(bytes);
        var response = getResponse(request);
        writeAsync(socket, renderResponse(response), function() {
            handleConnection(socket);
        });
    });
}

Let's note some differences:

  • We're no longer performing any forking. All actions are occurring in one single thread, removing the possibility of overhead from spawning a system thread (or even a fiber).
  • Instead of capturing the output of read in a variable bytes, we provide readAsync a callback function, and that callback function is provided the bytes value when available. We sometimes call these callbacks continuations, since they tell us how to continue processing from where you left off.
  • The loops are gone. Instead, our callbacks recursively call functions to create the necessary infinite looping, while allowing for proper asynchronous behavior.

This approach solves the problems listed above with blocking I/O: no performance overhead of spawning threads or fibers, and multiple requests can be processed concurrently without being blocked by each other's I/O calls.

The downsides of non-blocking I/O

Unfortunately, this new style does not get away scot-free.

  • Subjectively, the callback-style of coding is not as elegant as the blocking style. There are workarounds for this with techniques like promises.
  • Error/exception handling isn't as straight-forward in the callback setup as with the blocking code. It's certainly a solvable problem, but often involves techniques like passing in an extra callback function for the error case. Many languages out there today use runtime exceptions, and they don't translate too well to callbacks.
  • Our code is limited to running on one CPU core, at least in the simplest case. You can work around this with techniques like prefork, but it's not automatically handled by the callback approach. If our goal is to maximize requests per second, using every available CPU core for processing is definitely desired.
  • It's still possible for handling of multiple requests to cause blockage for each other. For example, if our parseRequest or renderResponse functions perform any blocking I/O, or use a significant amount of CPU time, other requests will need to wait until that processing finishes before they can resume their processing.

For those interested, we had a previous blog post on Concurrency and Node which delved more deeply into these points.

Making non-blocking a runtime system concern

Let's deliver on our promise from the introduction: turning non-blocking I/O into a runtime system concern. The theory behind this is:

  • Blocking I/O calls are conceptually easier to think about and work with
  • While spawning lightweight threads/fibers does entail some overhead, it's lightweight enough to be generally acceptable
  • If we can limit the amount of CPU time spent in running each fiber, we won't need to worry about high-CPU processing blocking other requests
  • The runtime system can handle the scheduling of lightweight threads onto separate CPU cores (via separate system threads), which will result in the ability to saturate the entire CPU without techniques like prefork

That may sound like a lot to deliver on, but green threads are up to the challenge. They are very similar to the fibers that we described above, with one major difference: seemingly blocking I/O calls actually use non-blocking I/O under the surface. Let's see how this would work with our web server example from above (copied here for convenience):

function webServer(port) {
    var listenSocket = bindSocket(port);
    while(true) {
        var socket = accept(listenSocket);
        forkThread(handleConnection(socket));
    }
}

function handleConnection(socket) {
    while(true) {
        var bytes = read(socket);
        var request = parseRequest(bytes);
        var response = getResponse(request);
        write(socket, renderResponse(response));
    }
}

Starting from after the bindSocket call:

  1. Our main green thread calls accept. The runtime system essentially rewrites this to an acceptAsync call like our callback version, puts the main green thread to sleep, and has the runtime system's event loop schedule a wakeup when new data is available on the listenSocket.
  2. When a new connection is available, the runtime system wakes up the main green thread with the new socket value filled in. Our thread then forks a new green thread (let's call it worker 1) running the handleConnection call.
  3. Inside worker 1, our read call is similarly rewritten to readAsync with a callback, and the runtime system puts worker 1 to sleep and schedules a wakeup when data is available on socket.
  4. The runtime system then goes back to the list of green threads that have work to do and finds the main thread. The main thread continues from the forkThread call, and iterates on the while loop. It arrives back at the accept call and, like before, the runtime system puts the main thread to sleep and schedules a wakeup when there's a connection available on listenSocket.
  5. Importantly, at this point, the entire application is able to simply wait for some data. We're waiting for the operating system to tell us that either listenSocket or socket have some data available. When the operating system tells us (via a system call like epoll or select) that data is available, the runtime system can wake the relevant thread up, and do some more work until the next I/O call that puts it to sleep.
  6. Since the runtime system is handling the scheduling of threads, it is free to determine that a thread has been active for too long and pause execution in favor of a different thread, solving the long CPU processing problem mentioned above. (This is also the difference between cooperative and preemptive multithreading.)
  7. Instead of needing a separate error handling mechanism for callbacks, the runtime system can reuse existing exception throwing mechanisms from the language. For example, if an error occurs during the read call, the runtime system can throw an exception from that point in the code.

I'd argue that this green thread approach - for the most part - gives us the best of both worlds: the high level, easy to read and write, robust code that comes from using threads, with the high performance of the non-blocking callback code.
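
As a tiny illustration of how cheap green threads are in GHC (this sketch is mine, not from the pseudo-code above), the following program forks ten thousand threads with forkIO, each making a seemingly blocking threadDelay call, and still runs comfortably in a single OS process:

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar

main :: IO ()
main = do
    done <- newMVar (0 :: Int)
    -- Fork 10,000 green threads; each "blocks" for one second.
    mapM_ (\_ -> forkIO $ do
              threadDelay 1000000                 -- handled by the runtime's event loop
              modifyMVar_ done (return . (+ 1)))
          [1 .. 10000 :: Int]
    -- Give them time to finish, then report how many completed.
    threadDelay 2000000
    readMVar done >>= print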

Downsides of green threads

Like garbage collection, there are downsides to green threads as well. While not a comprehensive list, here are some such downsides I'm aware of:

  • By passing control of scheduling to the runtime system, we lose control of when exactly a context switch will occur, which can result in performance degradation. (Note that this only applies in relation to the async approach; the other thread-based approaches also have the context switch issue.) In my experience, this is rarely a problem, though certain performance-sensitive code bases may be affected by this. In a distributed computation project at FP Complete, for example, we ultimately went the route of creating our own event loop.
  • As mentioned, spawning a green thread is cheap, but not free. An event loop can bypass this overhead. Again, with most green thread systems, the overhead is small enough so as not to be prohibitive, but if the highest performance possible is your goal, green threads may ultimately be your bottleneck.

As you can see, like garbage collection, the main downside is that for specific performance cases, green threads may be an impediment. But also like garbage collection, there is a wide array of cases where the gains in code simplicity and lower bug rates more than pay for the slight performance overhead.

Experiment with green threads

Let's go ahead and get started with a short example right now. The only tools you'll need are the Stack build tool and a text editor. If you're on a Mac or Linux system, you can get Stack by running curl -sSL https://get.haskellstack.org/ | sh. On Windows, you probably want the 64-bit installer.

Once you have Stack installed, save the following code into a file called echo.hs:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
import Data.Conduit
import Data.Conduit.Network
import qualified Data.ByteString.Char8 as B8

main = do
    putStrLn "OK, I'm running!"

    -- This automatically binds a new listening socket and forks a new
    -- green thread for each incoming connection.
    runTCPServer settings app
  where
    -- Listen on all interfaces on port 4500
    settings = serverSettings 4500 "*"

    -- Create a simple pipeline connecting the input from the network
    -- (appSource) to our echo program to the output to the network
    -- (appSink).
    app appData = runConduit (appSource appData .| echo .| appSink appData)

    -- awaitForever keeps running the inner function as long as new data
    -- is available from the client
    echo = awaitForever (\bytes -> do
        -- Echo back the raw message we received
        yield bytes

        -- And now send back the Fibonacci value at the length of the
        -- input. We need to use B8.pack to convert our String into a
        -- binary format we can send over the network.
        yield (B8.pack (show (fib (B8.length bytes)) ++ "\n")))

-- Written badly for high CPU usage!
fib 0 = 1
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

Now you can run this program with stack echo.hs. The first run will take a bit of time as it downloads a compiler and a number of libraries. This will only happen on the first run; subsequent runs will reuse the previously downloaded and installed tools and libraries. Once you've got this running, you can connect to it to play with:

$ telnet localhost 4500

Go ahead and play with it with some short lines, and confirm that it responds. For example, here's a sample session:

$ telnet localhost 4500
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Hello world!
Hello world!
610
Bye!
Bye!
13

Now to prove our claims of the system remaining responsive in the presence of high CPU: try entering a long string, which will require a lot of CPU time to calculate the Fibonacci value, e.g.:

This is a fairly long string which will take a bit of time unless you have a supercomputer. This is a fairly long string which will take a bit of time unless you have a supercomputer.

As you might expect, further interactions on this connection will have no effect as it is computing its response. But go ahead and open up a new telnet session in a different terminal. You should be able to continue interacting with the echo server and get results, thanks to the scheduling behavior of the runtime system. Notice how we get this behavior without any explicit work on our part to break up the expensive CPU operation into smaller bits!

EDIT As pointed out on lobste.rs, the above will not expand to multiple cores, since GHC's interpreter mode will only use a single core. In order to see this take advantage of additional CPU cores for additional requests, first compile the executable with:

stack ghc --resolver lts-7.14 --install-ghc --package conduit-extra -- --make -threaded echo.hs

And then run it with:

./echo +RTS -N4

The -N4 tells the GHC runtime to use 4 cores.

Learn more

There are great libraries in Haskell to take advantage of green threads for easy and beautiful concurrency. My all time favorite is the async package. Paired with powerful abstractions like Software Transactional Memory, you can quickly whip up high-performance, concurrent network services while avoiding common pitfalls like deadlocks and race conditions.
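
As a small taste (a sketch of my own, not from the async tutorial), concurrently and race from Control.Concurrent.Async run two green-thread actions at once:

import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (concurrently, race)

main :: IO ()
main = do
    -- Run both actions at the same time and collect both results.
    (a, b) <- concurrently
        (threadDelay 1000000 >> return "first result")
        (threadDelay 1500000 >> return "second result")
    print (a, b)

    -- Or race them and keep whichever finishes first.
    winner <- race
        (threadDelay 1000000 >> return "tortoise")
        (threadDelay  500000 >> return "hare")
    print winner   -- Right "hare"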

If you or your team are interested in learning more about functional programming and Haskell, check out our training options and our Haskell syllabus. If you'd like to learn about how FP Complete can help you with your server application needs, contact us about our consulting.


Or you can email us at consulting@fpcomplete.com

Categories: Offsite Blogs

Joachim Breitner: TikZ aesthetics

Planet Haskell - Fri, 01/06/2017 - 9:08am

Every year since 2012, I typeset the problems and solutions for the German math event Tag der Mathematik, which is organized by the Zentrum für Mathematik and reaches 1600 students from various parts of Germany. For that, I often reach for the LaTeX drawing package TikZ, and I really like the sober aesthetics of a nicely done TikZ drawing. So mostly for my own enjoyment, I collect the prettiest here.

On a global scale they are still rather mundane, and for really impressive and educating examples, I recommend the TikZ Gallery.

Categories: Offsite Blogs

Dominic Steinitz: UK / South Korea Trade: A Bayesian Analysis

Planet Haskell - Thu, 01/05/2017 - 6:09am
Introduction

I was intrigued by a tweet by the UK Chancellor of the Exchequer stating that "exports [to South Korea] have doubled over the last year. Now worth nearly £11bn" and a tweet by a Member of the UK Parliament stating that South Korea is "our second fastest growing trading partner". Although I have never paid much attention to trade statistics, both these statements seemed surprising. But these days it’s easy enough to verify such statements. It’s also an opportunity to use the techniques I believe data scientists in (computer) game companies use to determine how much impact a new feature has on the game’s consumers.

One has to be slightly careful with trade statistics as they come in many different forms, e.g., just goods or goods and services etc. When I provide software and analyses to US organisations, I am included in the services exports from the UK to the US.

Let’s analyse goods first before moving on to goods and services.

Getting the Data

First let’s get hold of the quarterly data from the UK Office of National Statistics.

ukstats <- "https://www.ons.gov.uk"
bop <- "economy/nationalaccounts/balanceofpayments"
ds <- "datasets/tradeingoodsmretsallbopeu2013timeseriesspreadsheet/current/mret.csv"

mycsv <- read.csv(paste(ukstats,"file?uri=",bop,ds,sep="/"),stringsAsFactors=FALSE)

Now we can find the columns that refer to Korea.

ns <- which(grepl("Korea", names(mycsv)))
length(ns)
## [1] 3

names(mycsv[ns[1]])
## [1] "BoP.consistent..South.Korea..Exports..SA................................"

names(mycsv[ns[2]])
## [1] "BoP.consistent..South.Korea..Imports..SA................................"

names(mycsv[ns[3]])
## [1] "BoP.consistent..South.Korea..Balance..SA................................"

Now we can pull out the relevant information and create a data frame of it.

korean <- mycsv[grepl("Korea", names(mycsv))]
imports <- korean[grepl("Imports", names(korean))]
exports <- korean[grepl("Exports", names(korean))]
balance <- korean[grepl("Balance", names(korean))]

df <- data.frame(mycsv[grepl("Title", names(mycsv))], imports, exports, balance)
colnames(df) <- c("Title", "Imports", "Exports", "Balance")

startQ <- which(grepl("1998 Q1",df$Title))
endQ <- which(grepl("2016 Q3",df$Title))
dfQ <- df[startQ:endQ,]

We can now plot the data.

tab <- data.frame(kr=as.numeric(dfQ$Exports),
                  krLabs=as.numeric(as.Date(as.yearqtr(dfQ$Title,format='%Y Q%q'))))

ggplot(tab, aes(x=as.Date(tab$krLabs), y=tab$kr)) +
    geom_line() +
    theme(legend.position="bottom") +
    ggtitle("Goods Exports UK / South Korea (Quarterly)") +
    theme(plot.title = element_text(hjust = 0.5)) +
    xlab("Date") +
    ylab("Value (£m)")

For good measure let’s plot the annual data.

startY <- grep("^1998$",df$Title)
endY <- grep("^2015$",df$Title)
dfYear <- df[startY:endY,]

tabY <- data.frame(kr=as.numeric(dfYear$Exports),
                   krLabs=as.numeric(dfYear$Title))

ggplot(tabY, aes(x=tabY$krLabs, y=tabY$kr)) +
    geom_line() +
    theme(legend.position="bottom") +
    ggtitle("Goods Exports UK / South Korea (Annual)") +
    theme(plot.title = element_text(hjust = 0.5)) +
    xlab("Date") +
    ylab("Value (£m)")

And the monthly data.

startM <- grep("1998 JAN",df$Title)
endM <- grep("2016 OCT",df$Title)
dfMonth <- df[startM:endM,]

tabM <- data.frame(kr=as.numeric(dfMonth$Exports),
                   krLabs=as.numeric(as.Date(as.yearmon(dfMonth$Title,format='%Y %B'))))

ggplot(tabM, aes(x=as.Date(tabM$krLabs), y=tabM$kr)) +
    geom_line() +
    theme(legend.position="bottom") +
    ggtitle("Goods Exports UK / South Korea (Monthly)") +
    theme(plot.title = element_text(hjust = 0.5)) +
    xlab("Date") +
    ylab("Value (£m)")

It looks like some change took place in 2011 but nothing to suggest either that "exports have doubled over the last year" or that South Korea is "our second fastest growing partner". That some sort of change did happen is further supported by the fact that a Free Trade Agreement between the EU and Korea was put in place in 2011.

But was there really a change? And what sort of change was it? Sometimes it’s easy to imagine patterns where there are none.

With this warning in mind let us see if we can get a better feel from the numbers as to what happened.

The Model

Let us assume that the data for exports are approximated by a linear function of time but that there is a change in the slope and the offset at some point during observation.

Since we are going to use stan to infer the parameters for this model and stan cannot handle discrete parameters, we need to marginalize out this (discrete) parameter. I hope to do the same analysis with LibBi which seems more suited to time series analysis and which I believe will not require such a step.

Setting $D = \{y_i\}_{i=1}^N$ and marginalising over the (discrete) changepoint $\tau$, we can calculate the likelihood

$$p(D) = \sum_{\tau=1}^N \frac{1}{N} \prod_{i=1}^N \mathcal{N}\big(y_i \,\big|\, \mu_i(\tau), \sigma_i(\tau)\big)$$

stan operates on the log scale and thus requires the log likelihood

$$\log p(D) = \operatorname{log\_sum\_exp}_{\tau=1}^N \Big( -\log N + \sum_{i=1}^N \log \mathcal{N}\big(y_i \,\big|\, \mu_i(\tau), \sigma_i(\tau)\big) \Big)$$

where

$$\mu_i(\tau) = \begin{cases} \mu_1 x_i + \gamma_1 & i < \tau \\ \mu_2 x_i + \gamma_2 & i \ge \tau \end{cases} \qquad \sigma_i(\tau) = \begin{cases} \sigma_1 & i < \tau \\ \sigma_2 & i \ge \tau \end{cases}$$

and where the log sum of exponents function is defined by

$$\operatorname{log\_sum\_exp}_{n=1}^N \alpha_n = \log \sum_{n=1}^N \exp(\alpha_n)$$

The log sum of exponents function allows the model to be coded directly in Stan using the built-in function log_sum_exp, which provides both arithmetic stability and efficiency for mixture model calculations.

Stan

Here’s the model in stan. Sadly I haven’t found a good way of divvying up .stan files in a .Rmd file so that it still compiles.

data {
  int<lower=1> N;
  real x[N];
  real y[N];
}

parameters {
  real mu1;
  real mu2;
  real gamma1;
  real gamma2;
  real<lower=0> sigma1;
  real<lower=0> sigma2;
}

transformed parameters {
  vector[N] log_p;
  real mu;
  real sigma;
  log_p = rep_vector(-log(N), N);
  for (tau in 1:N)
    for (i in 1:N) {
      mu = i < tau ? (mu1 * x[i] + gamma1) : (mu2 * x[i] + gamma2);
      sigma = i < tau ? sigma1 : sigma2;
      log_p[tau] = log_p[tau] + normal_lpdf(y[i] | mu, sigma);
    }
}

model {
  mu1 ~ normal(0, 10);
  mu2 ~ normal(0, 10);
  gamma1 ~ normal(0, 10);
  gamma2 ~ normal(0, 10);
  sigma1 ~ normal(0, 10);
  sigma2 ~ normal(0, 10);
  target += log_sum_exp(log_p);
}

generated quantities {
  int<lower=1,upper=N> tau;
  tau = categorical_rng(softmax(log_p));
}

The above, although mimicking our mathematical model, has quadratic complexity and we can use the trick in the stan manual to make it linear albeit with less clarity.

data {
  int<lower=1> N;
  real x[N];
  real y[N];
}

parameters {
  real mu1;
  real mu2;
  real gamma1;
  real gamma2;
  real<lower=0> sigma1;
  real<lower=0> sigma2;
}

transformed parameters {
  vector[N] log_p;
  {
    vector[N+1] log_p_e;
    vector[N+1] log_p_l;
    log_p_e[1] = 0;
    log_p_l[1] = 0;
    for (i in 1:N) {
      log_p_e[i + 1] = log_p_e[i] + normal_lpdf(y[i] | mu1 * x[i] + gamma1, sigma1);
      log_p_l[i + 1] = log_p_l[i] + normal_lpdf(y[i] | mu2 * x[i] + gamma2, sigma2);
    }
    log_p = rep_vector(-log(N) + log_p_l[N + 1], N) + head(log_p_e, N) - head(log_p_l, N);
  }
}

model {
  mu1 ~ normal(0, 10);
  mu2 ~ normal(0, 10);
  gamma1 ~ normal(0, 10);
  gamma2 ~ normal(0, 10);
  sigma1 ~ normal(0, 10);
  sigma2 ~ normal(0, 10);
  target += log_sum_exp(log_p);
}

generated quantities {
  int<lower=1,upper=N> tau;
  tau = categorical_rng(softmax(log_p));
}

Let’s run this model with the monthly data.

NM <- nrow(tabM)
KM <- ncol(tabM)

yM <- tabM$kr
XM <- data.frame(tabM,rep(1,NM))[,2:3]

fitM <- stan(
    file = "lr-changepoint-ng.stan",
    data = list(x = XM$krLabs, y = yM, N = length(yM)),
    chains = 4,
    warmup = 1000,
    iter = 10000,
    cores = 4,
    refresh = 500,
    seed=42
    )

## Warning: There were 662 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems

Looking at the results below we see a multi-modal distribution so a mean is not of much use.

histData <- hist(extract(fitM)$tau,plot=FALSE,breaks=c(seq(1,length(yM),1)))
histData$counts

##   [1] 18000     0     0     0     0     0     0     0     0     0     0
##  [12]     0     0     0     0     0     0     0     0     0     0     0
##  [23]     0     0     0     0     0     0     0     0     0     0     0
##  [34]     0     0     0     0     0     0     0     0     0     0     0
##  [45]     0     0     0     0     0     0     0     0     0     0     0
##  [56]     0     0     0     0     0     0     0     0     0     0     0
##  [67]     0     0     0     0     0     0     0     0     0     0     0
##  [78]     0     0     0     0     0     0     0     0     0     0     0
##  [89]     0     0     0     0     0     0     0     0     0     0     0
## [100]     0     0     0     0     0     0     0     0     0     0     0
## [111]     0     0     0     0     0     0     0     1     4    12    16
## [122]    16   107   712  8132     0     0     0     0     0     0     0
## [133]     0     0     0     0     0     0     0     0     0     0     0
## [144]     0     0     0     0     0     0     0     0     0     0     0
## [155]     0     0     0     0     0     0     0     0     0     0    25
## [166]   171  2812     0     0     0     0     0     0     0     0     0
## [177]     0     0     0     0     0     0     0     0     0     0     0
## [188]     0     0     0     0     0     0     0     0     0     0     0
## [199]     0     0     0     0     0     0     0     0     0     0     0
## [210]     0     0     0     0     0     0     0     0     0     0     0
## [221]     0     0     0     0  5992

We can get a pictorial representation of the maxima so that the multi-modality is even clearer.

min_indexes = which(diff( sign(diff( c(0,histData$counts,0)))) == 2)
max_indexes = which(diff( sign(diff( c(0,histData$counts,0)))) == -2)

modeData = data.frame(x=1:length(histData$counts),y=histData$counts)
min_locs = modeData[min_indexes,]
max_locs = modeData[max_indexes,]

plot(modeData$y, type="l")
points( min_locs, col="red", pch=19, cex=1 )
points( max_locs, col="green", pch=19, cex=1 )

My interpretation is that the evidence (data) says there is probably no changepoint (a change at the beginning or end is no change) but there might be a change at intermediate data points.

We can see something strange (maybe a large single export?) happened at index 125 which translates to 2008MAY.

The mode at index 167 which translates to 2011NOV corresponds roughly to the EU / Korea trade agreement.

Let us assume that there really was a material difference in trade at this latter point. We can fit a linear regression before this point and one after this point.

Here’s the stan

data {
  int<lower=1> N;
  int<lower=1> K;
  matrix[N,K] X;
  vector[N] y;
}

parameters {
  vector[K] beta;
  real<lower=0> sigma;
}

model {
  y ~ normal(X * beta, sigma);
}

And here’s the R to fit the before and after data. We fit the model, pull out the parameters for the regression and pull out the covariates

N <- length(yM)
M <- max_locs$x[3]

fite <- stan(file = 'LR.stan',
             data = list(N = M, K = ncol(XM), y = yM[1:M], X = XM[1:M,]),
             pars=c("beta", "sigma"),
             chains=3,
             cores=3,
             iter=3000,
             warmup=1000,
             refresh=-1)

se <- extract(fite, pars = c("beta", "sigma"), permuted=TRUE)
estCovParamsE <- colMeans(se$beta)

fitl <- stan(file = 'LR.stan',
             data = list(N = N-M, K = ncol(XM), y = yM[(M+1):N], X = XM[(M+1):N,]),
             pars=c("beta", "sigma"),
             chains=3,
             cores=3,
             iter=3000,
             warmup=1000,
             refresh=-1)

sl <- extract(fitl, pars = c("beta", "sigma"), permuted=TRUE)
estCovParamsL <- colMeans(sl$beta)

Make predictions

linRegPredsE <- data.matrix(XM) %*% estCovParamsE
linRegPredsL <- data.matrix(XM) %*% estCovParamsL

ggplot(tabM, aes(x=as.Date(tabM$krLabs), y=tabM$kr)) +
    geom_line(aes(x = as.Date(tabM$krLabs), y = tabM$kr, col = "Actual")) +
    geom_line(data=tabM[1:M,], aes(x = as.Date(tabM$krLabs[1:M]), y = linRegPredsE[(1:M),1], col = "Fit (Before FTA)")) +
    geom_line(data=tabM[(M+1):N,], aes(x = as.Date(tabM$krLabs[(M+1):N]), y = linRegPredsL[((M+1):N),1], col = "Fit (After FTA)")) +
    theme(legend.position="bottom") +
    ggtitle("Goods Exports UK / South Korea (Monthly)") +
    theme(plot.title = element_text(hjust = 0.5)) +
    xlab("Date") +
    ylab("Value (£m)")

An Intermediate Conclusion and Goods and Services (Pink Book)

So we didn’t manage to substantiate either the Chancellor’s claim or the Member of Parliament’s claim.

But it may be that if we look at Goods and Services together we will be able to see the numbers behind the claims.

pb <- "datasets/pinkbook/current/pb.csv"
pbcsv <- read.csv(paste(ukstats,"file?uri=",bop,pb,sep="/"),stringsAsFactors=FALSE)

This has a lot more information albeit only annually.

pbns <- grep("Korea", names(pbcsv))
length(pbns)
## [1] 21

lapply(pbns,function(x) names(pbcsv[x]))
## [[1]]
## [1] "BoP..Current.Account..Goods...Services..Imports..South.Korea............"
##
## [[2]]
## [1] "BoP..Current.Account..Current.Transfer..Balance..South.Korea............"
##
## [[3]]
## [1] "BoP..Current.Account..Goods...Services..Balance..South.Korea............"
##
## [[4]]
## [1] "IIP..Assets..Total.South.Korea.........................................."
##
## [[5]]
## [1] "Trade.in.Services.replaces.1.A.B....Exports.Credits...South.Korea...nsa."
##
## [[6]]
## [1] "IIP...Liabilities...Total...South.Korea................................."
##
## [[7]]
## [1] "BoP..Total.income..Balance..South.Korea................................."
##
## [[8]]
## [1] "BoP..Total.income..Debits..South.Korea.................................."
##
## [[9]]
## [1] "BoP..Total.income..Credits..South.Korea................................."
##
## [[10]]
## [1] "BoP..Current.account..Balance..South.Korea.............................."
##
## [[11]]
## [1] "BoP..Current.account..Debits..South.Korea..............................."
##
## [[12]]
## [1] "BoP..Current.account..Credits..South.Korea.............................."
##
## [[13]]
## [1] "IIP...Net...Total....South.Korea........................................"
##
## [[14]]
## [1] "Trade.in.Services.replaces.1.A.B....Imports.Debits...South.Korea...nsa.."
##
## [[15]]
## [1] "BoP..Current.Account..Services..Total.Balance..South.Korea.............."
##
## [[16]]
## [1] "Bop.consistent..Balance..NSA..South.Korea..............................."
##
## [[17]]
## [1] "Bop.consistent..Im..NSA..South.Korea...................................."
##
## [[18]]
## [1] "Bop.consistent..Ex..NSA..South.Korea...................................."
##
## [[19]]
## [1] "Current.Transfers...Exports.Credits...South.Korea...nsa................."
##
## [[20]]
## [1] "Current.Transfers...Imports.Debits...South.Korea...nsa.................."
##
## [[21]]
## [1] "BoP..Current.Account..Goods...Services..Exports..South.Korea............"

Let’s just look at exports.

koreanpb <- pbcsv[grepl("Korea", names(pbcsv))]
exportspb <- koreanpb[grepl("Exports", names(koreanpb))]
names(exportspb)

## [1] "Trade.in.Services.replaces.1.A.B....Exports.Credits...South.Korea...nsa."
## [2] "Current.Transfers...Exports.Credits...South.Korea...nsa................."
## [3] "BoP..Current.Account..Goods...Services..Exports..South.Korea............"

The last column gives exports of Goods and Services so let’s draw a chart of it.

pb <- data.frame(pbcsv[grepl("Title", names(pbcsv))], exportspb[3])
colnames(pb) <- c("Title", "Exports")

startpbY <- which(grepl("1999",pb$Title))
endpbY <- which(grepl("2015",pb$Title))
pbY <- pb[startpbY:endpbY,]

tabpbY <- data.frame(kr=as.numeric(pbY$Exports),
                     krLabs=as.numeric(pbY$Title))

ggplot(tabpbY, aes(x=tabpbY$krLabs, y=tabpbY$kr)) +
    geom_line() +
    theme(legend.position="bottom") +
    ggtitle("Goods and Services Exports UK / South Korea (Annual)") +
    theme(plot.title = element_text(hjust = 0.5)) +
    xlab("Date") +
    ylab("Value (£m)")

No joy here either for any of the claims. Still, it’s been an interesting exercise.


Categories: Offsite Blogs

Michael Snoyman: Conflicting Module Names

Planet Haskell - Wed, 01/04/2017 - 6:00pm

It's the oldest open issue on the Stackage repo, and a topic I've discussed more times than I can remember over the years. Hackage enforces that package names are unique (so that no one else can claim the name conduit, for instance), but does nothing to ensure unique module names (so someone else could write a package named my-conduit with a module named Data.Conduit).

For the record, I think Hackage's position here is not only a good one, but the only logical one it could have made. I'm not even hinting at wanting to change that. Please don't read this blog post in that way at all.

Usually, conflicting module names do not negatively affect us, at least when working on project code with a proper .cabal file. In my made-up example above, I would explicitly state that I depend on conduit and not list my-conduit, and when my code imports Data.Conduit, Stack+Cabal+GHC can all work together to ensure that the correct module is used.
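
(When two such packages do end up installed side by side and you need to disambiguate an import by hand, GHC's PackageImports extension is the usual escape hatch. A minimal sketch, reusing the made-up my-conduit example from above:)

{-# LANGUAGE PackageImports #-}

-- Pick the Data.Conduit from the real conduit package, not from my-conduit.
import "conduit" Data.Conduit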

EDIT Since I've already written some of the code for stackage-curator to detect this, I generated a list of all conflicting module names to give an idea of what we're looking at.

The problem

(If you're already convinced that conflicting module names are a problem, you may want to skip straight to "the solution." This section is fairly long and detailed.)

Unfortunately, there are still some downsides to having the same module name appear in different packages:

  1. Documentation Suppose I'm reading a tutorial that includes the line import Control.Monad.Reader. I look at the Stackage doc list by module and discover:

    If I'm not familiar with the Haskell ecosystem, I'm unlikely to know that mtl is far more popular than monads-tf, and I may well choose the latter.

  2. runghc/ghci We're not always working on project code. Sometimes we're just writing a script. Sometimes we're playing with an idea in GHCi. What if I import System.FilePath.Glob in a GHCi prompt when I have both the filemanip and Glob packages installed?

  3. doctests Similar to the previous point: even when you run doctests from inside the context of a project, they don't typically know which packages can be used, and conflicting module names can cause the tests to fail. What's especially bad about this is that an unrelated action (like running stack build async-dejafu) can suddenly make your tests start to fail when they previously succeeded.

  4. Custom Setup.hs Suppose you're writing a cabal package that uses a custom Setup.hs file and imports some additional modules. To pick a concrete example that just happened: the executable-hash package has a Setup.hs file which - indirectly - imports Crypto.Hash.SHA1. And there's an explicit dependency on cryptohash in the .cabal file, which one may naively infer means we're safe. However, when uuid-1.3.13 moved from cryptonite to a few other packages (including cryptohash-sha1), building executable-hash when uuid was already installed became a build error. And like the previous point, this is essentially a non-deterministic race condition.

    Since I was a backup maintainer for executable-hash, I implemented two fixes: adding an explicit PackageImport and using the new custom-setup feature in Cabal-1.24. While custom-setup is definitely the way to go with this, and it's a great addition to Cabal, not everyone is using the newest version of Cabal, Stack is only just now adding support for this, and not all packages will update to support this immediately.

  5. Better tooling It would be great if tooling could automatically determine which packages to install based on the imports list, to avoid the need for a lot of manual and redundant statements of dependencies. We're considering doing this in the upcoming stack script command. But how will Stack know which Control.Monad.Reader to use?

The solution

While we know that we can't have fully unique module names without a lot of buy-in from package authors, we can get pretty close, with canonical locations for a module. We've already implemented this to some extent in Stackage to resolve problem (3) listed above. We now have the ability to list some packages as hidden in a Stackage snapshot. This means that, after installing the package, the Stackage build system will hide the package, so that its modules won't be available for import. By adding async-dejafu to the hidden list, the warp doctest suite no longer has the ambiguity issue when running.
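
(For the curious: hiding an installed package is something GHC supports directly, so the build system only needs to run something along the lines of the following after registering it. The exact invocation here is illustrative, not a quote from the Stackage build code:)

ghc-pkg hide async-dejafu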

After dealing with the cryptohash-sha1 fallout earlier this week, I realized that this solution can generalize to solve a large swath of the problems described above. Here's how I see it working:

  • We introduce a new constraint in the Stackage build process: every module name must be present in only one exposed (that is, non-hidden) package.
  • When stack build registers a package, it automatically hides it if the snapshot lists it as hidden.
  • On the stackage.org module list, modules from a hidden package are explicitly marked as hidden (or, if we want to be more extreme, we just hide them entirely).
  • With the upcoming stack script command, when finding a package for a given imported module, we only pay attention to non-hidden modules.

This doesn't fully solve the problems above. For example, if a user just Googles Control.Monad.Reader, they'll still possibly get confusing documentation. But I think this is a huge step in the right direction.

Categories: Offsite Blogs

Brent Yorgey: MonadRandom 0.5 released

Planet Haskell - Tue, 01/03/2017 - 4:32pm

I’m happy to announce the release of MonadRandom-0.5, a package which provides a convenient monadic interface for random number generation in the style of transformers and mtl: a Rand monad (essentially a state monad that threads through a generator), a monad transformer variant called RandT, and a RandomMonad class allowing the use of random generation operations in monad stacks containing RandT.

This release has quite a few small additions as well as a big module reorganization. However, thanks to module re-exports, most existing code using the library should continue to work with no changes; the major version bump reflects the large reorganization and my resultant inability to 100% guarantee that existing user code will not break. If your code does break, please let me know—I would be happy to help you fix it, or simply to know about it so I can help other users.

Here are a few of the biggest changes that may be of interest to users of the library:

  • A new MonadInterleave class (see #20), which is a big improvement over MonadSplit. It provides a method interleave :: m a -> m a, which works by splitting the generator, running its argument using one half of the generator, and using the other half as the final state of the resulting action (replacing whatever the final generator state otherwise would have been). This can be used, for example, to allow random computations to run in parallel, or to create lazy infinite structures of random values. In the example below, the infinite tree randTree cannot be evaluated lazily: even though it is cut off at two levels deep by hew 2, the random value in the right subtree still depends on generation of all the random values in the (infinite) left subtree, even though they are ultimately unneeded. Inserting a call to interleave, as in randTreeI, solves the problem: the generator splits at each Node, so random values in the left and right subtrees are generated independently.

    data Tree = Leaf | Node Int Tree Tree deriving Show

    hew :: Int -> Tree -> Tree
    hew 0 _    = Leaf
    hew _ Leaf = Leaf
    hew n (Node x l r) = Node x (hew (n-1) l) (hew (n-1) r)

    randTree :: Rand StdGen Tree
    randTree = Node <$> getRandom <*> randTree <*> randTree

    randTreeI :: Rand StdGen Tree
    randTreeI = interleave $ Node <$> getRandom <*> randTreeI <*> randTreeI

    >>> hew 2 <$> evalRandIO randTree
    Node 2168685089479838995 (Node (-1040559818952481847) Leaf Leaf) (Node ^CInterrupted.

    >>> hew 2 <$> evalRandIO randTreeI
    Node 8243316398511136358 (Node 4139784028141790719 Leaf Leaf) (Node 4473998613878251948 Leaf Leaf)
  • A new PrimMonad instance for RandT (thanks to Koz Ross), allowing it to be used in conjunction with e.g. mutable vectors.

  • New and improved random selection functions:
    • fromList now raises an error when the total weight of elements is zero.
    • The type of uniform is generalized to work over any Foldable.
    • New operations weighted, weightedMay, fromListMay, and uniformMay have been added. weighted is like fromList but generalized to work over any Foldable. The May variants, of course, return a Maybe result instead of raising an error. (A short sketch of these follows this list.)
  • New lazy vs strict variants of the Rand monad. If you import Control.Monad.Random or Control.Monad.Trans.Random you get the Lazy variant re-exported by default, but you can explicitly import .Lazy or .Strict if you want. They provide the exact same API, but Lazy is implemented with a lazy state monad and Strict with a strict one. To be honest it’s not clear what difference this might make, but since the distinction is already there with the underlying state monad for free, why not provide it?
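
Here is a small sketch of the new selection functions in action (assuming MonadRandom 0.5; the items and weights are invented):

import Control.Monad.Random

main :: IO ()
main = do
  -- uniform picks from any Foldable with equal probability.
  c <- evalRandIO (uniform "abcde")
  print c

  -- weighted picks according to the given weights (here 3:1).
  fruit <- evalRandIO (weighted [("apple", 3), ("banana", 1)])
  putStrLn fruit

  -- The May variants return Maybe instead of raising an error on empty input.
  nothing <- evalRandIO (uniformMay ([] :: [Int]))
  print nothing   -- Nothing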

Although there was some discussion of generalizing MonadRandom to work for a wider range of underlying generators (see the comments on my previous blog post and the discussion on issue #26), I decided to punt on that for now. It seems rather complicated, and that there are already good alternatives like the very nice random-fu package, so I decided to keep things simple for this release. I’m still open to proposals for generalizing future releases.

For a full rundown of changes in 0.5, see the change log. Comments, questions, and bug reports are always welcome either as a comment on this blog post or on the GitHub issue tracker.


Categories: Offsite Blogs

Diagrams: Diagrams 1.4

Planet Haskell - Mon, 01/02/2017 - 6:00pm

by Brent Yorgey on January 3, 2017

Tagged as: release, features, announcement, 1.4.

Diagrams 1.4

The diagrams team is very pleased to announce the release of diagrams 1.4. The release actually happened a few months ago, in October—we just hadn't gotten around to writing about it yet. But in any case this was a fairly quiet release, with very few breaking changes; mostly 1.4 just introduced new features. There is a migration guide which lists a few known potentially breaking changes, but most users should have no trouble. The rest of this post highlights some of the new features in 1.4.

Alignment and layout

Radial tree layout

The existing Diagrams.TwoD.Layout.Tree module from diagrams-contrib has been extended with a new radialLayout function, based on an algorithm by Andy Pavlo.

> import Diagrams.TwoD.Layout.Tree
> import Data.Tree
>
> t = Node 'A'
>     [ Node 'B' (map lf "CDE")
>     , Node 'F' [Node 'G' (map lf "HIJKLM"), Node 'N' (map lf "OPQRS")]
>     , Node 'T' (map lf "UVWXYZ")
>     ]
>   where lf x = Node x []
>
> example =
>   renderTree (\n -> (text (show n) # fontSizeG 0.5 <> circle 0.5 # fc white))
>              (~~) (radialLayout t)
>   # centerXY # frame 0.5

Aligned composition

Sometimes, it is desirable to compose some diagrams according to a certain alignment, but without affecting their local origins. The composeAligned function can be used for this purpose. It takes as arguments an alignment function (such as alignT or snugL), a composition function of type [Diagram] -> Diagram, and produces a new composition function which works by first aligning the diagrams before composing them.

> example :: Diagram B
> example = (hsep 2 # composeAligned alignT) (map circle [5,1,3,2])
>           # showOrigin

The example above shows using hsep 2 to compose a collection of top-aligned circles. Notice how the origin of the composed diagram is still at the center of the leftmost circle, instead of at its top edge (where it would normally be placed by alignT).

Constrained layout

The new Diagrams.TwoD.Layout.Constrained module from diagrams-contrib implements basic linear constraint-based layout. As a simple example of something that would be tedious to draw without some kind of constraint solving, consider this diagram which consists of a vertical stack of circles of different sizes, along with an accompanying set of squares, such that (1) each square is constrained to lie on the same horizontal line as a circle, and (2) the squares all lie on a diagonal line.

> import Diagrams.TwoD.Layout.Constrained
> import Control.Monad (zipWithM_)
>
> example :: Diagram B
> example = frame 1 $ layout $ do
>   let rs = [2,2,4,3,5,6]
>   cirs <- newDias (map circle rs # fc blue)
>   sqs  <- newDias (replicate (length rs) (square 2) # fc orange)
>   constrainWith vcat cirs
>   zipWithM_ sameY cirs sqs
>   constrainWith hcat [cirs !! 0, sqs !! 0]
>   along (direction (1 ^& (-1))) (map centerOf sqs)

See the package documentation for more examples and documentation.

Anchors

Another new module in diagrams-contrib, Diagrams.Anchors, provides a convenient interface for aligning diagrams relative to named anchor points. This can be useful, for example, when laying out diagrams composed of pieces that should "attach" to each other at various points.

We don't have an example of its use at the moment—if you play with it and create a nice example, let us know!

Paths

Boolean path operations

The new Diagrams.TwoD.Path.Boolean module from diagrams-contrib contains functions for computing boolean combinations of paths, such as union, intersection, difference, and symmetric difference.

> import qualified Diagrams.TwoD.Path.Boolean as B
>
> thing1, thing2 :: Path V2 Double
> thing1 = square 1
> thing2 = circle 0.5 # translate (0.5 ^& (-0.5))
>
> example = hsep 0.5 . fc green . map strokePath $
>   [ B.union Winding (thing1 <> thing2)
>   , B.intersection Winding thing1 thing2
>   , B.difference Winding thing1 thing2
>   , B.exclusion Winding thing1 thing2
>   ]

Cubic B-splines

Diagrams.CubicSpline has a new function, bspline, which creates a smooth curve (to be precise, a uniform cubic B-spline: https://en.wikipedia.org/wiki/B-spline) with the given points as control points. The curve begins and ends at the first and last points, and is tangent to the lines to the second-to-last control points. It does not, in general, pass through the intermediate control points.

> pts = map p2 (zip [0 .. 8] (cycle [0, 1]))
> example = mconcat
>   [ bspline pts
>   , mconcat $ map (place (circle 0.1 # fc blue # lw none)) pts
>   ]

One major difference between cubicSpline and bspline is that the curves generated by cubicSpline depend on the control points in a global way—that is, changing one control point could alter the entire curve—whereas with bspline, each control point only affects a local portion of the curve.

Following composition

diagrams-contrib has a new module, Diagrams.TwoD.Path.Follow, which defines a wrapper type Following n. Following is just like Trail' Line V2, except that it has a different Monoid instance: following values are concatenated, just like regular lines, except that they are also rotated so the tangents match at the join point. In addition, they are normalized so the tangent at the start point is in the direction of the positive x axis (essentially we are considering trails equivalent up to rotation).

> import Control.Lens (ala)
> import Diagrams.TwoD.Path.Follow
>
> wibble :: Trail' Line V2 Double
> wibble = hrule 1 <> hrule 0.5 # rotateBy (1/6) <> hrule 0.5 # rotateBy (-1/6) <> a
>   where a = arc (xDir # rotateBy (-1/4)) (1/5 @@ turn)
>           # scale 0.7
>
> example =
>   [ wibble
>   , wibble
>       # replicate 5
>       # ala follow foldMap
>   ]
>   # map stroke
>   # map centerXY
>   # hsep 1
>   # frame 0.3

Notice how the above example makes use of the ala combinator from Control.Lens to automatically wrap all the Lines using follow before combining and then unwrap the result.

Fun

L-systems

The new module Diagrams.TwoD.Path.LSystem in diagrams-contrib draws L-systems described by recursive string rewriting rules, and provides a number of examples that can be used as starting points for exploration.

> import Diagrams.TwoD.Path.LSystem
> import qualified Data.Map as M
>
> tree :: RealFloat n => Int -> TurtleState n
> tree n = lSystem n (1/18 @@ turn) (symbols "F") rules
>   where
>     rules = M.fromList [rule 'F' "F[+>>>F]F[->>>F][>>>F]"]
>
> example = getTurtleDiagram $ tree 6

This example is already provided by the module as tree2.

XKCD colors

Randall Munroe, of xkcd fame, ran a survey to determine commonly used names for colors, and published a list of the 954 most common colors based on the results. Diagrams.Color.XKCD from diagrams-contrib provides all these color names.

> import Diagrams.Color.XKCD
>
> colors = [booger, poisonGreen, cinnamon, petrol, vibrantPurple]
> example = hsep 0.1 (zipWith fcA colors (repeat (circle 1 # lw none)))
Categories: Offsite Blogs

Michael Snoyman: Functors, Applicatives, and Monads

Planet Haskell - Mon, 01/02/2017 - 6:00pm

This content originally appeared on School of Haskell. Thanks to Julie Moronuki for encouraging me to update/republish, and for all of the edits/improvements.

NOTE Code snippets below can be run using the Stack build tool, by saving to a file Main.hs and running with stack Main.hs. More information is available in the How to Script with Stack tutorial.

Let's start off with a very simple problem. We want to let a user input their birth year, and tell them their age in the year 2020. Using the function read, this is really simple:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
main = do
    putStrLn "Please enter your birth year"
    year <- getLine
    putStrLn $ "In 2020, you will be: " ++ show (2020 - read year)

If you run that program and type in a valid year, you'll get the right result. However, what happens when you enter something invalid?

Please enter your birth year
hello
main.hs: Prelude.read: no parse

The problem is that the user input is coming in as a String, and read is trying to parse it into an Integer. But not all Strings are valid Integers. read is what we call a partial function, meaning that under some circumstances it will return an error instead of a valid result.

A more resilient way to write our code is to use the readMaybe function, which will return a Maybe Integer value. This makes it clear with the types themselves that the parse may succeed or fail. To test this out, try running the following code:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

main = do
    -- We use explicit types to tell the compiler how to try and parse the
    -- string.
    print (readMaybe "1980" :: Maybe Integer)
    print (readMaybe "hello" :: Maybe Integer)
    print (readMaybe "2000" :: Maybe Integer)
    print (readMaybe "two-thousand" :: Maybe Integer)

So how can we use this to solve our original problem? We need to now determine if the result of readMaybe was successful (a Just) or failed (a Nothing). One way to do this is with pattern matching:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

main = do
    putStrLn "Please enter your birth year"
    yearString <- getLine
    case readMaybe yearString of
        Nothing -> putStrLn "You provided an invalid year"
        Just year -> putStrLn $ "In 2020, you will be: " ++ show (2020 - year)

Decoupling code

This code is a bit coupled; let's split it up to have a separate function for displaying the output to the user and another separate function for calculating the age.

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided an invalid year"
        Just age -> putStrLn $ "In 2020, you will be: " ++ show age

yearToAge year = 2020 - year

main = do
    putStrLn "Please enter your birth year"
    yearString <- getLine
    let maybeAge =
            case readMaybe yearString of
                Nothing -> Nothing
                Just year -> Just (yearToAge year)
    displayAge maybeAge

This code does exactly the same thing as our previous version. But the definition of maybeAge in main looks pretty repetitive to me. We check if the parsed year is Nothing. If it's Nothing, we return Nothing. If it's Just, we return Just, after applying the function yearToAge. That seems like a lot of line noise to do something simple. All we want is to conditionally apply yearToAge.

Functors

Fortunately, we have a helper function to do just that. fmap, or functor mapping, will apply some function over the value contained by a functor. Maybe is one example of a functor; another common one is a list. In the case of Maybe, fmap does precisely what we described above. So we can replace our code with:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided an invalid year"
        Just age -> putStrLn $ "In 2020, you will be: " ++ show age

yearToAge year = 2020 - year

main = do
    putStrLn "Please enter your birth year"
    yearString <- getLine
    let maybeAge = fmap yearToAge (readMaybe yearString)
    displayAge maybeAge

Our code definitely got shorter, and hopefully a bit clearer as well. Now it's obvious that all we're doing is applying the yearToAge function over the contents of the Maybe value.

So what is a functor? It's some kind of container of values. In Maybe, our container holds zero or one values. With lists, we have a container for zero or more values. Some containers are even more exotic; the IO functor is actually providing an action to perform in order to retrieve a value. The only thing functors share is that they provide some fmap function which lets you modify their contents.
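
For example, here is a small script in the same style as the others that applies fmap to both a Maybe and a list; the shape of the container never changes, only its contents:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
main = do
    print (fmap (+ 1) (Just 2))    -- Just 3
    print (fmap (+ 1) Nothing)     -- Nothing
    print (fmap (+ 1) [1, 2, 3])   -- [2,3,4]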

do-notation

We have another option as well: we can use do-notation. This is the same way we've been writing main so far. That's because- as we mentioned in the previous paragraph- IO is a functor as well. Let's see how we can change our code to not use fmap:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided an invalid year"
        Just age -> putStrLn $ "In 2020, you will be: " ++ show age

yearToAge year = 2020 - year

main = do
    putStrLn "Please enter your birth year"
    yearString <- getLine
    let maybeAge = do
            yearInteger <- readMaybe yearString
            return $ yearToAge yearInteger
    displayAge maybeAge

Inside the do-block, we have the slurp operator <-. This operator is special for do-notation and is used to pull a value out of its wrapper (in this case, Maybe). Once we've extracted the value, we can manipulate it with normal functions, like yearToAge. When we complete our do-block, we have to return a value wrapped up in that container again. That's what the return function does.

do-notation isn't available for all Functors; it's a special feature reserved only for Monads. Monads are an extension of Functors that provide a little extra power. We're not really taking advantage of any of that extra power here; we'll need to make our program more complicated to demonstrate it.

Dealing with two variables

It's kind of limiting that we have a hard-coded year to compare against. Let's fix that by allowing the user to specify the "future year." We'll start off with a simple implementation using pattern matching and then move back to do-notation.

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge =
            case readMaybe birthYearString of
                Nothing -> Nothing
                Just birthYear ->
                    case readMaybe futureYearString of
                        Nothing -> Nothing
                        Just futureYear -> Just (futureYear - birthYear)
    displayAge maybeAge

OK, it gets the job done... but it's very tedious. Fortunately, do-notation makes this kind of code really simple:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = futureYear - birthYear

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge = do
            birthYear <- readMaybe birthYearString
            futureYear <- readMaybe futureYearString
            return $ yearDiff futureYear birthYear
    displayAge maybeAge

This is very convenient: we've now slurped our two values in our do-notation. If either parse returns Nothing, then the entire do-block will return Nothing. This demonstrates an important property about Maybe: it provides short circuiting.
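
To see that short circuiting in isolation, here is a purely illustrative snippet where the second parse fails:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

main = do
    let good = do
            x <- readMaybe "1982"
            y <- readMaybe "2020"
            return (y - x :: Integer)
        bad = do
            x <- readMaybe "1982"
            y <- readMaybe "hello"
            return (y - x :: Integer)
    print good  -- Just 38
    print bad   -- Nothing, because the second slurp failed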

Without resorting to other helper functions or pattern matching, there's no way to write this code using just fmap. So we've found an example of code that requires more power than Functors provide, and Monads provide that power.

Partial application

But maybe there's something else that provides enough power to write our two-variable code without the full power of Monad. To see what this might be, let's look more carefully at our types.

We're working with two values: readMaybe birthYearString and readMaybe futureYearString. Both of these values have the type Maybe Integer. And we want to apply the function yearDiff, which has the type Integer -> Integer -> Integer.

If we go back to trying to use fmap, we'll seemingly run into a bit of a problem. The type of fmap- specialized for Maybe and Integer- is (Integer -> a) -> Maybe Integer -> Maybe a. In other words, it takes a function that takes a single argument (an Integer) and returns a value of some type a, takes a second argument of a Maybe Integer, and gives back a value of type Maybe a. But our function- yearDiff- actually takes two arguments, not one. So fmap can't be used at all, right?

Not true. This is where one of Haskell's very powerful features comes into play. Any time we have a function of two arguments, we can also look at is as a function of one argument which returns a function. We can make this more clear with parentheses:

yearDiff :: Integer -> Integer -> Integer
yearDiff :: Integer -> (Integer -> Integer)

So how does that help us? We can look at the fmap function as:

fmap :: (Integer -> (Integer -> Integer)) -> Maybe Integer -> Maybe (Integer -> Integer)

Then when we apply fmap to yearDiff, we end up with:

fmap yearDiff :: Maybe Integer -> Maybe (Integer -> Integer)

That's pretty cool. We can apply this to our readMaybe futureYearString and end up with:

fmap yearDiff (readMaybe futureYearString) :: Maybe (Integer -> Integer)

That's certainly very interesting, but it doesn't help us. We need to somehow apply this value of type Maybe (Integer -> Integer) to our readMaybe birthYearString of type Maybe Integer. We can do this with do-notation:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = futureYear - birthYear

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge = do
            yearToAge <- fmap yearDiff (readMaybe futureYearString)
            birthYear <- readMaybe birthYearString
            return $ yearToAge birthYear
    displayAge maybeAge

We can even use fmap twice and avoid the second slurp:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = futureYear - birthYear

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge = do
            yearToAge <- fmap yearDiff (readMaybe futureYearString)
            fmap yearToAge (readMaybe birthYearString)
    displayAge maybeAge

But we don't have a way to apply our Maybe (Integer -> Integer) function to our Maybe Integer directly.

Applicative functors

And now we get to our final concept: applicative functors. The idea is simple: we want to be able to apply a function which is inside a functor to a value inside a functor. The magic operator for this is <*>. Let's see how it works in our example:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = futureYear - birthYear

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge = fmap yearDiff (readMaybe futureYearString)
            <*> readMaybe birthYearString
    displayAge maybeAge

In fact, the combination of fmap and <*> is so common that we have a special operator, <$>, which is a synonym for fmap. That means we can make our code just a little prettier:

let maybeAge = yearDiff <$> readMaybe futureYearString <*> readMaybe birthYearString

Notice the distinction between <$> and <*>. The former uses a function which is not wrapped in a functor, while the latter applies a function which is wrapped up.
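
One way to see the distinction is to write out the two operators' types, specialized to Maybe and to the Integer functions used here:

(<$>) :: (Integer -> (Integer -> Integer)) -> Maybe Integer -> Maybe (Integer -> Integer)
(<*>) :: Maybe (Integer -> Integer) -> Maybe Integer -> Maybe Integer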

So we don't need Monads?

So if we can do such great stuff with functors and applicative functors, why do we need monads at all? The terse answer is context sensitivity: with a monad, you can make decisions on which processing path to follow based on previous results. With applicative functors, you have to always apply the same functions.

Let's give a contrived example: if the future year is less than the birth year, we'll assume that the user just got confused and entered the values in reverse, so we'll automatically fix it by reversing the arguments to yearDiff. With do-notation and an if statement, it's easy:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = futureYear - birthYear

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge = do
            futureYear <- readMaybe futureYearString
            birthYear <- readMaybe birthYearString
            return $
                if futureYear < birthYear
                    then yearDiff birthYear futureYear
                    else yearDiff futureYear birthYear
    displayAge maybeAge

Exercises
  1. Implement fmap using <*> and return.

    #!/usr/bin/env stack
    -- stack --resolver lts-7.14 --install-ghc runghc
    import Control.Applicative ((<*>), Applicative)
    import Prelude (return, Monad)
    import qualified Prelude

    fmap :: (Applicative m, Monad m) => (a -> b) -> (m a -> m b)
    fmap ... ... = FIXME

    main =
        case fmap (Prelude.+ 1) (Prelude.Just 2) of
            Prelude.Just 3 -> Prelude.putStrLn "Good job!"
            _ -> Prelude.putStrLn "Try again"

    Solution:

    myFmap function wrappedValue = return function <*> wrappedValue

    main = print $ myFmap (+ 1) $ Just 5
  2. How is return implemented for the Maybe monad? Try replacing return with its implementation in the code above.

    #!/usr/bin/env stack
    -- stack --resolver lts-7.14 --install-ghc runghc
    returnMaybe = FIXME

    main
        | returnMaybe "Hello" == Just "Hello" = putStrLn "Correct!"
        | otherwise = putStrLn "Incorrect, please try again"

    Solution:

    return is simply the Just constructor. This gets defined as:

    instance Monad Maybe where
        return = Just
  3. yearDiff is really just subtraction. Try to replace the calls to yearDiff with explicit usage of the - operator.

    #!/usr/bin/env stack
    -- stack --resolver lts-7.14 --install-ghc runghc
    import Text.Read (readMaybe)

    displayAge maybeAge =
        case maybeAge of
            Nothing -> putStrLn "You provided invalid input"
            Just age -> putStrLn $ "In that year, you will be: " ++ show age

    main = do
        putStrLn "Please enter your birth year"
        birthYearString <- getLine
        putStrLn "Please enter some year in the future"
        futureYearString <- getLine
        let maybeAge = do
                futureYear <- readMaybe futureYearString
                birthYear <- readMaybe birthYearString
                return $
                    -- BEGIN CODE TO MODIFY
                    if futureYear < birthYear
                        then yearDiff birthYear futureYear
                        else yearDiff futureYear birthYear
                    -- END CODE TO MODIFY
        displayAge maybeAge

    Solution:

    if futureYear < birthYear
        then birthYear - futureYear
        else futureYear - birthYear
  4. It's possible to write an applicative functor version of the auto-reverse-arguments code by modifying the yearDiff function. Try to do so.

    #!/usr/bin/env stack
    -- stack --resolver lts-7.14 --install-ghc runghc
    import Text.Read (readMaybe)
    import Control.Applicative ((<$>), (<*>))

    displayAge maybeAge =
        case maybeAge of
            Nothing -> putStrLn "You provided invalid input"
            Just age -> putStrLn $ "In that year, you will be: " ++ show age

    yearDiff futureYear birthYear = FIXME

    main
        | yearDiff 5 6 == 1 = putStrLn "Correct!"
        | otherwise = putStrLn "Please try again"

    Solution:

    yearDiff futureYear birthYear
        | futureYear > birthYear = futureYear - birthYear
        | otherwise = birthYear - futureYear
  5. Now try to do it without modifying yearDiff directly, but by using a helper function which is applied to yearDiff.

    #!/usr/bin/env stack
    -- stack --resolver lts-7.14 --install-ghc runghc
    import Text.Read (readMaybe)
    import Control.Applicative ((<$>), (<*>))

    displayAge maybeAge =
        case maybeAge of
            Nothing -> putStrLn "You provided invalid input"
            Just age -> putStrLn $ "In that year, you will be: " ++ show age

    yearDiff futureYear birthYear = futureYear - birthYear

    yourHelperFunction f ...

    main
        | yourHelperFunction yearDiff 5 6 == 1 = putStrLn "Correct!"
        | otherwise = putStrLn "Please try again"

    Solution:

    yourHelperFunction f x y
        | x > y = f x y
        | otherwise = f y x
Categories: Offsite Blogs

Douglas M. Auclair (geophf): December 2016 1HaskellADay Problems and Solutions

Planet Haskell - Sun, 01/01/2017 - 9:44pm
Categories: Offsite Blogs

Edward Z. Yang: A tale of backwards compatibility in ASTs

Planet Haskell - Sat, 12/31/2016 - 10:35pm

Those that espouse the value of backwards compatibility often claim that backwards compatibility is simply a matter of never removing things. But anyone who has published APIs that involve data structures know that the story is not so simple. I'd like to describe my thought process on a recent BC problem I'm grappling with on the Cabal file format. As usual, I'm always interested in any insights and comments you might have.

The status quo. The build-depends field in a Cabal file is used to declare dependencies on other packages. The format is a comma-separated list of package name and version constraints, e.g., base >= 4.2 && < 4.3. Abstractly, we represent this as a list of Dependency:

data Dependency = Dependency PackageName VersionRange

The effect of an entry in build-depends is twofold: first, it specifies a version constraint which a dependency solver takes into account when picking a version of the package; second, it brings the modules of that package into scope, so that they can be used.

The extension. We added support for "internal libraries" in Cabal, which allow you to specify multiple libraries in a single package. For example, suppose you're writing a library, but there are some internal functions that you want to expose to your test suite but not the general public. You can place these functions in an internal library, which is depended upon by both the public library and the test suite, but not available to external packages.

For more motivation, see the original feature request, but for the purpose of this blog post, we're interested in the question of how to specify a dependency on one of these internal libraries.

Attempt #1: Keep the old syntax. My first idea for a new syntax for internal libraries was to keep the syntax of build-depends unchanged. To refer to an internal library named foo, you simply write build-depends: foo; an internal library shadows any external package with the same name.

Backwards compatible? Absolutely not. Remember that the original interpretation of entries in build-depends is of package names and version ranges. So any code that assumed there actually is an external package for each entry in build-depends would choke in an unexpected way when a dependency on an internal library was specified. This is exactly what happened with cabal-install's dependency solver, which needed to be updated to filter out dependencies that corresponded to internal libraries.

One might argue that it is acceptable for old code to break if the new feature is used. But there is a larger, philosophical objection to overloading package names in this way: don't call something a package name if it... isn't actually a package name!

Attempt #2: A new syntax. Motivated by this philosophical concern, as well as the problem that you couldn't simultaneously refer to an internal library named foo and an external package named foo, we introduce a new syntactic form: to refer to the internal library foo in the package pkg, we write build-depends: pkg:foo.

Since there's a new syntactic form, our internal AST also has to change to handle this new form. The obvious thing to do is introduce a new type of dependency:

data BuildDependency = BuildDependency PackageName (Maybe UnqualComponentName) VersionRange

and say that the contents of build-depends is a list of BuildDependency.

When it comes to changes to data representation, this is a "best-case scenario", because we can easily write a function BuildDependency -> Dependency. So supposing our data structure for describing library build information looked something like this:

data BuildInfo = BuildInfo {
        targetBuildDepends :: [Dependency],
        -- other fields
    }

We can preserve backwards compatibility by turning targetBuildDepends into a function that reads out the new, extend field, and converts it to the old form:

data BuildInfo = BuildInfo {
        targetBuildDepends2 :: [BuildDependency],
        -- other fields
    }

targetBuildDepends :: BuildInfo -> [Dependency]
targetBuildDepends = map buildDependencyToDependency . targetBuildDepends2

Critically, this takes advantage of the fact that record selectors in Haskell look like functions, so we can replace a selector with a function without affecting downstream code.

Unfortunately, this is not actually true. Haskell also supports record update, which lets a user overwrite a field as follows: bi { targetBuildDepends = new_deps }. If we look at Hackage, there are actually a dozen or so uses of targetBuildDepends in this way. So, if we want to uphold backwards-compatibility, we can't delete this field. And unfortunately, Haskell doesn't support overloading the meaning of record update (perhaps the lesson to be learned here is that you should never export record selectors: export some lenses instead).

It is possible that, in balance, breaking a dozen packages is a fair price to pay for a change like this. But let's suppose that we are dead-set on maintaining BC.

Attempt #3: Keep both fields. One simple way to keep the old code working is to just keep both fields:

data BuildInfo = BuildInfo {
        targetBuildDepends :: [Dependency],
        targetBuildDepends2 :: [BuildDependency],
        -- other fields
    }

We introduce a new invariant, which is that targetBuildDepends bi == map buildDependencyToDependency (targetBuildDepends2 bi). See the problem? Any legacy code which updates targetBuildDepends probably won't know to update targetBuildDepends2, breaking the invariant and probably resulting in some very confusing bugs. Ugh.

Attempt #4: Do some math. The problem with the representation above is that it is redundant, which meant that we had to add invariants to "reduce" the space of acceptable values under the type. Generally, we like types which are "tight", so that, as Yaron Minsky puts it, we "make illegal states unrepresentable."

To think a little more carefully about the problem, let's cast it into a mathematical form. We have an Old type (isomorphic to [(PN, VR)]) and a New type (isomorphic to [(PN, Maybe CN, VR)]). Old is a subspace of New, so we have a well-known injection inj :: Old -> New.

When a user updates targetBuildDepends, they apply a function f :: Old -> Old. In making our systems backwards compatible, we implicitly define a new function g :: New -> New, which is an extension of f (i.e., inj . f == g . inj): this function tells us what the semantics of a legacy update in the new system is. Once we have this function, we then seek a decomposition of New into (Old, T), such that applying f to the first component of (Old, T) gives you a new value which is equivalent to the result of having applied g to New.

Because in Haskell, f is an opaque function, we can't actually implement many "common-sense" extensions. For example, we might want it to be the case that if f updates all occurrences of parsec with parsec-new, the corresponding g does the same update. But there is no way to distinguish between an f that updates, and an f that deletes the dependency on parsec, and then adds a new dependency on parsec-new. (In the bidirectional programming world, this is the distinction between state-based and operation-based approaches.)

We really only can do something reasonable if f only ever adds dependencies; in this case, we might write something like this:

data BuildInfo = BuildInfo {
        targetBuildDepends :: [Dependency],
        targetSubLibDepends :: [(PackageName, UnqualComponentName)],
        targetExcludeLibDepends :: [PackageName],
        -- other fields
    }

The conversion from this to BuildDependency goes something like:

  1. For each Dependency pn vr in targetBuildDepends, if the package name is not mentioned in targetExcludeLibDepends, we have BuildDependency pn Nothing vr.
  2. For each (pn, cn) in targetSubLibDepends where there is a Dependency pn vr (the package names are matching), we have BuildDependency pn (Just cn) vr.
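
Putting those two rules into code, a minimal sketch might look like this (the function name is hypothetical, and duplicates and efficiency are ignored):

toBuildDependencies :: BuildInfo -> [BuildDependency]
toBuildDependencies bi =
    -- Rule 1: ordinary dependencies, minus the excluded package names
    [ BuildDependency pn Nothing vr
    | Dependency pn vr <- targetBuildDepends bi
    , pn `notElem` targetExcludeLibDepends bi
    ] ++
    -- Rule 2: sub-library dependencies, reusing the version range from the
    -- matching targetBuildDepends entry
    [ BuildDependency pn (Just cn) vr
    | (pn, cn) <- targetSubLibDepends bi
    , Dependency pn' vr <- targetBuildDepends bi
    , pn == pn'
    ]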

Stepping back for a moment, is this really the code we want to write? If the modification is not monotonic, we'll get into trouble; if someone reads out targetBuildDepends and then writes it into a fresh BuildInfo, we'll get into trouble. Is it really reasonable to go to these lengths to achieve such a small, error-prone slice of backwards compatibility?

Conclusions. I'm still not exactly sure what approach I'm going to take to handle this particular extension, but there seem to be a few lessons:

  1. Records are bad for backwards compatibility, because there is no way to overload a record update with a custom new update. Lenses for updates would be better.
  2. Record update is bad for backwards compatibility, because it puts us into the realm of bidirectional programming, requiring us to reflect updates from the old world into the new world. If our records are read-only, life is much easier. On the other hand, if someone ever designs a programming language that is explicitly thinking about backwards compatibility, bidirectional programming better be in your toolbox.
  3. Backwards compatibility may be worse in the cure. Would you rather your software break at compile time because, yes, you really do have to think about this new case, or would you rather everything keep compiling, but break in subtle ways if the new functionality is ever used?

What's your take? I won't claim to be a expert on questions of backwards compatibility, and would love to see you weigh in, whether it is about which approach I should take, or general thoughts about the interaction of programming languages with backwards compatibility.

Categories: Offsite Blogs

Sandy Maguire: Comonadic Collision Resolution

Planet Haskell - Sat, 12/31/2016 - 6:00pm

January 1, 2017 · haskell, comonads, physics, game

I have a sinful, guilty pleasure – I like a sports video-game: NBA Jam Tournament Edition. Regrettably, I don’t own a copy, and all of my attempts to acquire one have ended in remarkable misfortune.

Obviously my only recourse was to make a tribute game that I could play to my heart’s desire.

And so that’s what I set out to do, back in 2013. My jam-loving then-coworker and I drafted up the barest constituent of a design document, and with little more thought about the whole thing we dove right in.

We “decided” on Python as our language of choice, and off we went. There was no game engine, so we rolled everything by hand: drawing, collision, you name it. We got a little demo together, and while it was definitely basketball-like, it certainly wasn’t a game. Eventually my partner lost interest, and the code sits mostly forgotten in the back recesses of my Github repositories.

I say mostly forgotten because over the last three years, I’ve occasionally spent a sleepless night here or there working on it, slowly but surely turning midnight fuel into reality.

Three years is a long time to spend on a toy project, and it’s an even longer amount of time for a junior engineer’s sensibilities to stay constant. As I learned more and more computer science tools, I found myself waging a constant battle against Python. The details aren’t important, but it was consistently a headache in order to get the language to allow me to express the things I wanted to. It got to the point where I stopped work entirely on the project due to it no longer being fun.

But this basketball video-game of mine was too important to fail, and so I came up with a solution.

If you’re reading this blog, you probably already know what the solution to my problem was – I decided to port the game over to Haskell. Remarkable progress was made: within a few days I had the vast majority of it ported. At first my process looked a lot like this:

  1. Read a few lines of Python.
  2. Try to understand what they were doing.
  3. Copy them line-by-line into Haskell syntax.

and this worked well enough. If there were obvious improvements that could be made, I would do them, but for the most part, it was blind and mechanical. At time of writing I have a bunch of magical constants in my codebase that I dare not change.

However, when I got to the collision resolution code, I couldn’t in good conscience port the code. It was egregious, and would have been an abomination upon all that is good and holy if that imperative mess made it into my glorious new Haskell project.

The old algorithm was like so:

  1. Attempt to move the capsule1 to the desired location.
  2. If it doesn’t intersect with any other capsules,
Categories: Offsite Blogs

FP Complete: Software project maintenance is where Haskell shines

Planet Haskell - Fri, 12/30/2016 - 2:00pm
We Spend Most of Our Time on Maintenance

Look at the budget spent on your software projects. Most of it goes towards maintenance. The Mythical Man-Month by Fred Brooks states that over 90% of the costs of a typical system arise in the maintenance phase, and that any successful piece of software will inevitably be maintained, and Facts and Fallacies of Software Engineering by Robert L. Glass reports that maintenance typically consumes 40% to 80% (averaging 60%) of software costs.

From our own experience and the literature, we can conclude that maintenance is perhaps the most important part of developing software. In this article we'll explore why Haskell shines in maintenance.

The Five Bases of Maintenance

Based on the article Software Maintenance by Chris Newton, I'm going to write about five bases for doing software maintenance:

  • Readability: The source code is comprehensible, and describes the domain well to the reader.
  • Testability: The code is friendly to being tested, via unit tests, integration tests, property tests, code review, static analysis, etc.
  • Preservation of knowledge: Teams working on the software retain the knowledge of the design and functioning of the system over time.
  • Modifiability: The ease with which we can fix, update, refactor, adapt, and generally mechanically change.
  • Correctness: The software is constructed in a self-consistent way, by using means of combination that rule out erroneous cases that maintainers shouldn't have to deal with.

We'll see below what Haskell brings to the table for each of these bases.

Readability

The source code is comprehensible, and describes the domain well to the reader.

Reduce state: Developers must hold a "state of the world" in their head when understanding imperative and object-oriented source code. In Haskell, which is a pure functional language, developers only have to look at the inputs to a function, making it far easier to consider a portion of code and to approach working on it.

Narrowing the problem space: A rich type system like Haskell's guides less-experienced developers, or newcomers to the project, to the right places. Because the domain can be modeled in types, which formally narrow down the problem. Developers can literally define problems away, turning their attention to the real problems of your business's domain.

Coupling where it counts: Haskell's type system supports modeling cases of a problem, coupling the case (such as: logged in/logged out) with the values associated with that state (such as: user session id/no session id). Developers can work with fewer variables to hold in their head, instead concentrating on your business logic.
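
As a small, hypothetical illustration of that coupling (the type and constructor names here are invented):

data Session
    = LoggedOut               -- no session id exists in this state
    | LoggedIn SessionId      -- a session id exists exactly when logged in

newtype SessionId = SessionId String

A function receiving a Session can only obtain the SessionId by matching on LoggedIn, so an impossible state like "logged out, but with a session id" cannot even be written down.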

Encapsulation: Like in object oriented languages (Java, C++, Ruby, Python), encapsulation in Haskell allows developers to hide irrelevant details when exposing the interfaces between modules, leaving other developers fewer details to worry about.

Testability

The code is friendly to being tested, via unit tests, integration tests, property tests, code review, static analysis, etc.

Explicit inputs: Haskell programs are the easiest to write tests for, because they are composed of pure functions, which either require no conditions under which your developers should run them, or the conditions are explicitly defined inputs to the function.

Mock the world: With excellent support for embedded domain-specific languages (DSLs), Haskell empowers developers to write programs in an imperative fashion which can then be interpreted as a real world program (interacting with file I/O, using time, etc.) or as a mock program which does nothing to the real world but compute a result. This is valuable for testing the business logic of the software without having to setup a whole real environment just to do so.
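
Here is a minimal sketch of that pattern, assuming the mtl and containers packages are available; the class and function names are invented for illustration:

{-# LANGUAGE FlexibleInstances #-}
import qualified Data.Map as M
import Control.Monad.State (State, evalState, gets, modify)

-- The business logic only ever sees this small interface.
class Monad m => MonadStore m where
    readKey  :: String -> m (Maybe String)
    writeKey :: String -> String -> m ()

copyKey :: MonadStore m => String -> String -> m Bool
copyKey from to = do
    mv <- readKey from
    case mv of
        Nothing -> return False
        Just v  -> writeKey to v >> return True

-- A pure, in-memory interpreter used by tests: no real I/O happens at all.
instance MonadStore (State (M.Map String String)) where
    readKey k    = gets (M.lookup k)
    writeKey k v = modify (M.insert k v)

main :: IO ()
main = print (evalState (copyKey "a" "b") (M.fromList [("a", "42")]))

A production interpreter would give the same class an instance over IO (or a ReaderT-over-IO stack) that talks to the real file system or database, leaving the business logic untouched.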

Automatically test properties: Haskell's unique type system supports trivially generating thousands of valid inputs to a function, in order to test that every output of the function is correct. Anything from parsers, financial calculations, state machine transformations, etc. can be generated and tested for.
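
For instance, a property test in the QuickCheck style (the property itself is a stand-in; real projects test their own invariants):

import Test.QuickCheck

-- Reversing a list twice gives back the original list.
prop_reverseTwice :: [Int] -> Bool
prop_reverseTwice xs = reverse (reverse xs) == xs

main :: IO ()
main = quickCheck prop_reverseTwice  -- checks the property on 100 random lists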

Static analysis: It may go without saying, but Haskell's static type system brings substantial potential for eliminating whole classes of bugs, and maintaining invariants while changing software, as a continuous feedback to the developer. A level-up from Java or C++ or C#, Haskell's purity and rich type system is able to check a far greater region of source code and to greater precision.

Taking testing seriously: Haskell has a large number of testing libraries which range from standard unit testing (like JUnit or RSpec), web framework-based testing, property-based testing (like QuickCheck) and other randomly generated testing, testing documentation, concurrency testing, and mock testing.

Preservation of knowledge

Teams working on the software retain the knowledge of the design and functioning of the system over time.

Model the domain precisely: Because Haskell's rich type system lets your developers model the domain precisely and in a complete way, it's easier for the same developers to return months or a year from now, or new developers to arrive, and gain a good grasp of what's happening in the system.

Modifiability

The ease with which we can fix, update, refactor, adapt, and generally mechanically change.

Automatic memory management: Haskell is high-level with automatically managed memory, like Python or Ruby, and does not suffer from memory corruption issues or leaks, like C or C++, which can arise from developers making changes to your system and mistakenly mismanaging memory manually.

Automate completeness: As mentioned in the readability section, Haskell allows developers to define data types as a set of cases that model the business domain logic. From simple things like results (success/fail/other), to finite state machines, etc. Along with this comes the ability for the compiler to statically determine and tell your developers when a case is missing, which they need go to and correct. This is extraordinarily useful when changing and extending a system.
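
For example, with warnings enabled the compiler flags every function a new constructor invalidates (the types here are invented for illustration):

{-# OPTIONS_GHC -Wall #-}

data OrderStatus = Pending | Shipped | Cancelled

describe :: OrderStatus -> String
describe Pending   = "waiting for payment"
describe Shipped   = "on its way"
describe Cancelled = "cancelled"
-- Adding a new constructor (say, Refunded) later makes GHC report an
-- incomplete-pattern warning here, pointing maintainers at the exact spot.

main :: IO ()
main = putStrLn (describe Shipped)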

Break up the problem: Haskell's pure functions only depend on their parameters, and so any expression can be easily factored out into separate functions. Breaking a problem down into smaller problems helps maintainers deal with smaller problems, taking fewer things into account.

Encapsulate: As encapsulation allows developers to hide irrelevant details when exposing the interfaces between Haskell modules, this allows developers to change the underlying implementation of modules without consumers of that module having to be changed.

Decouple orthogonal concepts: In Haskell, unlike in popular object oriented languages like Java or C++, data and behavior are not coupled together: a photograph is a photograph, and a printer knows how to print it, it's not that a photograph contains printing inside it. The data is the photograph, and the behavior is printing a photograph. In Haskell, these two are decoupled, allowing developers to simply define the data that counts, and freely add more behaviors later, without getting lost in object hierarchies and inheritance issues.
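
A tiny sketch of the photograph/printer example from this paragraph (all names invented):

-- The data is just data...
newtype Photo = Photo FilePath

-- ...and printing is a separate behaviour, added without touching Photo.
class Printable a where
    printDoc :: a -> IO ()

instance Printable Photo where
    printDoc (Photo path) = putStrLn ("sending " ++ path ++ " to the printer")

main :: IO ()
main = printDoc (Photo "holiday.png")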

Correctness

The software is constructed in a self-consistent way, by using means of combination that rule out erroneous cases that maintainers shouldn't have to deal with.

Correct combination: In Python, a whole new version of the language, Python 3, had to be implemented to properly handle Unicode text in a backwards-incompatible way. This broke lots of existing Python code and many large projects have still not upgraded. In Haskell, text and binary data are unmixable data types. They cannot be mistakenly combined, as in Python and many other languages. This throws a whole class of encoding issues out of the window, which is less for your developers to worry about.

No implicit null: In The Billion Dollar Mistake Tony Hoare apologizes for the "null" value, present in almost all popular programming languages. Haskell does not have a null value. It explicitly models nullability with a data type. Given the countless bugs caused by null, and maintenance burden due to tracking down or introducing such bugs, Haskell's contribution by removing it is substantial. Languages that include an implicit null value are: Java, C, C++, C#, Python, Ruby, JavaScript, Lisp, Clojure, etc.
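
For example, a map lookup returns a Maybe, and the caller must say what happens when the key is absent:

import qualified Data.Map as M

userEmail :: M.Map String String -> String -> String
userEmail emails name =
    case M.lookup name emails of       -- M.lookup returns Maybe String, never null
        Nothing    -> "<no email on file>"
        Just email -> email

main :: IO ()
main = do
    let emails = M.fromList [("alice", "alice@example.com")]
    putStrLn (userEmail emails "alice")
    putStrLn (userEmail emails "bob")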

Avoid multiple writers: In concurrent code, developers have to be very careful when more than one thread changes the same data. Imperative languages tend to allow any thread to change anything, so it's frighteningly easy to make mistakes. In Haskell, data structures are immutable, and a mutable "box" has to be created to share data between threads, ruling out a plethora of potential bugs.
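
A small sketch using the stm and async packages, where the only thing threads can contend over is the one explicitly shared box:

import Control.Concurrent.Async (mapConcurrently_)
import Control.Concurrent.STM

main :: IO ()
main = do
    counter <- newTVarIO (0 :: Int)      -- the single, explicitly shared mutable box
    mapConcurrently_
        (\_ -> atomically (modifyTVar' counter (+ 1)))
        [1 .. 1000 :: Int]
    readTVarIO counter >>= print         -- always 1000; increments cannot be lost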

Summary

Maintenance is our biggest activity when developing successful software. There are five bases that really make maintenance work better, and this is where Haskell really shines:

  • Readability: Haskell's purity and type system lend themselves perfectly to comprehensible code.
  • Testability: Haskell code is inherently more testable, due to being pure, safely statically typed, and coming with a variety of testing packages.
  • Preservation of knowledge: A rich type system like Haskell's can model the domain so well that developers have to remember less, and educate each other less, saving time.
  • Modifiability: Haskell's strong types, completeness analysis and purity assure that when you break something, you know it sooner.
  • Correctness: Developers can work within a consistent model of your domain, removing whole classes of irrelevant problems. Concurrent code is easier to maintain too.

All in all, Haskell really shines in maintenance, and, while it has other novel features, it's really for this reason that developers and companies are increasingly switching to it.

You can learn more about using Haskell as a business at FP Complete's home page, in particular the Consulting page, or go and contact us straight away and we'll be in touch.

Categories: Offsite Blogs

Roman Cheplyaka: optparse-applicative quick start

Planet Haskell - Fri, 12/30/2016 - 2:00pm

When I need to write a command-line program in Haskell, I invariably pick Paolo Capriotti’s optparse-applicative library.

Unfortunately, the minimal working example is complicated enough that I cannot reproduce it from memory, and the example in the README is very different from the style I prefer.

So I decided to put up a template here for a program using optparse-applicative. I am going to copy it into all of my future projects, and you are welcome to do so, too.

import Options.Applicative
import Control.Monad (join)

main :: IO ()
main = join . execParser $
  info (helper <*> parser)
  (  fullDesc
  <> header "General program title/description"
  <> progDesc "What does this thing do?"
  )
  where
    parser :: Parser (IO ())
    parser =
      work
        <$> strOption
            (  long "string_param"
            <> short 's'
            <> metavar "STRING"
            <> help "string parameter"
            )
        <*> option auto
            (  long "number_param"
            <> short 'n'
            <> metavar "NUMBER"
            <> help "number parameter"
            <> value 1
            <> showDefault
            )

    work :: String -> Int -> IO ()
    work _ _ = return ()
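
With this template in place, the generated executable accepts --string_param/-s and --number_param/-n (the latter defaulting to 1 via value and showDefault), and because helper is wired in, --help prints usage text assembled from the header, progDesc, metavar and help annotations. Replacing the body of work with real logic is all that remains.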
Categories: Offsite Blogs

Edward Z. Yang: Backpack and the PVP

Planet Haskell - Fri, 12/30/2016 - 12:32am

In the PVP, you increment the minor version number if you add functions to a module, and the major version number if you remove function to a module. Intuitively, this is because adding functions is a backwards compatible change, while removing functions is a breaking change; to put it more formally, if the new interface is a subtype of the older interface, then only a minor version number bump is necessary.

Backpack adds a new complication to the mix: signatures. What should the PVP policy for adding/removing functions from signatures should be? If we interpret a package with required signatures as a function, theory tells us the answer: signatures are contravariant, so adding required functions is breaking (bump the major version), whereas it is removing required functions that is backwards-compatible (bump the minor version).

However, that's not the end of the story. Signatures can be reused, in the sense that a package can define a signature, and then another package reuse that signature:

unit sigs where
    signature A where
        x :: Bool
unit p where
    dependency sigs[A=<A>]
    module B where
        import A
        z = x

In the example above, we've placed a signature in the sigs unit, which p uses by declaring a dependency on sigs. B has access to all the declarations defined by the A in sigs.

But there is something very odd here: if sigs were to ever remove its declaration for x, p would break (x would no longer be in scope). In this case, the PVP rule from above is incorrect: p must always declare an exact version bound on sigs, as any addition or deletion would be a breaking change.

So we are in this odd situation:

  1. If we include a dependency with a signature, and we never use any of the declarations from that signature, we can specify a loose version bound on the dependency, allowing for it to remove declarations from the signature (making the signature easier to fulfill).
  2. However, if we ever import the signature and use anything from it, we must specify an exact bound, since removals are now breaking changes.

I don't think end users of Backpack should be expected to get this right on their own, so GHC (in this proposed patchset) tries to help users out by attaching warnings like this to declarations that come solely from packages that may have been specified with loose bounds:

foo.bkp:9:11: warning: [-Wdeprecations]
    In the use of ‘x’ (imported from A):
    "Inherited requirements from non-signature libraries (libraries with modules)
    should not be used, as this mode of use is not compatible with PVP-style
    version bounds.  Instead, copy the declaration to the local hsig file or
    move the signature to a library of its own and add that library as a
    dependency."

Of course, GHC knows nothing about bounds, so the heuristic we use is that a package is a signature package with exact bounds if it does not expose any modules. A package like this is only ever useful by importing its signatures, so we never warn about this case. We conservatively assume that packages that do expose modules might be subject to PVP-style bounds, so we warn in that case, e.g., as in:

unit q where
    signature A where
        x :: Bool
    module M where -- Module!
unit p where
    dependency q[A=<A>]
    module B where
        import A
        z = x

As the warning suggests, this error can be fixed by explicitly specifying x :: Bool inside p, so that, even if q removes its requirement, no code will break:

unit q where
    signature A where
        x :: Bool
    module M where -- Module!
unit p where
    dependency q[A=<A>]
    signature A where
        x :: Bool
    module B where
        import A
        z = x

Or by putting the signature in a new library of its own (as was the case in the original example.)

This solution isn't perfect, as there are still ways you can end up depending on inherited signatures in PVP-incompatible ways. The most obvious is with regards to types. In the code below, we rely on the fact that the signature from q forces T to be type equal to Bool:

unit q where
    signature A where
        type T = Bool
        x :: T
    module Q where
unit p where
    dependency q[A=<A>]
    signature A where
        data T
        x :: T
    module P where
        import A
        y = x :: Bool

In principle, it should be permissible for q to relax its requirement on T, allowing it to be implemented as anything (and not just a synonym of Bool), but that change will break the usage of x in P. Unfortunately, there isn't any easy way to warn in this case.

A perhaps more principled approach would be to ban use of signature imports that come from non-signature packages. However, in my opinion, this complicates the Backpack model for not a very good reason (after all, some day we'll augment version numbers with signatures and it will be glorious, right?)

To summarize. If you want to reuse signatures from signature package, specify an exact version bound on that package. If you use a component that is parametrized over signatures, do not import and use declarations from those signatures; GHC will warn you if you do so.

Categories: Offsite Blogs

Ivan Lazar Miljenovic: Transmogrify your data!

Planet Haskell - Thu, 12/29/2016 - 4:24am

Have you ever wanted to do something like this?

λ> cons 'a' (1::Int, 2::Word, 3::Double) :: (Char, Int, Word, Double)
('a',1,2,3.0)

Or how about this?

λ> unsnoc ('a',1::Int,2::Word,3.0::Double) :: ((Char, Int, Word), Double)
(('a',1,2),3.0)

Let me try to completely confuse you (and potentially give a hint as to what I’m doing):

λ> transmogrify ('H', 'a', 's', 'k', 'e', 'l', 'l') :: ((Char, Char), Char, (Char, Char, (Char, Char)))
(('H','a'),'s',('k','e',('l','l')))

One more hint:

λ> data Foo = Bar Char Char Char deriving (Show, Generic)
λ> transmogrify ('a', 'b', 'c') :: Foo
Bar 'a' 'b' 'c'

TRANSMOGRIFICATION!!!

What do you mean by that?

I’ve suddenly become really interested in GHC Generics, and it occurred to me the other day that – since it basically decomposes more interesting types to products and sums with lots of associated metadata – that it should be possible to get two different types that are fundamentally the same shape but lots of different pesky metadata.

Turns out, it is possible. I’ve got a prototype of a little library that implements this on GitHub, and that’s what I used for those examples above.

How it works

Basically, all metadata (constructor names, record aliases, strictness annotations, etc.) is stripped out. This is done recursively throughout the entire type, stopping at fundamental types like Int and Char. To cap it all off, products and converted from a tree-like implementation into an explicit list (this is even done recursively for any products contained within products, like the nested tuples above).
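
To see the raw material being manipulated, one can ask GHC.Generics directly for the representations; this snippet merely inspects them and is not part of the library itself:

{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics

data Foo = Bar Char Char Char
    deriving (Show, Generic)

main :: IO ()
main = do
    -- Both representations bottom out in three Char leaves (K1) joined by
    -- products (:*:); they differ only in the M1 metadata wrappers carrying
    -- constructor and field information.
    print (from ('a', 'b', 'c') :: Rep (Char, Char, Char) ())
    print (from (Bar 'a' 'b' 'c') :: Rep Foo ())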

When will this be on Hackage?

I doubt it will be.

The approach is a bit hacky, with various type-classes required including type aliases, etc. That’s not too bad, but there is pretty much no type safety or inference available (hence all the explicit annotations above).

The performance is also not great: it’s fundamentally O(n), and there’s no way to really fix this (at least that I can see).

There are also currently two limitations with the implementation:

  1. No handling of sum-types. This could be remedied by basically copying and modifying the existing handling of product types.
  2. An explicit list of types is needed to be able to stop type recursion; this is currently limited to numeric types and Char.

This second limitation is the biggest fundamental problem with how to get this to a production-ready library. Ideally you could specify “this type should not be examined”. Even better: if a component type doesn’t have a Generic instance then don’t bother trying to split it apart.

So, now what?

Well, the code is there. If there’s enough interest I might try and clean it up and put it on Hackage regardless.

But if you think this will somehow solve all your problems, then maybe you should re-think what you’re doing

Categories: Offsite Blogs

Functional Jobs: Senior Backend Engineer at Euclid Analytics (Full-time)

Planet Haskell - Tue, 12/27/2016 - 5:23pm

We are looking to add a senior individual contributor to the backend engineering team! Our team is responsible for creating and maintaining the infrastructure that powers the Euclid Analytics Engine. We leverage a forward thinking and progressive stack built in Scala and Python, with an infrastructure that uses Mesos, Spark and Kafka. As a senior engineer you will build out our next generation ETL pipeline. You will need to use and build tools to interact with our massive data set in as close to real time as possible. If you have previous experience with functional programming and distributed data processing tools such as Spark and Hadoop, then you would make a great fit for this role!

Responsibilities:

  • Partnering with the data science team to architect and build Euclid’s big data pipeline
  • Building tools and services to maintain a robust, scalable data service layer
  • Leverage technologies such as Spark and Kafka to grow our predictive analytics and machine learning capabilities in real time
  • Finding innovative solutions to performance issues and bottlenecks
  • Help build and scale our internal and external Python APIs

Requirements:

  • At least 3 years industry experience in a full time role utilizing Scala or other modern functional programming languages (Haskell, Clojure, Lisp, etc.)
  • Database management experience (MySQL, Redis, Cassandra, Redshift, MemSQL)
  • Experience with big data infrastructure including Spark, Mesos, Scalding and Hadoop
  • Excited about data flow and orchestration with tools like Kafka and Spark Streaming
  • Have experience building production deployments using Amazon Web Services or Heroku’s Cloud Application Platform
  • B.S. or equivalent in Computer Science or another technical field

Get information on how to apply for this position.

Categories: Offsite Blogs

mightybyte: On Haskell Documentation

Planet Haskell - Mon, 12/26/2016 - 10:02am
The following started out as a response to a Hacker News comment, but got long enough to merit a standalone blog post.

I think the root of the Haskell documentation debate lies in a pretty fundamental difference in how you go about finding, reading, and understanding documentation in Haskell compared to mainstream languages.  Just last week I ran into a situation that really highlighted this difference.

I was working on creating a Haskell wrapper around the ACE editor.  I initially wrote the wrapper some time ago and got it integrated into a small app.  Last week I needed ACE integration in another app I'm working on and came back to the code.  But I ran into a problem...ACE automatically makes AJAX requests for JS files needed for pluggable syntax highlighters and themes.  But it was making the AJAX requests in the wrong place and I needed to tell it to request them from somewhere else.  Depending on how interested you are in this, you might try looking through the ACE documentation on your own before reading on to see if you can find the answer to this problem.

When you go to the ACE home page, the most obvious place to start seems to be the embedding guide.  This kind of guide seems to be what people are talking about when they complain about Haskell's documentation.  But this guide gave me no clues as to how to solve my problem.  The embedding guide then refers you to the how-to guide.  That documentation didn't help me either.  The next place I go is the API reference.  I'm no stranger to API references.  This is exactly what I'm used to from Haskell!  I look at the docs for the top-level Ace module.  There are only three functions here.  None of them what I want.  They do have some type signatures that seem to help a little, but it doesn't tell me the type of the `edit` function, which is the one that seems most likely to be what I want.  At this point I'm dying for a hyperlink to the actual code, but there are none to be found.  To make a long story short, the thing I want is nowhere to be found in the API reference either.

I only solved the problem when a co-worker who has done a lot of JS work found the answer buried in a closed GitHub issue.  There's even a comment on that issue by someone saying he had been looking for it "for days".  The solution was to call ace.config.set('basePath', myPath);.  This illustrates the biggest problem with tutorial/how-to documentation: they're always incomplete.  There will always be use cases that the tutorials didn't think of.  They also take effort to maintain, and can easily get out of sync over time.

I found this whole experience with ACE documentation very frustrating, and came away feeling that I vastly prefer Haskell documentation.  With Haskell, API docs literally give you all the information needed to solve any problem that the API can solve (with very few exceptions).  If the ACE editor was written natively in Haskell, the basePath solution would be pretty much guaranteed to show up somewhere in the API documentation.  In my ACE wrapper you can find it here.

Now I need to be very clear that I am not saying the types are enough and saying the state of Haskell documentation is good enough.  There are definitely plenty of situations where it is not at all obvious how to wrangle the types to accomplish what you want to accomplish.  Haskell definitely needs to improve its documentation.  But this takes time and effort.  The Haskell community is growing but still relatively small, and resources are limited.  Haskell programmers should keep in mind that newcomers will probably be more used to tutorials and how-tos.  And I think newcomers should keep in mind that API docs in Haskell tell you a lot more than in other languages, and be willing to put some effort into learning how to use these resources effectively.
Categories: Offsite Blogs

Ketil Malde: The cost of fixing climate change

Planet Haskell - Sat, 12/24/2016 - 5:00am

We need to find alternative sources of energy to replace our dependency on fossil fuels. There is a lot of talk about solar: about how economically profitable it has become, how all this free renewable energy is waiting to be harvested, and how the prices keep plummeting.

Being, I hope, of a somewhat economic mindset, I don't believe in the existence of hundred-dollar bills lying on the pavement. The large solar plants I can find numbers for (e.g., Topaz) seem to cost about USD 2.4 billion to deliver slightly more than 1 TWh/year. Solar panels keep dropping in cost, but obviously a construction site many square miles in size is going to be expensive, no matter the cost of materials.

Olkiluoto nuclear power station (photo from Wikimedia by Hannu Huovila)

Another issue with solar: the recent price drops seem to be driven as much by moving production to the Far East as by anything else. This means leveraging cheaper labor, but perhaps even more so, leveraging cheap, subsidized power, mostly from coal. I suspect the solar industry currently consumes more power than it produces, meaning that for now it actually accelerates climate change. (This will change when growth rates drop, but with only 0.2% of our global energy coming from solar, that could be some years off.)

And large installations in the Californian desert are one thing, but up north where I live? There aren't many real installations, and thus not much in the way of numbers. There is a Swedish installation in Västerås claiming it will be able to produce power at about the same cost as Topaz, which would indeed be remarkable. Neuhardenberg in Germany cost €300 million (about 15% of Topaz), but produces only 20 GWh per year (2%), which again seems remarkably low. I find it hard to trust these numbers. Another comparison: the UK invests around US$5 billion per year in photovoltaics, and its generated electricity seems to rise by 2-3 TWh per year. That points to roughly $2 of investment per kWh of annual generation, a Topaz-level return, which is actually rather impressive for a rainy country far to the north.
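Since the same yardstick - capital cost per kWh of annual generation - keeps coming up, here is a minimal Haskell sketch of it (this being Planet Haskell), using only the figures quoted above. The numbers are rough, and currencies are mixed the same way they are in the text.

    -- Capital cost per kWh of annual generation, the rough yardstick used above.
    -- Figures are the ones quoted in the text; currencies are mixed as in the text.
    costPerAnnualKWh :: Double -> Double -> Double
    costPerAnnualKWh investment annualTWh = investment / (annualTWh * 1e9)  -- 1 TWh = 1e9 kWh

    topaz, neuhardenberg, ukSolar :: Double
    topaz         = costPerAnnualKWh 2.4e9 1.1    -- USD: ~2.2 per kWh/year
    neuhardenberg = costPerAnnualKWh 3.0e8 0.02   -- EUR: ~15 per kWh/year
    ukSolar       = costPerAnnualKWh 5.0e9 2.5    -- USD: ~2.0 per kWh/year

    main :: IO ()
    main = mapM_ print [("Topaz", topaz), ("Neuhardenberg", neuhardenberg), ("UK solar", ukSolar)]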

All of this ignores the necessary infrastructure; we need a way to store energy from sunny days, and in the north, from the warm summer to the freezing winter, when energy use for heating quadruples my electric bill. Yes, we can pump water uphill for hydroelectric backup, but this has a construction cost, an efficiency loss in the pumps and turbines, and, I think, an opportunity cost: if you have the opportunity to build a hydroelectric plant, you might consider just doing that, generating clean power without an expensive solar installation. The alternative backup for those who aren't blessed with rainy, tall mountains seems to be natural gas, which of course is a fossil fuel, and the quick single-cycle plants used for backup are considerably less efficient.

There was an interesting, but perhaps overly pessimistic, paper by Ferroni and Hopkirk calculating that, taking everything into account, solar panels at latitudes like northern Europe's would never return the energy invested in them. Others disagree, and in the end it is mostly a question of what to include in "everything".

If you look at many total-cost-of-power estimates, nuclear is often comparable to solar. I find this difficult to swallow. Again looking at actual construction costs, Olkiluoto may end up costing as much as €8 billion. But the contract price was €3 billion, and it is unclear to me whether it was the contractor who messed up the estimates or the later construction process - so it is hard to say what it would cost to build the next one. In any case, the reactor will produce 15 TWh/year, so you get maybe thirteen Topazes' worth of energy at two to four times the investment.
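As a sanity check on that comparison, a small sketch along the same lines (same caveats: figures as quoted above, EUR/USD conversion ignored):

    -- Olkiluoto vs. Topaz, using only the figures quoted above.
    topazOutput, olkiluotoOutput :: Double   -- TWh per year
    topazOutput     = 1.1
    olkiluotoOutput = 15

    -- ~13.6: Olkiluoto delivers "maybe thirteen Topazes" of annual energy.
    topazEquivalents :: Double
    topazEquivalents = olkiluotoOutput / topazOutput

    -- Capital cost (in billions) per TWh of annual output, currency conversion ignored.
    costPerAnnualTWh :: Double -> Double -> Double
    costPerAnnualTWh investment annualTWh = investment / annualTWh

    topazRate, olkiluotoLow, olkiluotoHigh :: Double
    topazRate     = costPerAnnualTWh 2.4 topazOutput        -- ~2.2  (USD bn per TWh/year)
    olkiluotoLow  = costPerAnnualTWh 3.0 olkiluotoOutput    -- ~0.20 (EUR bn per TWh/year)
    olkiluotoHigh = costPerAnnualTWh 8.0 olkiluotoOutput    -- ~0.53 (EUR bn per TWh/year)

Per unit of annual output, even the €8 billion figure puts Olkiluoto well below Topaz, which is the point made a couple of paragraphs further down.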

I think solar and nuclear make for a good comparison: both technologies are extremely expensive to build but quite cheap to run, and for nuclear, much of the variation in total-cost estimates comes down to discount rates and decommissioning costs. Financing costs should be the same for solar, and although nobody talks about decommissioning it, Topaz is 25 km² of solar panels that must eventually be disposed of; that's not going to be free either. We can also consider that reactors can run for maybe fifty to seventy years, while solar panels are expected to last 25 to 30.

In the end, both solar and nuclear can supply energy, but nuclear appears to be a lot cheaper.

* * *

If you look at what is actually happening in the world, the answer is: not much. Sure, we're building a couple of solar power plants, a few reactors, a handful of windmills. Is it simply too expensive to replace fossil fuels?

I did some calculations. Using Olkiluoto as a baseline, a country like Poland could replace its 150 TWh of coal-fired electricity production for €30-80 billion. A lot of money, to be sure, but not outrageously so. (Of course, coal-fired electricity is just a fraction of total fossil fuel use - think of transportation - but it's a start, and it's the low-hanging fruit.)

Another comparison: on the 'net, I find US total oil consumption to be equivalent to something like 35 quadrillion BTU. My calculations make this out to be about 10 000 TWh, which somewhat fewer than 700 reactors could supply. If we assume we get better at this, and that after building the first hundred or so we can do it for USD 3 billion apiece (about the original price tag), that comes to something over two trillion dollars.
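For the unit conversions behind the last two paragraphs, here is a quick back-of-envelope sketch; the conversion factors (1 BTU ≈ 1055 J, 1 TWh = 3.6e15 J) are standard, and everything else is a figure quoted above.

    -- Unit conversions behind the Poland and US-oil estimates above.
    btuToTWh :: Double -> Double
    btuToTWh btu = btu * 1055 / 3.6e15    -- 1 BTU ≈ 1055 J; 1 TWh = 3.6e15 J

    usOilTWh :: Double
    usOilTWh = btuToTWh 35e15             -- ~10 250 TWh, i.e. "about 10 000 TWh"

    reactorsNeeded :: Double -> Double
    reactorsNeeded twh = twh / 15         -- one Olkiluoto-class reactor: ~15 TWh/year

    polandReactors :: Double
    polandReactors = reactorsNeeded 150   -- 10 reactors -> EUR 30-80 billion at 3-8 billion each

    usReactors, usCostUSD :: Double
    usReactors = reactorsNeeded usOilTWh  -- ~680, "somewhat fewer than 700"
    usCostUSD  = usReactors * 3e9         -- ~2.05e12, "something over two trillion dollars"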

Which happens to be the current estimate for the cost of the Iraq war. Isn't it ironic? In an attempt to secure a supply of oil (and by the way, how did that work out for you?), the US probably spent the same amount of money it would have taken to eliminate its dependence on oil entirely and permanently - which, as a byproduct, would have avoided thousands of deaths, several civil wars, and millions of refugees, and, almost forgot, helped to stop global warming. (The war in Afghanistan seems to have been slightly cheaper, but that was a war to eradicate extremism and terror, and although it seems hard to believe, it was even less successful than the war in Iraq. I can only look forward to our coming intervention in Syria. But I digress.)

* * *

Back to climate change. It isn't sufficient to produce enough alternative energy, of course; what we want is to stop the production of fossil fuels. In a global and free-market economy, oil (like any other product) will be produced as long as it is profitable to do so. In other words, we must supply alternative energy not just to satisfy our current consumption, but to drive the price of energy low enough that oil extraction no longer turns a profit.

Some oil fields are incredibly inexpensive to run, and it's highly unlikely that the price will ever drop so low that the Saudis stop scooping oil out of the dunes. But with recent oil prices above $100/barrel, many newer fields are expensive: tar sands in Canada, fracking and shale oil, and recent exploration in the Arctic. I suspect these projects are not profitable, and they certainly carry a high degree of financial risk.

Yet people go ahead, and we just got some new licenses issued for the Barents Sea. I'm not an analyst, but it's difficult for me to imagine that these will ever be profitable - and yet they go ahead. Like solar and nuclear, oil extraction requires a huge initial investment, and once a field is in production, keeping it producing will still be profitable. In Norway, the oil business is a mixture of government and private-sector initiatives, and my bet is that companies are racing to get the investments in place. Either the public sector guarantees the investment, or the companies gamble on later bailouts - in either case, the people involved get to keep their jobs.

Categories: Offsite Blogs