News aggregator

Ian Ross: Non-diffusive atmospheric flow #6: principal components analysis

Planet Haskell - Thu, 09/18/2014 - 7:39am
Non-diffusive atmospheric flow #6: principal components analysis September 18, 2014

The pre-processing that we’ve done hasn’t really got us anywhere in terms of the main analysis we want to do – it’s just organised the data a little and removed the main source of variability (the seasonal cycle) that we’re not interested in. Although we’ve subsetted the original geopotential height data both spatially and temporally, there is still a lot of data: 66 years of 181-day winters, each day of which has 72×15 Z500 values. This is a very common situation to find yourself in if you’re dealing with climate, meteorological, oceanographic or remote sensing data. One approach to this glut of data is something called dimensionality reduction, a term that refers to a range of techniques for extracting “interesting” or “important” patterns from data so that we can then talk about the data in terms of how strong these patterns are instead of what data values we have at each point in space and time.

I’ve put the words “interesting” and “important” in quotes here because what’s interesting or important is up to us to define, and determines the dimensionality reduction method we use. Here, we’re going to side-step the question of determining what’s interesting or important by using the de facto default dimensionality reduction method, principal components analysis (PCA). We’ll take a look in detail at what kind of “interesting” and “important” PCA give us a little later.

PCA is, in principle, quite a simple method, but it causes many people endless problems. There are some very good reasons for this:

  • PCA is in some sense nothing more than a generic change of basis operation (with the basis we change to chosen in a special way). The result of this is that a lot of the terminology used about PCA is also very generic, and hence very confusing (words like “basis”, “component”, “eigenvector”, “projection” and so on could mean more or less anything in this context!).

  • PCA is used in nearly every field where multivariate data is analysed, and is the archetypical “unsupervised learning” method. This means that it has been invented, reinvented, discovered and rediscovered many times, under many different names. Some other names for it are: empirical orthogonal function (EOF) analysis, the Karhunen-Loève decomposition, proper orthogonal decomposition (POD), and there are many others. Each of these different fields also uses different terms for the different outputs from PCA. This is very confusing: some people talk about principal components, some about empirical orthogonal functions and principal component time series, some about basis functions, and so on. Here, we’re going to try to be very clear and careful about the names that we use for things to try to alleviate some of the confusion.

  • There is a bit of a conceptual leap that’s necessary to go from very basic examples of using PCA to using PCA to analyse the kind of spatio-temporal data we have here. I used to say something like: “Well, there’s a nice two-dimensional example, and it works just the same in 100 dimensions, so let’s just apply it to our atmospheric data!” A perfectly reasonable reponse to that is: “WHAT?! Are you an idiot?”. Here, we’re going to take that conceptual leap slowly, and describe exactly how the “change of basis” view of PCA works for spatio-temporal data.

  • There are some aspects of the scaling of the different outputs from PCA that are really confusing. In simple terms, PCA breaks your data down into two parts, and you could choose to put the units of your data on either one of those parts, normalising the other part. Which one you put the units on isn’t always an obvious choice and it’s really easy to screw things up if you do it wrong. We’ll look at this carefully here.

So, there’s quite a bit to cover in the next couple of articles. In this article, we will: explain the basic idea of PCA with a very simple (two-dimensional!) example; give a recipe for how to perform PCA on a data set; talk about why PCA works from an algebraic standpoint; talk about how to do these calculations in Haskell. Then in the next article, we will: describe exactly how we do PCA on spatio-temporal data; demonstrate how to perform PCA on the Z500 anomaly data; show how to visualise the Z500 PCA results and save them for later use. What we will end up with from this stage of our analysis is a set of “important” spatial patterns (we’ll see what “important” means for PCA) and time series of how strong each of those spatial patterns is at a particular point in time. The clever thing about this decomposition is that we can restrict our attention to the few most “important” patterns and discard all the rest of the variability in the data. That makes the subsequent exploration of the data much simpler.

The basic idea of PCA

We’re going to take our first look at PCA using a very simple example. It might not be immediately obvious how the technique we’re going to develop here will be applicable to the spatio-temporal Z500 data we really want to analyse, but we’ll get to that a little later, after we’ve seen how PCA works in this simple example and we’ve done a little algebra to get a clearer understanding of just why the “recipe” we’re going to use works the way that it does.

Suppose we go to the seaside and measure the shells of mussels1. We’ll measure the length and width of each shell and record the data for each mussel as a two-dimensional (length, width) vector. There will be variation in the sizes and shapes of the mussels, some longer, some shorter, some fatter, some skinnier. We might end up with data that looks something like what’s shown below, where there’s a spread of length in the shells around a mean of about 5 cm, a spread in the width of shells around a mean of about 3 cm, and there’s a clear correlation between shell length and width (see Figure 1 below). Just from eyeballing this picture, it seems apparent that maybe measuring shell length and width might not be the best way to represent this data – it looks as though it could be better to think of some combination of length and width as measuring the overall “size” of a mussel, and some other combination of length and width as measuring the “fatness” or “skinniness” of a mussel. We’ll see how a principal components analysis of this data extracts these two combinations in a clear way.

The code for this post is available in a Gist. The Gist contains a Cabal file as well as the Haskell source, to make it easy to build. Just do something like this to build and run the code in a sandbox:

git clone pca-2d cd pca-2d cabal sandbox init cabal install ./.cabal-sandbox/bin/pca-2d

Just for a slight change, I’m going to produce all the plots in this section using Haskell, specifically using the Chart library. We’ll use the hmatrix library for linear algebra, so the imports we end up needing are:

import Control.Monad import Numeric.LinearAlgebra.HMatrix import Graphics.Rendering.Chart.Easy hiding (Matrix, Vector, (|>), scale) import Graphics.Rendering.Chart.Backend.Cairo

There are some name overlaps between the monadic plot interface provided by the Graphics.Rendering.Chart.Easy module and hmatrix, so we just hide the overlapping ones.

We generate 500 synthetic data points:

-- Number of test data points. n :: Int n = 500 -- Mean, standard deviation and correlation for two dimensions of test -- data. meanx, meany, sdx, sdy, rho :: Double meanx = 5.0 ; meany = 3.0 ; sdx = 1.2 ; sdy = 0.6 ; rho = 0.75 -- Generate test data. generateTestData :: Matrix Double generateTestData = let seed = 1023 mean = 2 |> [ meanx, meany ] cov = matrix 2 [ sdx^2 , rho*sdx*sdy , rho*sdx*sdy , sdy^2 ] samples = gaussianSample seed n mean cov in fromRows $ filter ((> 0) . minElement) $ toRows samples

The mussel shell length and width values are generated from a two-dimensional Gaussian distribution, where we specify mean and standard deviation for both shell length and width, and the correlation between the length and width (as the usual Pearson correlation coefficient). Given this information, we can generate samples from the Gaussian distribution using hmatrix’s gaussianSample function. (If we didn’t have this function, we would calculate the Cholesky decomposition of the covariance matrix we wanted, generate samples from a pair of standard one-dimensional Gaussian distributions and multiple two-dimensional vectors of these samples by one of the Cholesky factors of the covariance matrix – this is just what the gaussianSample function does for us.) We do a little filtering in generateTestData to make sure that we don’t generate any negative values2.

The main program that drives the generation of the plots we’ll look at below is:

main :: IO () main = do let dat = generateTestData (varexp, evecs, projs) = pca dat (mean, cov) = meanCov dat cdat = fromRows $ map (subtract mean) $ toRows dat forM_ [(PNG, "png"), (PDF, "pdf"), (SVG, "svg")] $ \(ptype, suffix) -> do doPlot ptype suffix dat evecs projs 0 doPlot ptype suffix cdat evecs projs 1 doPlot ptype suffix cdat evecs projs 2 doPlot ptype suffix cdat evecs projs 3 putStrLn $ "FRACTIONAL VARIANCE EXPLAINED: " ++ show varexp

and you can see the doPlot function that generates the individual plots in the Gist. I won’t say a great deal about the plotting code, except to observe that the new monadic API to the Chart library makes generating this kind of simple plot in Haskell no harder than it would be using Gnuplot or something similar. The plot code produces one of four plots depending on an integer parameter, which ranges from zero (the first plot above) to three. Because we’re using the Cairo backend to the Chart library, we can generate image output in any of the formats that Cairo supports – here we generate PDF (to insert into LaTeX documents), SVG (to insert into web pages) and PNG (for a quick look while we’re playing with the code).

The main program above is pretty simple: generate test data, do the PCA calculation (by calling the pca function, which we’ll look at in detail in a minute), do a little bit of data transformation to help with plotting, then call the doPlot function for each of the plots we want. Here are the plots we produce, which we’ll refer to below as we work through the PCA calculation:

Synthetic mussel shell test data for two-dimensional PCA example.

Centred synthetic mussel shell test data for two-dimensional PCA example.

PCA eigenvectors for two-dimensional PCA example.

Data projection onto PCA eigenvectors for two-dimensional PCA example.

Let’s now run through the “recipe” for performing PCA, looking at the figures above in parallel with the code for the pca function:

1 2 3 4 5 6 7 8 9 10 11 pca :: Matrix Double -> (Vector Double, Matrix Double, Matrix Double) pca xs = (varexp, evecs, projs) where (mean, cov) = meanCov xs (_, evals, evecCols) = svd cov evecs = fromRows $ map evecSigns $ toColumns evecCols evecSigns ev = let maxelidx = maxIndex $ cmap abs ev sign = signum (ev ! maxelidx) in cmap (sign *) ev varexp = scale (1.0 / sumElements evals) evals projs = fromRows $ map project $ toRows xs project x = evecs #> (x - mean)

We’ll look at just why this recipe works in the next section, but for the moment, let’s just see what happens:

  1. We start with our original mussel shell data (Figure 1 above).

  2. We calculate the mean and covariance of our data (line 3 of the pca function listing). PCA analyses the deviations of our data from the mean, so we effectively look at “centred” data, as shown in Figure 2, where we’ve just removed the mean from each coordinate in our data. The mean and covariance calculation is conveniently done using hmatrix’s meanCov function.

  3. Then we calculate the eigendecomposition of the covariance matrix. Because the covariance matrix is a real symmetric matrix, by construction, we know that the eigenvectors will form a complete set that we can use as a basis to represent our data. (We’re going to blithely ignore all questions of possible degeneracy here – for real data, “almost surely” means always!) Here, we do the eigendecomposition using a singular value decomposition (line 4 in the listing of the pca function). The singular values give us the eigenvalues and the right singular vectors give us the eigenvectors. The choice here to use SVD (via hmatrix’s svd function) rather than some other means of calculating an eigendecomposition is based primarily on the perhaps slightly prejudiced idea that SVD has the best and most stable implementations – here, hmatrix calls out to LAPACK to do this sort of thing, so there’s probably not much to choose, since the other eigendecomposition implementations in LAPACK are also good, but my prejudice in favour of SVD remains! If you want some better justification for why SVD is “the” matrix eigendecomposition, take a look at this very interesting historical review of the development of SVD: G. W. Stewart (1993). On the early history of the singular-value decomposition. SIAM Rev. 35(4), 551-566.

  4. We do a little manipulation of the directions of the eigenvectors (lines 5-8 in the listing), flipping the signs of them to make the largest components point in the positive direction – this is mostly just to make the eigenvectors look good for plotting. The eigenvectors are shown in the Figure 3: we’ll call them e1 (the one pointing to the upper right) and e2 (the one pointing to the upper left). Note that these are unit vectors. We’ll talk about this again when we look at using PCA for spatio-temporal data.

  5. Once we have unit eigenvectors, we can project our (centred) data points onto these eigenvectors (lines 10 and 11 of the listing: the project function centres a data point by taking off the mean, then projects onto each eigenvector using hmatrix’s matrix-vector product operator #>). Figure 4 shows in schematic form how this works – we pick out one data point in green and draw lines parallel and orthogonal to the eigenvectors showing how we project the data point onto the eigenvectors. Doing this for each data point is effectively just a change of basis: instead of representing our centred data value by measurements along the x- and y-axes, we represent it by measurements in the directions of e1 and e2. We’ll talk more about this below as well.

  6. Finally, the eigenvalues from the eigendecomposition of the covariance matrix tell us something about how much of the total variance in our input data is “explained” by the projections onto each of the eigenvectors. I’ve put the word “explained” in quotes because I don’t think it’s a very good word to use, but it’s what everyone says. Really, we’re just saying how much of the data variance lies in the direction of each eigenvector. Just as you can calculate the variance of the mussel length and width individually, you can calculate the variance of the projections onto the eigenvectors. The eigenvalues from the PCA eigendecomposition tell you how much variance there is in each direction, and we calculate the “fraction of variance explained” for each eigenvector and return it from the pca function.

So, the pca function returns three things: eigenvalues (actually fractional explained variance calculated from the eigenvalues) and eigenvectors from the PCA eigendecomposition, plus projections of each of the (centred) data points onto each of the eigenvectors. The terminology for all these different things is very variable between different fields. We’re going to sidestep the question of what these things are called by always explicitly referring to PCA eigenvectors (or, later on when we’re dealing with spatio-temporal data, PCA eigenpatterns), PCA explained variance fractions and PCA projected components. These terms are a bit awkward, but there’s no chance of getting confused this way. We could choose terminology from one of the fields where PCA is commonly used, but that could be confusing for people working in other fields, since the terminology in a lot of cases is not very well chosen.

Together, the PCA eigenvectors and PCA projected components constitute nothing more than a change of orthonormal basis for representing our input data – the PCA output contains exactly the same information as the input data. (Remember that the PCA eigenvectors are returned as unit vectors from the pca function, so we really are just looking at a simple change of basis.) So it may seem as though we haven’t really done anything much interesting with our data. The interesting thing comes from the fact that we can order the PCA eigenvectors in decreasing order of the explained variance fraction. If we find that data projected onto the first three (say) eigenvectors explains 80% of the total variance in our data, then we may be justified in considering only those three components. In this way, PCA can be used as a dimensionality reduction method, allowing us to use low-dimensional data analysis and visualisation techniques to deal with input data that has high dimensionality.

This is exactly what we’re going to do with the Z500 data: we’re going to perform PCA, and take only the leading PCA eigenvectors and components, throwing some information away. The way that PCA works guarantees that the set of orthogonal patterns we keep are the “best” patterns in terms of explaining the variance in our data. We’ll have more to say about this in the next section when we look at why our centre/calculate covariance/eigendecomposition recipe works.

The algebra of PCA

In the last section, we presented a “recipe” for PCA (at least for two-dimensional data): centre the data; calculate the covariance matrix; calculate the eigendecomposition of the covariance matrix; project your centred data points onto the eigenvectors. The eigenvalues give you a measure of the proportion of the variance in your data in the direction of the corresponding eigenvector. And the projection of the data points onto the PCA eigenvectors is just a change of basis, from whatever original basis your data was measured in (mussel shell length and width as the two components of each data point in the example) to a basis with the PCA eigenvectors as basis vectors.

So why does this work? Obviously, you can use whatever basis you like to describe your data, but why is the PCA eigenbasis useful and interesting? I’ll explain this quite quickly, since it’s mostly fairly basic linear algebra, and you can read about it in more detail in more or less any linear algebra textbook3.

To start with, let’s review some facts about eigenvectors and eigenvalues. For a matrix A, an eigenvector u and its associated eigenvalue λ satisfy


The first thing to note is that any scalar multiple of u is also an eigenvector, so an eigenvector really refers to a “direction”, not to a specific vector with a fixed magnitude. If we multiply both sides of this by uT and rearrange a little, we get


The denominator of the fraction on the right hand side is just the length of the vector u. Now, we can find the largest eigenvalue λ1 and corresponding eigenvector u1 by solving the optimisation problem

u1=arg maxuTu=1uTAu,

where for convenience, we’ve restricted the optimisation to find a unit eigenvector, and we find λ1 directly from the fact that Au1=λ1u1.

We can find next largest (in magnitude) eigenvalue and corresponding eigenvector of the matrix A by projecting the rows of A into the subspace orthogonal to u1 to give a new matrix A1 and solving the optimisation problem

u2=arg maxuTu=1uTA1u,

finding the second largest eigenvalue λ2 from A1u2=λ2u2. Further eigenvectors and eigenvalues can be found in order of decreasing eigenvalue magnitude by projecting into subspaces orthogonal to all the eigenvectors found so far and solving further optimisation problems.

This link between this type of optimisation problem and the eigenvectors and eigenvalues of a matrix is the key to understanding why PCA works the way that it does. Suppose that we have centred our (K-dimensional) data, and that we call the N centred data vectors xi, i=1,2,…N. If we now construct an N×K matrix X whose rows are the xi, then the sample covariance of the data is


Now, given a direction represented as a unit vector u, we can calculate the data variance in that direction as ∣∣Xu∣∣2, so that if we want to know the direction in which the data has the greatest variance, we solve an optimisation problem of the form

u1=arg maxuTu=1(Xu)TXu=arg maxuTu=1uTCu.

But this optimisation problem is just the eigendecomposition optimisation problem for the covariance matrix C. This demonstrates we can find the directions of maximum variance in our data by looking at the eigendecomposition of the covariance matrix C in decreasing order of eigenvalue magnitude.

There are a couple of things to add to this. First, the covariance matrix is, by construction, a real symmetric matrix, so its eigenvectors form a complete basis – this means that we really can perform a change of basis from our original data to the PCA basis with no loss of information. Second, because the eigenvectors of the covariance matrix are orthogonal, the projections of our data items onto the eigenvector directions (what we’re going to call the PCA projected components) are uncorrelated. We’ll see some consequences of this when we look at performing PCA on the Z500 data. Finally, and related to this point, it’s worth noting that PCA is a linear operation – the projected components are linearly uncorrelated, but that doesn’t mean that there can’t be some nonlinear relationship between them. There are generalisations of PCA to deal with this case, but we won’t be talking about them for the purposes of this analysis.

Everything we’ve done here is pretty straightforward, but you might be wondering why we would want to change to this PCA basis at all? What’s the point? As I noted above, but is worth reiterating, the most common use for PCA, and the way that we’re going to use it with the Z500 data, is as a dimensionality reduction method. For the Z500 data, we have, for each day we’re looking at, 72×15=1080 spatial points, which is a lot of data to look at and analyse. What we usually do is to perform PCA, then ignore all but the first few leading PCA eigenvectors and projected components. Because of the way the optimisation problems described above are set up, we can guarantee that the leading m PCA eigenvectors span the m-dimensional subspace of the original data space containing the most data variance, and we can thus convince ourselves that we aren’t missing interesting features of our data by taking only those leading components. We’ll see how this works in some detail when we do the PCA analysis of the Z500 data, but in the mussel measurement case, this would correspond to thinking just of the projection of the mussel length and width data along the leading e1 eigendirection, so reducing the measurements to a single “size” parameter that neglects the variation in fatness or skinniness of the mussels. (This two-dimensional case is a bit artificial. Things will make more sense when we look at the 1080-dimensional case for the Z500 data.)

  1. Well, not really, since I live in the mountains of Austria and there aren’t too many mussels around here, so I’ll generate some synthetic data!

  2. Obviously, a Gaussian distribution is not right for quantities like lengths that are known to be positive, but here we’re just generating some data for illustrative purposes, so we don’t care all that much. If we were trying to model this kind of data though, we’d have to be more careful.

  3. I like Gilbert Strang’s Linear Algebra and its Applications, although I’ve heard from some people that they think it’s a bit hard for a first textbook on the subject – if you’ve had any exposure to this stuff before though, it’s good.

data-analysis haskell <script src="" type="text/javascript"></script> <script type="text/javascript"> (function () { var articleId = fyre.conv.load.makeArticleId(null); fyre.conv.load({}, [{ el: 'livefyre-comments', network: "", siteId: "290329", articleId: articleId, signed: false, collectionMeta: { articleId: articleId, url: fyre.conv.load.makeCollectionUrl(), } }], function() {}); }()); </script>
Categories: Offsite Blogs

cryptography and homomorphic library support?

haskell-cafe - Thu, 09/18/2014 - 5:38am
Hi, I just perused the crypto section of Hackage and I didn't see any support for homomorphc functions. I didn't see anything .. maybe I missed. If no support, I wouldn't mind giving a shot at working on this as long as no one has already implemented. If someone is working on this, please inform me of contact info so perhaps I can join the team. Kind regards, Vasya
Categories: Offsite Discussion

Haskell Weekly News: Issue 306

General haskell list - Thu, 09/18/2014 - 5:28am
Welcome to issue 306 of the HWN, an issue covering crowd-sourced bits of information about Haskell from around the web. This issue covers from September 07 to 13, 2014 Quotes of the Week * prophile: at this point ekmett is more part of the language than a practitioner of it * Qfwfq: Simon is a title, not a forename. Top Reddit Stories * Haskell for all: Morte: an intermediate language for super-optimizing functional programs Domain:, Score: 110, Comments: 83 Original: [1] On Reddit: [2] * AMP has landed in GHC Main! Domain:, Score: 88, Comments: 71 Original: [3] On Reddit: [4] * [Meta] What's with all these downvotes on beginner questions? Domain: self.haskell, Score: 69, Comments: 67 Original: [5] On Reddit: [6] * Haskell Implementors Workshop 2014 videos [youtube] Doma
Categories: Incoming News

optional ghc-options based on GHC version?

haskell-cafe - Thu, 09/18/2014 - 3:38am
Hi, I have a project here where GHC's option -flate-dmd-anal gets out some extra speed but the flag is not available on 7.6.3. How can I specify in the cabal file that I only want to pass this through when building with 7.8.3 or newer? It'd be a shame to not take advantage of the extra speed on 7.8.3 just because we want to support 7.6.3 too for a bit longer. I was unable to find the info online. Thanks.
Categories: Offsite Discussion

Boolean formula: part2

haskell-cafe - Wed, 09/17/2014 - 8:48pm
Hello! I have following code: buildFormulas :: [Variable] -> Int -> [Formula] buildFormulas vars 0 = map Var vars buildFormulas vars n = join $ forM [0 .. n - 1 ] $ \i -> do x <- buildFormulas vars i y <- buildFormulas vars (n - i - 1) z <- buildFormulas vars (n - 1) let t1 = case z of (Negation _) -> [] _ -> [Negation z] t1 ++ [conj x y, impl x y, disj x y] It is correct, typechecks, but it is insanely slow (calculating take 35 (buildFormulas [P, Q, R] 5) hangs my computer). My guess is that it builds too much before returning first results. Can you suggest more efficent way enumerate formulas of set of variables? -- Best regards, Dmitry Bogatov <KAction< at >>, Free Software supporter, esperantisto and netiquette guardian. GPG: 54B7F00D _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe< at >
Categories: Offsite Discussion

DSL implementation

haskell-cafe - Wed, 09/17/2014 - 7:02pm
HI all, as a part of a project for physical simulation, I'm was given the task to help with the implementation of an Small DSL that is a minimal functional language. Point is that this small language contains some dependent typed theory, because we want to be able to check statically the shapes of vector and matrices, among other things. Haskell of course seems the tool for the job. But my boss insists that the implementation should be done in C, but I think that it is madness. Can someone please point me to relevant literature, discussions or general information that can help me to show my boss that Haskell is the right tool? any help would be appreciated. Felipe Z. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe< at >
Categories: Offsite Discussion

Keegan McAllister: Raw system calls for Rust

Planet Haskell - Wed, 09/17/2014 - 6:33pm

I wrote a small library for making raw system calls from Rust programs. It provides a macro that expands into in-line system call instructions, with no run-time dependencies at all. Here's an example:


#[phase(plugin, link)]
extern crate syscall;

fn write(fd: uint, buf: &[u8]) {
unsafe {
syscall!(WRITE, fd, buf.as_ptr(), buf.len());

fn main() {
write(1, "Hello, world!\n".as_bytes());

Right now it only supports x86-64 Linux, but I'd love to add other platforms. Pull requests are much appreciated. :)

Categories: Offsite Blogs

Constructors on left and right hand sides of equation

haskell-cafe - Wed, 09/17/2014 - 3:03pm
Dear list In Eq 1 is (NewGlass i) on the LHS distinct from (NewGlass i) on the RHS? In Eq2 are the occurrences of the data constructor Fill on the LHS and RHS distinct? Thanks Pat data Glass = NewGlass Int | Fill Glass Int deriving Show drink :: Glass -> Int -> Glass drink (NewGlass i) j = (NewGlass i) -- Eq 1 drink (Fill m i) j | full (Fill m i) = Fill (NewGlass (size m)) ((size m) - j) -- Eq 2 | (i >= j) = Fill m (i-j) | otherwise = drink m (j-1) size :: Glass -> Int size (NewGlass i) = i size (Fill m i) = size m level :: Glass -> Int level (NewGlass i) = 0 level (Fill m i) | (level m) + i > (size m) = size m | otherwise = (level m) + i empty :: Glass -> Bool empty m = (level m) == 0 full :: Glass -> Bool full m = (level m) == (size m) {- drink (Fill (NewGlass 4) 5) 5 drink (Fill (NewGlass 4) 3) 3 -}
Categories: Offsite Discussion

I have created... dumb sort.

Haskell on Reddit - Wed, 09/17/2014 - 2:47pm
atBeginning xs x = uncurry (++) $ partition (==x) xs dumbSort xs = foldl atBeginning xs [maximum xs, maximum xs - 1..minimum xs]

Efficient on lists of small numbers!

submitted by Undo_all
[link] [2 comments]
Categories: Incoming News

Philip Wadler: The military and the referendum.

Planet Haskell - Wed, 09/17/2014 - 2:42pm

Many readers will have heard about Lord Dannat in the Telegraph arguing a vote for independence will dishonour Scotland's war dead.

Perhaps not as many will have heard that Jimmy Sinclair (the oldest surviving Desert Rat, aged 102), Colin May (Lieutenant Commander, Faslane), and sixteen others have written a letter slamming Dannat; at least, I didn't hear until this morning. "How dare he take their sacrifice in vain and try to turn it to political advantage?"

Both sides are reported by the BBC, though the headline mentions only one. (More #bbcbias?)

Categories: Offsite Blogs

Eric Kidd: Deploying Rust applications to Heroku, with example code for Iron

Planet Haskell - Wed, 09/17/2014 - 2:35pm

Now with support for Iron, Cargo and Cargo.lock!

You can deploy an example Rust application to Heroku using this button:

If you'd prefer to use the command line, you'll need git and the Heroku toolbelt. Once these are installed, run:

git clone cd heroku-rust-cargo-hello heroku create --buildpack git push heroku master

This will download the example application, create a new Heroku project, and deploy the code to Heroku. That's it!

How it works

Our server is based on the Iron middleware library. We parse URL parameters and dispatch HTTP requests to the appropriate handler routine using Iron's router module:

fn main() { let mut router = Router::new(); router.get("/", hello); router.get("/:name", hello_name); Iron::new(router).listen(Ipv4Addr(0, 0, 0, 0), get_server_port()); }

The hello function returns an HTTP status code and the content to send to the user:

fn hello(_: &mut Request) -> IronResult<Response> { Ok(Response::with(status::Ok, "Hello world!")) }

The hello_name function is similar, but we look up the value of the :name parameter that we declared to the router, above.

fn hello_name(req: &mut Request) -> IronResult<Response> { let params = req.extensions.find::<Router,Params>().unwrap(); let name = params.find("name").unwrap(); Ok(Response::with(status::Ok, format!("Hello, {}!", name))) }

The final piece needed to run on Heroku is a function to look up up our port number:

fn get_server_port() -> Port { getenv("PORT") .and_then(|s| from_str::<Port>(s.as_slice())) .unwrap_or(8080) }

The full source code is available on GitHub. To learn more about Rust, see the excellent Rust guide.

Keep reading for notes on building the program locally and on configuring your own programs to run on Heroku.

Read more…

Categories: Offsite Blogs

Philip Wadler: The case for Europe

Planet Haskell - Wed, 09/17/2014 - 2:10pm
<iframe frameborder="0" height="500" src=";title=Today%3A%2017%2F09%2F2014%3A%20Salmond%20promises%20%22common-sense%20agreement%20on%20a%20common%20currency%E2%80%9D&amp;product=iplayer" width="400"></iframe>
Readers of this list will know that I don't always see eye-to-eye with Alex Salmond. Nonetheless, I think he put the case for Europe well this morning on the Today Program.In a BBC interview just the other night, a spanish minister being
interviewed by Kirsty Wark despite being invited three or four times
refused to say that Spain would attempt to veto Scottish
membership. And the reason for that of course is that the Spanish
government have already said that in the circumstance of a consented
democratic referendum, as they put it, Spain would have nothing to say
about it.

We can go through legal opinion and expert opinion as much as we like.
I think the answer is in four figures: 1, 20, 25, and 60.

1% is the percentage of Scotland's population compared to the European Union.

20% is the percentage of the fish stocks of the entire European Union.

25% is the percentage of the renewable energy of the entire European Union offshore.

And 60% is the oil reserves that Scotland has.

Anyone who believes that a country [with these resources] is not
going to be welcome in the wider Europe doesn't understand the process
by which Europe accepts democratic results and that Scotland has a
huge amount of attractiveness to the rest of the European continent.
You can hear the original here, the relevant segment starts at around 8:00.
Categories: Offsite Blogs

Fairly small projects (~1 to 7 days) to do and learn from?

Haskell on Reddit - Wed, 09/17/2014 - 12:39pm

I recently started using haskell and it's quickly become my favorite language. Unfortuantly, I have nothing to apply it to. What are some small projects I can do for fun and to practice my skills?

submitted by Undo_all
[link] [18 comments]
Categories: Incoming News

Injective type families for GHC - syntax proposals

haskell-cafe - Wed, 09/17/2014 - 12:10pm
Haskellers, there is some discussion going on at the moment about implementing injective type families in GHC [1]. I'd like to hear your opinions on proposed syntax for this new feature. == Problem == Currently GHC assumes that type families are not injective. This is of course conservative and the idea behind implementing injective type families is to allow the programmer to explicitly declare that a type family has the injectivity property (which GHC will of course verify). == Forms of injectivity == Injectivity can take forms more complicated than just determining the LHS from RHS: A. RHS determines all arguments on the LHS. Example: type family Id a where Id a = a Here the RHS uniquely determines LHS. B. RHS determines some but not all arguments on the LHS. Example (contrieved): type family G a b c where G Int Char Bool = Bool G Int Char Int = Bool G Bool Int Int = Int Here RHS uniquely determines `a` and `b` but not `c`. C. RHS and some LHS ar
Categories: Offsite Discussion