News aggregator

add IsList instance to Data.Sequence

libraries list - Wed, 07/16/2014 - 9:36pm
Seems like an oversight in the GHC 7.8 release of containers, but worth adding! I'm willing to write the patch etc. if everyone OKs this. Discussion period: 2 weeks
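For reference, the instance in question might look something like this (a sketch only; the eventual patch to containers may differ):

{-# LANGUAGE TypeFamilies #-}
import GHC.Exts (IsList (..))
import qualified Data.Foldable as F
import qualified Data.Sequence as Seq
import Data.Sequence (Seq)

-- Sketch of the proposed instance (fromListN can keep its default).
instance IsList (Seq a) where
  type Item (Seq a) = a
  fromList = Seq.fromList
  toList   = F.toList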
Categories: Offsite Discussion

Mark Jason Dominus: Guess what this does (solution)

Planet Haskell - Wed, 07/16/2014 - 6:00pm

A few weeks ago I asked people to predict, without trying it first, what this would print:

perl -le 'print(two + two == five ? "true" : "false")'

(If you haven't seen this yet, I recommend that you guess, and then test your guess, before reading the rest of this article.)

People familiar with Perl guess that it will print true; that is what I guessed. The reasoning is as follows: Perl is willing to treat the unquoted strings two and five as strings, as if they had been quoted, and is also happy to use the + and == operators on them, converting the strings to numbers in its usual way. If the strings had looked like "2" and "5" Perl would have treated them as 2 and 5, but as they don't look like decimal numerals, Perl interprets them as zeroes. (Perl wants to issue a warning about this, but the warning is not enabled by default.) Since the two and five are treated as zeroes, the result of the == comparison is true, and the string "true" should be selected and printed.

So far this is a little bit odd, but not excessively odd; it's the sort of thing you expect from programming languages, all of which more or less suck. For example, Python's behavior, although different, is about equally peculiar. Although Python does require that the strings two and five be quoted, it is happy to do its own peculiar thing with "two" + "two" == "five", which happens to be false: in Python the + operator is overloaded and has completely different behaviors on strings and numbers, so that while in Perl "2" + "2" is the number 4, in Python it is the string "22", and "two" + "two" yields the string "twotwo". Had the program above actually printed true, as I expected it would, or even false, I would not have found it remarkable.

However, this is not what the program does do. The explanation of two paragraphs earlier is totally wrong. Instead, the program prints nothing, and the reason is incredibly convoluted and bizarre.

First, you must know that print has an optional first argument. (I have plans for an article about how optional first arguments are almost always a bad move, but contrary to my usual practice I will not insert it here.) In Perl, the print function can be invoked in two ways:

print HANDLE $a, $b, $c, …;
print $a, $b, $c, …;

The former prints out the list $a, $b, $c, … to the filehandle HANDLE; the latter uses the default handle, which typically points at the terminal. How does Perl decide which of these forms is being used? Specifically, in the second form, how does it know that $a is one of the items to be printed, rather than a variable containing the filehandle to print to?

The answer to this question is further complicated by the fact that the HANDLE in the first form could be either an unquoted string, which is the name of the handle to print to, or it could be a variable containing a filehandle value. Both of these prints should do the same thing:

my $handle = \*STDERR;
print STDERR $a, $b, $c;
print $handle $a, $b, $c;

Perl's method to decide whether a particular print uses an explicit or the default handle is a somewhat complicated heuristic. The basic rule is that the filehandle, if present, can be distinguished because its trailing comma is omitted. But if the filehandle were allowed to be the result of an arbitrary expression, it might be difficult for the parser to decide where there was a comma; consider the hypothetical expression:

print $a += EXPRESSION, $b $c, $d, $e;

Here the intention is that the $a += EXPRESSION, $b expression calculates the filehandle value (which is actually retrieved from $b, the $a += … part being executed only for its side effect) and the remaining $c, $d, $e are the values to be printed. To allow this sort of thing would be way too confusing to both Perl and the programmer. So there is the further rule that the filehandle expression, if present, must be short, either a simple scalar variable such as $fh, or a bare unquoted string that is in the right format for a filehandle name, such as HANDLE. Then the parser need only peek ahead a token or two to see if there is an upcoming comma.

So for example, in

print STDERR $a, $b, $c;

the print is immediately followed by STDERR, which could be a filehandle name, and STDERR is not followed by a comma, so STDERR is taken to be the name of the output handle. And in

print $x, $a, $b, $c;

the print is immediately followed by the simple scalar value $x, but this $x is followed by a comma, so it is considered one of the things to be printed, and the target of the print is the default output handle.


But in

print STDERR, $a, $b, $c;

Perl has a puzzle: STDERR looks like a filehandle, but it is followed by a comma. This is a compile-time error; Perl complains “No comma allowed after filehandle” and aborts. If you want to print the literal string STDERR, you must quote it, and if you want to print $a, $b, and $c to the standard error handle, you must omit the first comma.

Now we return to the original example.

perl -le 'print(two + two == five ? "true" : "false")'

Here Perl sees the unquoted string two which could be a filehandle name, and which is not followed by a comma. So it takes the first two to be the output handle name. Then it evaluates the expression

+ two == five ? "true" : "false"

and obtains the value true. (The leading + is a unary plus operator, which is a no-op. The bare two and five are taken to be string constants, which, compared with the numeric == operator, are considered to be numerically zero, eliciting the same warning that I mentioned earlier that I had not enabled. Thus the comparison Perl actually does is 0 == 0, which is true, and the resulting string is "true".)

This value, the string true, is then printed to the filehandle named two. Had we previously opened such a filehandle, say with

open two, ">", "output-file";

then the output would have been sent to the filehandle as usual. Printing to a non-open filehandle elicits an optional warning from Perl, but as I mentioned, I have not enabled warnings, so the print silently fails, yielding a false value.

Had I enabled those optional warnings, we would have seen a plethora of them:

Unquoted string "two" may clash with future reserved word at -e line 1. Unquoted string "two" may clash with future reserved word at -e line 1. Unquoted string "five" may clash with future reserved word at -e line 1. Name "main::two" used only once: possible typo at -e line 1. Argument "five" isn't numeric in numeric eq (==) at -e line 1. Argument "two" isn't numeric in numeric eq (==) at -e line 1. print() on unopened filehandle two at -e line 1.

(The first four are compile-time warnings; the last three are issued at execution time.) The crucial warning is the one at the end, advising us that the output of print was directed to the filehandle two which was never opened for output.

[ Addendum 20140718: I keep thinking of the following remark of Edsger W. Dijkstra:

[This phenomenon] takes one of two different forms: one programmer places a one-line program on the desk of another and … says, "Guess what it does!" From this observation we must conclude that this language as a tool is an open invitation for clever tricks; and while exactly this may be the explanation for some of its appeal, viz., to those who like to show how clever they are, I am sorry, but I must regard this as one of the most damning things that can be said about a programming language.

But my intent is different from what Dijkstra describes. His programmer is proud, but I am disgusted. Incidentally, I believe that Dijkstra was discussing APL here. ]

Categories: Offsite Blogs

Ian Ross: Haskell data analysis: Reading NetCDF files

Planet Haskell - Wed, 07/16/2014 - 3:09pm
Haskell data analysis: Reading NetCDF files
July 16, 2014

I never really intended the FFT stuff to go on for as long as it did, since that sort of thing wasn’t really what I was planning as the focus for this Data Analysis in Haskell series. The FFT was intended primarily as a “warm-up” exercise. After fourteen blog articles and about 10,000 words, everyone ought to be sufficiently warmed up now…

Instead of trying to lay out any kind of fundamental principles for data analysis before we get going, I’m just going to dive into a real example. I’ll talk about generalities as we go along when we have some context in which to place them.

All of the analysis described in this next series of articles closely follows that in the paper: D. T. Crommelin (2004). Observed nondiffusive dynamics in large-scale atmospheric flow. J. Atmos. Sci. 61(19), 2384–2396. We’re going to replicate most of the data analysis and visualisation from this paper, maybe adding a few interesting extras towards the end.

It’s going to take a couple of articles to lay out some of the background to this problem, but I want to start here with something very practical and not specific to this particular problem. We’re going to look at how to gain access to meteorological and climate data stored in the NetCDF file format from Haskell. This will be useful not only for the low-frequency atmospheric variability problem we’re going to look at, but for other things in the future too.

The NetCDF file format

The NetCDF file format is a “self-describing” binary format that’s used a lot for storing atmospheric and oceanographic data. It’s “self-describing” in the sense that the file format contains metadata describing the spatial and temporal dimensions of variables, as well as optional information about units and a bunch of other stuff. It’s a slightly intimidating format to deal with at first, but we’ll only need to know how a subset of it works. (And it’s much easier to deal with than HDF5, which we’ll probably get around to when we look at some remote sensing data at some point.)

So, here’s the 30-second introduction to NetCDF. A NetCDF file contains dimensions, variables and attributes. A NetCDF dimension just has a name and a size. One dimension can be specified as an “unlimited” or record dimension, which is usually used for time series, and just means that you can tack more records on the end of the file. A NetCDF variable has a name, a type, a list of dimensions, some attributes and some data. As well as attributes attached to variables, a NetCDF file can also have some file-level global attributes. A NetCDF attribute has a name, a type and a value. And that’s basically it (for NetCDF-3, at least; NetCDF-4 is a different beast, but I’ve never seen a NetCDF-4 file in the wild, so I don’t worry about it too much).

An example NetCDF file

That’s very abstract, so let’s look at a real example. The listing below shows the output from the ncdump tool for one of the data files we’re going to be using, which stores a variable called geopotential height (I’ll explain exactly what this is in a later article – for the moment, it’s enough to know that it’s related to atmospheric pressure). The ncdump tool is useful for getting a quick look at what’s in a NetCDF file – it shows all the dimension and variable definitions, all attributes and also dumps the entire data contents of the file as ASCII (which you usually want to chop off…).

netcdf z500-1 {
dimensions:
    longitude = 144 ;
    latitude = 73 ;
    time = 7670 ;
variables:
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int time(time) ;
        time:units = "hours since 1900-01-01 00:00:0.0" ;
        time:long_name = "time" ;
    short z500(time, latitude, longitude) ;
        z500:scale_factor = 0.251043963537454 ;
        z500:add_offset = 50893.8041655182 ;
        z500:_FillValue = -32767s ;
        z500:missing_value = -32767s ;
        z500:units = "m**2 s**-2" ;
        z500:long_name = "Geopotential" ;
        z500:standard_name = "geopotential" ;

// global attributes:
        :Conventions = "CF-1.0" ;
        :history = "Sun Feb 9 18:46:25 2014: ncrename -v z,z500\n",
            "2014-01-29 21:04:31 GMT by grib_to_netcdf-1.12.0: grib_to_netcdf /data/soa/scratch/ -o /data/soa/scratch/netcdf-web237-20140129210411-3022" ;
data:

 longitude = 0, 2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20, 22.5, 25, 27.5, 30, 32.5,
    35, 37.5, 40, 42.5, 45, 47.5, 50, 52.5, 55, 57.5, 60, 62.5, 65, 67.5, 70,
    72.5, 75, 77.5, 80, 82.5, 85, 87.5, 90, 92.5, 95, 97.5, 100, 102.5, 105,
    107.5, 110, 112.5, 115, 117.5, 120, 122.5, 125, 127.5, 130, 132.5, 135,
    137.5, 140, 142.5, 145, 147.5, 150, 152.5, 155, 157.5, 160, 162.5, 165,
    167.5, 170, 172.5, 175, 177.5, 180, 182.5, 185, 187.5, 190, 192.5, 195,
    197.5, 200, 202.5, 205, 207.5, 210, 212.5, 215, 217.5, 220, 222.5, 225,
    227.5, 230, 232.5, 235, 237.5, 240, 242.5, 245, 247.5, 250, 252.5, 255,
    257.5, 260, 262.5, 265, 267.5, 270, 272.5, 275, 277.5, 280, 282.5, 285,
    287.5, 290, 292.5, 295, 297.5, 300, 302.5, 305, 307.5, 310, 312.5, 315,
    317.5, 320, 322.5, 325, 327.5, 330, 332.5, 335, 337.5, 340, 342.5, 345,
    347.5, 350, 352.5, 355, 357.5 ;

 latitude = 90, 87.5, 85, 82.5, 80, 77.5, 75, 72.5, 70, 67.5, 65, 62.5, 60,
    57.5, 55, 52.5, 50, 47.5, 45, 42.5, 40, 37.5, 35, 32.5, 30, 27.5, 25, 22.5,
    20, 17.5, 15, 12.5, 10, 7.5, 5, 2.5, 0, -2.5, -5, -7.5, -10, -12.5, -15,
    -17.5, -20, -22.5, -25, -27.5, -30, -32.5, -35, -37.5, -40, -42.5, -45,
    -47.5, -50, -52.5, -55, -57.5, -60, -62.5, -65, -67.5, -70, -72.5, -75,
    -77.5, -80, -82.5, -85, -87.5, -90 ;

As the first line of the listing shows, the dataset is called z500-1 (it contains daily 500 millibar geopotential height data). It has dimensions called longitude, latitude and time. There are variables called longitude, latitude, time and z500. The variables with names that are the same as dimensions are called coordinate variables and are part of a metadata convention that provides information about the file dimensions. The NetCDF file format itself doesn’t require that dimensions have any more information provided for them than their name and size, but for most applications, it makes sense to give units and values for points along the dimensions.

If we look at the longitude variable, we see that it’s of type float and has one dimension, which is the longitude dimension – this is how you tell a coordinate variable from a data variable: it will have the same name as the dimension it goes with and will be indexed just by that dimension. Immediately after the line defining the longitude variable are the attributes for the variable. Here they give units and a display name (they can also give information about the range of values and the orientation of the coordinate axis). All of these attributes are again defined by a metadata convention, but they’re mostly pretty easy to figure out. Here, the longitude is given in degrees east of the prime meridian, and if we look further down the listing, we can see the data values for the longitude variable, running from zero degrees to 357.5°E. From all this, we can infer that the 144 longitude values in the file start at the prime meridian and increase eastwards.

Similarly, the latitude variable is a coordinate variable for the latitude dimension, and specifies the latitude of points on the globe. The latitude is measured in degrees north of the equator and ranges from 90° (the North pole) to -90° (the South pole). Taking a look at the data values for the latitude variable, we can see that 90 degrees north is at index 0, and the 73 latitude values decrease with increasing index until we reach the South pole.

The time coordinate variable is a little more interesting, mostly because of its units – this “hours since YYYY-MM-DD HH:MM:SS” approach to time units is very common in NetCDF files and it’s usually pretty easy to work with.
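For example, a time value in these units can be converted to a calendar date with a couple of lines of Data.Time code (the helper below is just an illustration; it is not part of hnetcdf):

import Data.Time (UTCTime (..), addUTCTime, fromGregorian)

-- Convert "hours since 1900-01-01 00:00:0.0" to a UTCTime.
hoursSince1900 :: Double -> UTCTime
hoursSince1900 h = addUTCTime (realToFrac (h * 3600)) epoch
  where epoch = UTCTime (fromGregorian 1900 1 1) 0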

Finally, we get on to the data variable, z500. This is defined on a time/latitude/longitude grid (so, in the data, the longitude is the fastest changing coordinate). The variable has one slightly odd feature: its type. The types for the coordinate variables were all float or double, as you’d expect, but z500 is declared to be a short integer value. Why? Well, NetCDF files are quite often big, so it can make sense to use some sort of encoding to reduce file sizes. (I worked on a paleoclimate modelling project where each model simulation resulted in about 200 GB of data, for a dozen models for half a dozen different scenarios. In “Big Data” terms, it’s not so large, but it’s still quite a bit of data for people to download from a public server.) Here, the real-valued geopotential height is packed into a short integer. The true value of the field can be recovered from the short integer values in the file using the add_offset and scale_factor attributes: multiply each stored value by scale_factor and add add_offset.
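The unpacking itself is a one-liner; a minimal sketch (hnetcdf's coardsScale function, which we use later, does this for us, so the function below is purely illustrative):

import Foreign.C (CShort)

-- COARDS-style unpacking: true value = stored value * scale_factor + add_offset.
unpackValue :: Double -> Double -> CShort -> Double
unpackValue scaleFactor addOffset packed =
  fromIntegral packed * scaleFactor + addOffset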

Last of all we have the global attributes in the file. The most interesting of these is the Conventions attribute, which specifies that the file uses the CF metadata convention. This is the convention that defines how coordinate variables are represented, how data values can be compressed by scaling and offsetting, how units and axes are represented, and so on. Given a NetCDF file using the CF convention (or another related convention called the COARDS metadata convention), it’s pretty straightforward to figure out what’s going on.

Reading NetCDF files in Haskell

So, how do we read NetCDF files into a Haskell program to work on them? I’ve seen a few Haskell FFI bindings to parts of the main NetCDF C library, but none of those really seemed satisfactory for day-to-day use, so I’ve written a simple library called hnetcdf that includes both a low-level wrapping of the C library and a more idiomatic Haskell interface (which is what we’ll be using).

In particular, because NetCDF data is usually grid-based, hnetcdf supports reading data values into a number of different kinds of Haskell arrays (storable Vectors, Repa arrays and hmatrix arrays). For this analysis, we’re going to use hmatrix vectors and matrices, since they provide a nice “Matlab in Haskell” interface for doing the sort of linear algebra we’ll need.

In this section, we’ll look at some simple code for accessing the NetCDF file whose contents we looked at above which will serve as a basis for the more complicated things we’ll do later. (The geopotential height data we’re using here is from the ERA-Interim reanalysis project – again, I’ll explain what “reanalysis” means in a later article. For the moment, think of it as a “best guess” view of the state of the atmosphere at different moments in time.) We’ll open the NetCDF file, show how to access the file metadata and how to read data values from coordinate and data variables.

We need a few imports first, along with a couple of useful type synonyms for return values from hnetcdf functions:

import Prelude hiding (length, sum)
import Control.Applicative ((<$>))
import qualified Data.Map as M
import Foreign.C
import Foreign.Storable
import Numeric.Container
import Data.NetCDF
import Data.NetCDF.HMatrix

type VRet a = IO (Either NcError (HVector a))
type MRet a = IO (Either NcError (HRowMajorMatrix a))

As well as a few utility imports and the Numeric.Container module from the hmatrix library, we import Data.NetCDF and Data.NetCDF.HMatrix – the first of these is the general hnetcdf API and the second is the module that allows us to use hnetcdf with hmatrix. Most of the functions in hnetcdf handle errors by returning an Either of NcError and a “useful” return type. The VRet and MRet type synonyms represent return values for vectors and matrices respectively. When using hnetcdf, it’s often necessary to supply type annotations to control the conversion from NetCDF values to Haskell values, and these type synonyms come in handy for doing this.
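As an aside, the Either can also be handled explicitly rather than pattern-matched on Right as the listings below do. Here is a rough sketch using openFile, ncVar and get as they appear in this article (the helper name is invented, and I'm assuming NcError has a Show instance):

withLongitudes :: FilePath -> IO ()
withLongitudes path = do
  enc <- openFile path
  case enc of
    Left err -> putStrLn $ "NetCDF error: " ++ show err
    Right nc ->
      case ncVar nc "longitude" of
        Nothing     -> putStrLn "no longitude variable"
        Just lonvar -> do
          res <- get nc lonvar :: VRet CFloat
          case res of
            Left err            -> putStrLn $ "read error: " ++ show err
            Right (HVector lon) -> print lon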

Reading NetCDF metadata

Examining NetCDF metadata is simple:

Right nc <- openFile "/big/data/reanalysis/ERA-Interim/"
putStrLn $ "Name: " ++ ncName nc
putStrLn $ "Dims: " ++ show (M.keys $ ncDims nc)
putStr $ unlines $ map (\(n, s) -> "  " ++ n ++ ": " ++ s) $ M.toList $
  flip M.map (ncDims nc) $ \d ->
    show (ncDimLength d) ++ if ncDimUnlimited d then " (UNLIM)" else ""
putStrLn $ "Vars: " ++ show (M.keys $ ncVars nc)
putStrLn $ "Global attributes: " ++ show (M.keys $ ncAttrs nc)
let Just ntime = ncDimLength <$> ncDim nc "time"
    Just nlat  = ncDimLength <$> ncDim nc "latitude"
    Just nlon  = ncDimLength <$> ncDim nc "longitude"

We open a file using hnetcdf’s openFile function (here assuming that there are no errors), getting a value of type NcInfo (defined in Data.NetCDF.Metadata in hnetcdf). This is a value representing all of the metadata in the NetCDF file: dimension, variable and attribute definitions all bundled up together into a single value from which we can access different metadata elements. We can access maps from names to dimension, variable and global attribute definitions and can then extract individual dimensions and variables to find information about them. The code in the listing above produces this output for the ERA-Interim Z500 NetCDF file used here:

Name: /big/data/reanalysis/ERA-Interim/
Dims: ["latitude","longitude","time"]
  latitude: 73
  longitude: 144
  time: 7670
Vars: ["latitude","longitude","time","z500"]
Global attributes: ["Conventions","history"]

Accessing coordinate values

Reading values from a NetCDF file requires a little bit of care to ensure that NetCDF types are mapped correctly to Haskell types:

let (Just lonvar) = ncVar nc "longitude"
Right (HVector lon) <- get nc lonvar :: VRet CFloat
let mlon = mean lon
putStrLn $ "longitude: " ++ show lon ++ " -> " ++ show mlon
Right (HVector lon2) <- getS nc lonvar [0] [72] [2] :: VRet CFloat
let mlon2 = mean lon2
putStrLn $ "longitude (every 2): " ++ show lon2 ++ " -> " ++ show mlon2

This shows how to read values from one-dimensional coordinate variables, both reading the whole variable, using hnetcdf’s get function, and reading a strided slice of the data using the getS function. In both cases, it’s necessary to specify the return type of get or getS explicitly – here this is done using the convenience type synonym VRet defined earlier. This code fragment produces this output:

longitude: fromList [0.0,2.5,5.0,7.5,10.0,12.5,15.0,17.5,20.0,22.5,25.0, 27.5,30.0,32.5,35.0,37.5,40.0,42.5,45.0,47.5,50.0,52.5,55.0,57.5,60.0, 62.5,65.0,67.5,70.0,72.5,75.0,77.5,80.0,82.5,85.0,87.5,90.0,92.5,95.0, 97.5,100.0,102.5,105.0,107.5,110.0,112.5,115.0,117.5,120.0,122.5,125.0, 127.5,130.0,132.5,135.0,137.5,140.0,142.5,145.0,147.5,150.0,152.5,155.0, 157.5,160.0,162.5,165.0,167.5,170.0,172.5,175.0,177.5,180.0,182.5,185.0, 187.5,190.0,192.5,195.0,197.5,200.0,202.5,205.0,207.5,210.0,212.5,215.0, 217.5,220.0,222.5,225.0,227.5,230.0,232.5,235.0,237.5,240.0,242.5,245.0, 247.5,250.0,252.5,255.0,257.5,260.0,262.5,265.0,267.5,270.0,272.5,275.0, 277.5,280.0,282.5,285.0,287.5,290.0,292.5,295.0,297.5,300.0,302.5,305.0, 307.5,310.0,312.5,315.0,317.5,320.0,322.5,325.0,327.5,330.0,332.5,335.0, 337.5,340.0,342.5,345.0,347.5,350.0,352.5,355.0,357.5] -> 178.75 longitude (every 2): fromList [0.0,5.0,10.0,15.0,20.0,25.0,30.0,35.0,40.0, 45.0,50.0,55.0,60.0,65.0,70.0,75.0,80.0,85.0,90.0,95.0,100.0,105.0,110.0, 115.0,120.0,125.0,130.0,135.0,140.0,145.0,150.0,155.0,160.0,165.0,170.0, 175.0,180.0,185.0,190.0,195.0,200.0,205.0,210.0,215.0,220.0,225.0,230.0, 235.0,240.0,245.0,250.0,255.0,260.0,265.0,270.0,275.0,280.0,285.0,290.0, 295.0,300.0,305.0,310.0,315.0,320.0,325.0,330.0,335.0,340.0,345.0,350.0, 355.0] -> 177.5

The mean function used above is defined as:

mean :: (Storable a, Fractional a) => Vector a -> a
mean xs = foldVector (+) 0 xs / fromIntegral (dim xs)

It requires a Storable type class constraint, and makes use of hmatrix’s foldVector function.

Accessing data values

Finally, we get round to reading the data that we’re interested in (of course, reading the metadata is a necessary prerequisite for this: this kind of geospatial data doesn’t mean much unless you can locate it in space and time, for which you need coordinate variables and their associated metadata).

The next listing shows how we read the Z500 data into a row-major hmatrix matrix:

let (Just zvar) = ncVar nc "z500"
putStrLn $ "z500 dims: " ++ show (map ncDimName $ ncVarDims zvar)
Right slice1tmp <- getA nc zvar [0, 0, 0] [1, nlat, nlon] :: MRet CShort
let (HRowMajorMatrix slice1tmp2) =
      coardsScale zvar slice1tmp :: HRowMajorMatrix CDouble
    slice1 = cmap ((/ 9.8) . realToFrac) slice1tmp2 :: Matrix Double
putStrLn $ "size slice1 = " ++ show (rows slice1) ++ " x " ++ show (cols slice1)
putStrLn $ "lon(i=25) = " ++ show (lon @> (25 - 1))
putStrLn $ "lat(j=40) = " ++ show (lat @> (nlat - 40))
let v @!! (i, j) = v @@> (nlat - i, j - 1)
putStrLn $ "slice1(i=25,j=40) = " ++ show (slice1 @!! (25, 40))

There are a number of things to note here. First, we use the getA function, which allows us to specify starting indexes and counts for each dimension in the variable we’re reading. Here we read all latitude and longitude points for a single time step (the file contains just one vertical level). Second, the values stored in this file are geopotential values, not geopotential height (so their units are m² s⁻² instead of metres; we can convert to geopotential height by dividing by the acceleration due to gravity, about 9.8 m s⁻²). Third, the geopotential values are stored in a compressed form as short integers according to the COARDS metadata convention. This means that if we want to work with floating point values (which we almost always do), we need to convert using the hnetcdf coardsScale function, which reads the relevant scaling and offset attributes from the NetCDF variable and uses them to convert from the stored data values to some fractional numeric type (in this case CDouble – the destination type also needs to be an instance of hnetcdf’s NcStorable class).

Once we have the input data converted to a normal hmatrix Matrix value, we can manipulate it like any other data value. In particular, here we extract the geopotential height at given latitude and longitude coordinates (the @!! operator defined here is just a custom indexing operator to deal with the fact that the latitude values are stored in north-to-south order).

The most laborious part of all this is managing the correspondence between coordinate values and indexes, and managing the conversions between the C types used to represent values stored in NetCDF files (CDouble, CShort, etc.) and the native Haskell types that we’d like to use for our data manipulation activities. To be fair, the first of these problems is a problem for any user of NetCDF files, and Haskell’s data abstraction capabilities at least make dealing with metadata values less onerous than in C or C++. The second issue is a little more annoying, but it does ensure that we maintain a good cordon sanitaire between external representations of data values and the internal representations that we use.

What’s next

We’re going to have to spend a couple of articles covering some background to the atmospheric variability problem we’re going to look at, just to place some of this stuff in context: we need to look a little at just what this study is trying to address, we need to understand some basic facts about atmospheric dynamics and the data we’re going to be using, and we need to take a look at the gross dynamics of the atmosphere as they appear in these data, just so that we have some sort of idea what we’re looking at later on. That will probably take two or three articles, but then we can start with some real data analysis.

haskell data-analysis
Categories: Offsite Blogs

Code review requested: a Python 3 interpreter

Haskell on Reddit - Wed, 07/16/2014 - 1:28pm

I'm implementing a Python 3 interpreter as a way to learn Haskell better. I'm not very far along, so I'd like to ask for a quick code review of my Github repo:

These are the parts of it I don't like:

  • Constant readIORef/writeIORef. It seems like this boilerplate spreads everywhere. Maybe it's better to pass the environment around somehow? (See the sketch after this list.)
  • Clunky pattern matching on binary operators and all valid types. Tried to generalize this a bit but ran into difficulties with traditional operators returning Ints vs comparison operators returning Bools.
  • Storing the current return value in the environment
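Regarding the first point, one common alternative is to thread the environment through a state monad instead of passing IORefs around. A rough sketch, with invented names and types (this is not code from the repo):

import Control.Monad.State
import qualified Data.Map as M

-- Hypothetical value and environment types, purely for illustration.
data Value = NoneV | IntV Integer | StrV String deriving Show
type Env   = M.Map String Value

-- The interpreter monad: the environment is threaded by StateT
-- rather than read and written through IORefs.
type Interp a = StateT Env IO a

lookupVar :: String -> Interp (Maybe Value)
lookupVar name = gets (M.lookup name)

assignVar :: String -> Value -> Interp ()
assignVar name v = modify (M.insert name v)

runInterp :: Interp a -> IO (a, Env)
runInterp m = runStateT m M.empty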

Thanks for any advice you have.

submitted by mattygrocks
[link] [12 comments]
Categories: Incoming News

[learn] Codata, co-totality, co-complexity

Haskell on Reddit - Wed, 07/16/2014 - 9:40am

I've been trying to figure out why it's difficult to reason about time and space complexity in Haskell. After all, Haskell seems to be the "co-universe" twin of a strict language like ML, so there's got to be some kind of "co-complexity" that allows compositional reasoning... right?

A summary of my current thinking is in this blog post: Codata, co-totality, co-complexity. Basically, it's not codata that makes it hard to reason about complexity, it's pervasive memoization. The proper "co-universe" twin of ML would not be Haskell, but a "pure codata" language based on call-by-name without memoization. In such a language, there would indeed be a usable definition of "co-complexity", and it would behave compositionally as you'd expect. (Though most people wouldn't want to program in such a language...)
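A tiny illustration of the sharing point (my own toy example, not from the post):

expensive :: Integer -> Integer
expensive n = sum [1 .. n]

main :: IO ()
main = do
  -- Under call-by-need (Haskell), x is a single thunk: it is forced once
  -- and the result is shared between the two uses.  Under pure
  -- call-by-name, each use of x would traverse the list again.
  let x = expensive 10000000
  print (x + x)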

Am I on the right track? Any pointers for further reading?

submitted by want_to_want
[link] [32 comments]
Categories: Incoming News

Mark Jason Dominus: Guess what this does

Planet Haskell - Wed, 07/16/2014 - 8:37am

Here's a Perl quiz that I confidently predict nobody will get right. Without trying it first, what does the following program print?

perl -le 'print(two + two == five ? "true" : "false")'

(I will discuss the surprising answer tomorrow.)

Categories: Offsite Blogs

Memory consumption issues under heavy network throughput/concurrency loads

haskell-cafe - Tue, 07/15/2014 - 6:18pm
I have been testing solutions in several languages for running a network daemon that accepts hundreds of thousands of websocket connections and moves messages between them. Originally I used this websocket lib, but upon discovering a rather severe memory leak issue even when sending just a basic ping, I switched to Michael Snoyman's first stab at a websocket impl for Yesod here: When under mild load (5k connections, each pinging once every 10+ seconds), memory usage remained stable and acceptable, around 205 MB. Switching to higher load (pinging every second or less), memory usage spiked to 560 MB and continued to 'leak' slowly. When I quit the server with the profile diagnostics, it indicated that hundreds of MB were "lost due to fragmentation". In addition, merely opening the connections and dropping them, repeatedly, made the base memory usage go up.
Categories: Offsite Discussion

Is there any way to compile a program in an embedded language?

Haskell on Reddit - Tue, 07/15/2014 - 11:38am

Haskell is very good for designing embedded DSLs - everyone knows that. Once that language is complete, though, you will then have to design a compiler for it if you want any serious computation. I'm aware of projects like PyPy, which gives you a JIT compiler for free, given a correct interpreter. Is there something along those lines for Haskell?
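For concreteness, the usual starting point is a deep embedding: the DSL builds an AST that you can either interpret directly or feed to a code generator. A toy sketch (not tied to any particular project):

data Expr
  = Lit Int
  | Add Expr Expr
  | Mul Expr Expr
  deriving Show

-- Direct interpreter.
eval :: Expr -> Int
eval (Lit n)   = n
eval (Add a b) = eval a + eval b
eval (Mul a b) = eval a * eval b

-- "Compiler": emit C source for the same expression.
compileToC :: Expr -> String
compileToC e = "int main(void) { return " ++ go e ++ "; }"
  where
    go (Lit n)   = show n
    go (Add a b) = "(" ++ go a ++ " + " ++ go b ++ ")"
    go (Mul a b) = "(" ++ go a ++ " * " ++ go b ++ ")"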

submitted by SrPeixinho
[link] [6 comments]
Categories: Incoming News

What is the state of the art in testing code generation?

haskell-cafe - Tue, 07/15/2014 - 11:26am
Tom Ellis wrote: First I should point out Magic Haskeller (which generates typed (Haskell) terms and then accepts those that satisfy the given input-output relationship). It is MagicHaskeller on Hackage. The most straightforward method is to generate random terms and filter well-typed ones. This is usually a bad method, since many or most randomly generated terms will be ill-typed. One should generate well-typed terms from the beginning, without any rejections. That is actually not difficult: one merely needs to take the type checker (which one has to write anyway) and run it backwards. Perhaps it is not as simple as I made it sound, depending on the type system (for example, type inference for the simply-typed lambda calculus without any annotations requires guessing; one has to guess all the types correctly, otherwise the process becomes slow). It has been done, in a mainstream functional language: OCaml. The code can be re-written for
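To illustrate the "run the type checker backwards" idea on something much smaller than the STLC case discussed above: instead of generating arbitrary terms and filtering, generate a term for a requested type by choosing only constructors that can produce that type. A toy sketch (illustrative only, not MagicHaskeller's code):

import Control.Applicative ((<$>), (<*>))
import System.Random (randomRIO)

data Ty = TInt | TBool deriving (Eq, Show)

data Expr
  = Lit Int
  | BoolLit Bool
  | Add Expr Expr
  | If Expr Expr Expr
  deriving Show

-- Generate a random well-typed expression of the given type,
-- bounded by depth d (assumed >= 0).
genExpr :: Ty -> Int -> IO Expr
genExpr TInt  0 = Lit <$> randomRIO (0, 9)
genExpr TBool 0 = BoolLit <$> randomRIO (False, True)
genExpr ty    d = do
  k <- randomRIO (0 :: Int, 1)
  case (ty, k) of
    (TInt, 0) -> Add <$> genExpr TInt (d - 1) <*> genExpr TInt (d - 1)
    _         -> If  <$> genExpr TBool (d - 1)
                     <*> genExpr ty (d - 1)
                     <*> genExpr ty (d - 1)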
Categories: Offsite Discussion

Designing a Lisp based on the ideas in "Total Functional Programming".

Haskell on Reddit - Tue, 07/15/2014 - 11:25am

I'm thinking about how one could apply the ideas in this paper to design a unityped language similar to Lisp. There are no algebraic datatypes or pattern matching - just lists, numbers and predicates. The type system would be much simpler. I'm thinking of STLC with only one type, Term, with Lists and Numbers embedded, plus a few mathematical primitives and a fold/unfold operator on lists.

data Term = Cons Term Term | Nil | Num Float | Lam Term | Var Int | Fold -- ... etc.

fold :: (Term -> Term -> Term) -> Term -> Term -> Term
fold f i (Cons x xs) = f x (fold f i xs)
fold f i Nil         = i

unfold :: (Term -> Term) -> Term -> Term
unfold f x = go (f x)
  where go (Cons y ys) = Cons y (unfold f ys)
        go y           = y

If I'm not mistaken, the STLC part makes up for a total language, and the fold/unfold operators enable most practical algorithms.
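For instance, assuming the definitions in the snippet above, a list sum and a countdown generator could be written like this (a sketch only; the names are made up):

sumTerms :: Term -> Term
sumTerms = fold add (Num 0)
  where add (Num x) (Num y) = Num (x + y)
        add _       _       = Nil   -- ill-formed input

countdown :: Term -> Term
countdown = unfold step
  where step (Num n) | n > 0 = Cons (Num n) (Num (n - 1))
        step _               = Nil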


  1. Is that indeed a total language?

  2. Is there any important practical algorithm that would be missing from that language?

  3. Is there a decision algorithm that will find the complexity of an operation before evaluating it?

submitted by SrPeixinho
[link] [20 comments]
Categories: Incoming News

Hackage view source

Haskell on Reddit - Tue, 07/15/2014 - 9:20am

Where did the option to view the source and more detailed documentation go on Hackage? It's no longer on the right and I can't find it.

Edit: I can see it on this page:

But not here.

submitted by Tyr42
[link] [8 comments]
Categories: Incoming News