News aggregator

Roman Cheplyaka: A case for static linking in scientific computing

Planet Haskell - Fri, 09/09/2016 - 2:00pm

When researchers run scientific software on high-performance clusters, they often experience problems with shared libraries, such as this one:

bcftools: /lib64/libz.so.1: version `ZLIB_1.2.5.2' not found

Or this one:

eagle: error while loading shared libraries: libhts.so.1: cannot open shared object file: No such file or directory

Popular advice points them in the direction of LD_LIBRARY_PATH, but a simple and robust solution—static linking—is often overlooked.

In this article, I explain the background behind static and dynamic linking, demonstrate the advantages of static linking, address some of the objections against static linking, and give instructions on how to prepare static binaries.

What is static and dynamic linking?

The word linking itself refers to the process of assembling a complete program from libraries of subprograms1.

Static linking occurs as the last stage of the compilation process; the required libraries are embedded in the final binary file of your program.

Some (or even all) of the libraries may not be included in the binary at this stage. In this case, when we attempt to run the program, we need dynamic linking in order to find the libraries and make the missing subroutines accessible to the program. Libraries of this second kind are called dynamic, or shared, libraries. The files containing them are usually named libsomething.so on Linux and something.dll on Windows.

The rules that an operating system follows when it searches for dynamic libraries are complex. And simply having a library in the right place is not enough; it needs to be the same version of the library that was used during the compilation, or at least a different version with the same ABI. (So no, you shouldn’t ln -s /usr/lib/libsomething.so.2 /usr/lib/libsomething.so.1 when a program doesn’t work, contrary to another popular piece of advice one can find on the Internet.)

Linux distributions, most of which dynamically link the software they distribute, manage this by engaging qualified package maintainers and by having tools and centralized infrastructure to build and distribute packages.

But the world of scientific software is not there yet. And if you fail to take care of your dynamic libraries, the result can vary from a program refusing to start (as we saw earlier), to a program crash in the middle of operation, to a hard to detect and diagnose case of data corruption.

This is why I think that scientific software should be linked statically by default.

Advantages of static linking

Reproducibility. When a program is linked statically, it executes the same algorithm wherever it is run. A dynamic executable executes the code from the version of the dynamic library that happens to be installed on a particular computing node.

Note that the static executable doesn’t contain the metainformation about the library versions used to build it. You should record that information when you compile the software, and ideally use a binary repository manager for your builds.

But replacing dynamic linking with static linking by itself dramatically increases the probability that your program will run tomorrow in the same way as it runs today.

Ease of distribution. Suppose that you want your colleague to run the program you have compiled. If you link statically, you only need to distribute a single binary. If you link dynamically, you need to distribute the binary, all dynamic libraries that it depends on, and the instructions on how to make the binary find its libraries.

Portability. Static linking ensures that you can compile the program on your Ubuntu laptop and run it on a Debian or CentOS cluster. With dynamic linking, you’d have to compile it directly on a cluster or in an identical environment.

That said, you still need to ensure that both systems use the same or compatible architectures (the majority of scientific computing happens on x86-64 anyway), the same OS (probably Linux) and not too different kernel versions (or libc versions if you link libc dynamically).

No problems with finding libraries. Since no dynamic libraries are needed, the OS cannot fail to find them. No more cannot open shared object file messages. No more LD_LIBRARY_PATH tricks.

Isn’t static linking considered harmful?

Ulrich Drepper says it is:

There are still too many people out there who think (or even insist) that static linking has benefits. This has never been the case and never will be the case.

Ulrich certainly knows about this stuff much more than I ever hope to. But as you can tell from the above quote, he is sometimes a bit extreme in his judgment.

There is no shortage of knowledgeable people who disagree with him on this issue.

But more importantly, he looks at linking from a very different perspective. For many years, he was employed by Red Hat. He was one of those people who knew a lot about dealing with dynamic libraries and maintained a centralized repository of packages that worked well together in a controlled environment.

It is understandable that he would not care about any of the advantages I list above (though this is different from claiming that there has never been and never will be any benefits to static linking).

But what about the advantages of the dynamic linking that Ulrich describes in his article?

Centralized bug/security fixes.

  1. Security issues matter less for scientific software because it is not exposed to the outside world.

  2. HPC cluster users don’t benefit from centralized bug fixes because usually they don’t have the permissions to install software system-wide. Every user of the same cluster or node would still be responsible for their own updates.

  3. The scale is very different. If you are Red Hat, re-linking hundreds or thousands of binaries every time there is an update in a library is a significant burden. If you are a researcher, you deal maybe with a dozen or two programs, and you may not have to update them often.

  4. Even when centralized updates are possible (e.g. if you can request libraries to be installed centrally and then link against them), scientists would not want them because they are directly at odds with reproducibility.

More efficient use of physical memory through sharing the code.

  1. In high-performance computing, the size of the libraries is usually negligible compared to the size of the data being processed.

  2. When the number of running processes is small, and they don’t have many common dependencies, there’s not much opportunity for sharing.

  3. On the other hand, sometimes multiple copies of the same executable are run in parallel. This happens with software that is not capable of multithreading or cannot exploit it efficiently. Well, in this case, the OS actually can share the code across the processes because it is exactly the same.

  4. When there’s little sharing of code between processes, static linking can sometimes be more memory-efficient. This is because static linking only embeds the object files (i.e. parts of a library) that are actually used by the application, whereas dynamic linking has to load the entire library into memory.

Security measures like load address randomization—see above.

Some features of glibc require dynamic linking. Ulrich, by the way, was one of the core developers of glibc—just in case you were wondering why he considers this a problem of static linking and not a problem of glibc.

Fortunately, most scientific software doesn’t perform character conversions or go to the network. It just crunches numbers. You don’t need dynamic linking for that.

Licensing considerations. I am not a lawyer, but as far as I can tell, this should concern you only if the software is closed-source (or distributed under a license incompatible with GPL) and some of those dependencies are licensed under LGPL. In that case, those dependencies must be linked dynamically, although the other ones can still be linked statically.

I am not sure why Ulrich writes “(L)GPL”, since, to my knowledge, GPL itself does not make a distinction between static and dynamic linking, but I am happy to be corrected.

Tools and hacks like ltrace, LD_PRELOAD, LD_PROFILE, LD_AUDIT don’t work. Oh well.

OK, how do I do this?

Unfortunately, most of the scientific software I come across is linked dynamically by default. Otherwise, I wouldn’t be writing this article.

Convincing the build system

Read the installation instructions. They usually can be found in a file named README, INSTALL, or on the website. If they mention static linking, congratulations.

If not, also try looking inside the Makefile (or whatever build system the software uses). If there is a configure script, try ./configure --help. There could be a target for static linking that the author has not documented.

If the build system doesn’t support static linking out of the box, you will have to modify the linker flags. If a Makefile is well-written, this should be as simple as LDFLAGS=-static make or make LDFLAGS=-static (these are different: the first sets an environment variable, which assignments inside the Makefile may override; the second overrides the Makefile’s own assignment of LDFLAGS. Try them in this order).

If that doesn’t work, edit the Makefile. Locate the rule that does linking, i.e. produces the final executable.

It usually looks like this:

program: $(OBJS)
	gcc -o program $(OBJS)

Note that gcc could also be g++, or ld, or hidden behind a variable such as $(CC), $(CXX), or $(LD). The variable $(OBJS) could also be named differently, or be a literal list of .o, .c, or .cpp files. But the program is usually exactly the name of the final program (or, again, a variable that expands to one).

Once you have located this rule, try adding a -static flag to the gcc command line:

program: $(OBJS)
	gcc -static -o program $(OBJS)

In many cases this will be enough to produce a statically linked binary.

Sometimes you need to get more creative. One tool, for instance, explicitly built itself as a shared library. This is incompatible with a (global) -static flag set as part of $(LDFLAGS). The way I solved this was to specify a target explicitly, i.e. make prog1 prog2, so that it wouldn’t attempt to build a dynamic library and fail.

Dependencies

In order to statically link a program, you need to have its dependencies available as static libraries. These are files named libsomething.a.

If this is a library available in your distribution:

  • For Debian and derived distros (e.g. Ubuntu), static libraries usually reside in the libsomething-dev package. If you’ve done dynamic linking of the program, you probably have this package installed already because it also contains the header files.

  • For Red Hat and derived distributions (Fedora, CentOS; not sure about other rpm-based distros), static libraries are often placed in separate packages named libsomething-static.

If this is a third-party library, you’ll need to get a static version of it or compile it from source, following the same instructions that you are reading right now.

Verifying the result

How do you check that you got a static binary? Try running file and ldd on it.

For a dynamic binary, you’ll get something like this:

% ldd ./eagle
	linux-vdso.so.1 (0x00007ffd47d87000)
	libhts.so.1 => not found
	libboost_program_options.so.1.49.0 => not found
	libboost_iostreams.so.1.49.0 => not found
	libz.so.1 => /lib64/libz.so.1 (0x00007fe77a445000)
	libopenblas.so.0 => /lib64/libopenblas.so.0 (0x00007fe778133000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe777f17000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe777b90000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fe777886000)
	libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fe777658000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe777441000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fe77707e000)
	libgfortran.so.3 => /lib64/libgfortran.so.3 (0x00007fe776d4d000)
	/lib64/ld-linux-x86-64.so.2 (0x000055f7b3885000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fe776b49000)
	libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007fe776908000)

% file ./eagle
./eagle: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.26, BuildID[sha1]=af18461c835d6f0209754b78c639581c67ed1443, stripped

For a static binary, you’ll see this instead:

% ldd ./Minimac3
	not a dynamic executable

% file ./Minimac3
./Minimac3: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.26, BuildID[sha1]=edf443bb3e695b3f421b0d162ca30bb0f422c2ad, stripped

By the way, if file says that your binary is “not stripped”, run the strip command on it:

% strip eagle

This will significantly reduce its on-disk size and make it faster to copy to the cluster.

  1. To avoid confusion, I should note that I am talking here about software written in compiled languages such as C, C++, or Fortran. I am not talking about interpreted languages (Perl, Python, Ruby) or bytecode-compiled languages (Java, Scala).

Categories: Offsite Blogs

LHC Team: Haskell Suite: Scoping.

Planet Haskell - Fri, 09/09/2016 - 5:14am
This post answers why I created 'haskell-scope' even though there's already another library that addresses the same problem.

There are two libraries for resolving references in Haskell source code on Hackage: haskell-names and haskell-scope. Of the two, haskell-names is the oldest, the most feature complete, and the most ambitious. It uses a very innovative scheme that allows the scope to be inspected at any point in the syntax tree. You can read more about it in the linked article. Unfortunately, all this innovation comes at a price of complexity.

Here's the complete list of extensions used by haskell-names: CPP, ConstraintKinds, DefaultSignatures, DeriveDataTypeable, DeriveFoldable, DeriveFunctor, DeriveTraversable, FlexibleContexts, FlexibleInstances, FunctionalDependencies, GADTs, GeneralizedNewtypeDeriving, ImplicitParams, KindSignatures, MultiParamTypeClasses, NamedFieldPuns, OverlappingInstances, OverloadedStrings, RankNTypes, ScopedTypeVariables, StandaloneDeriving, TemplateHaskell, TupleSections, TypeFamilies, TypeOperators, UndecidableInstances, and ViewPatterns.

That is a total of 27 extensions, many of which will never be implemented by LHC. If LHC is to compile itself one day, this obviously won't do. Enter haskell-scope: a library more plain than bread without butter. Give it an AST and it will annotate all the references. Nothing more, nothing less.
Categories: Offsite Blogs

Vincent Hanquez: Foundation

Planet Haskell - Thu, 09/08/2016 - 6:00pm

A new hope. Foundation is a new library that tries to define a new modern Haskell framework. It is also trying to be more than a library: a common place for the community to improve things and define new things.

It started as a thought experiment:

What would a modern Haskell base look like if I could start from scratch?

What would I need to complete my Haskell projects without falling into traditional pitfalls like inefficient String, all-in-one Num, unproductive packing and unpacking, etc.?

One of the constraints set early on was not to depend on any external packages, and instead to depend only on what GHC provides (itself, base, and libraries like ghc-prim). While it may sound surprising, especially considering the usually high quality and precision of libraries available on Hackage, there are many reasons for not depending on anything; I’ll motivate this choice later in the article, so hang on.

A very interesting article from Stephen Diehl on Haskell in production, which details well some of the pitfalls of Haskell and the workarounds for less-than-ideal situations and choices, outlines pretty well some of the reasons for this effort.

Starting with the basics

Among the basic things that you’ll find in any modern Haskell project are byte arrays, arrays, and packed strings, usually in the form of the bytestring, vector, and text packages.

We decided to start here. One of the common problems with those types is their lack of interoperability. There’s usually a way to convert one into another, but it’s either exposed in an Unsafe or Internal module, or has a scary name like unsafeFromForeignPtr.

Then, if you’re unlucky, you will see some issues with unpinned and pinned memory (probably in production settings, to maximise fun); the common ByteString uses pinned memory, ShortByteString and Text use unpinned memory, and Vector, well, it’s complicated (there are 4 different kinds of Vector).

Note: pinned and unpinned describe whether the memory is allowed to be moved by the GC. Unpinned is usually better, as it allows the memory system to reduce fragmentation, but pinned memory is crucial for dealing with input/output with the real world, large data (and some other uses).

Unboxed Array

Our cornerstone is the unboxed array. The unboxed array is a native Haskell ByteArray (represented by the ByteArray# primitive type), and it is allowed to be unpinned or pinned (chosen at allocation time). To also support further interesting stuff, we supplement it with another constructor so that it can natively support a chunk of memory referenced by a pointer.

In simplified terms it looks like:

data UArray ty = UArrayBA   (Offset ty) (Size ty) PinnedStatus ByteArray#
               | UArrayAddr (Offset ty) (Size ty) (Ptr ty)

With this capability, we have the equivalent of ByteString, ShortByteString, Unboxed Vector and (Some) Storable Vector, implemented in one user friendly type. This is a really big win for users, as suddenly all those types play better together; they are all the same thing working the same way.

Instead of differentiating ByteString and Vector, now ByteString disappears completely in favor of just being a UArray Word8. This has been tried before with the current ecosystem with vector-bytestring.

String

String is a big pain point. Base represents it as a list of Char, [Char], which as you can imagine is not efficient for most purposes. Text from the popular text package implements a packed version of this, using UTF-16 and an unpinned native ByteArray#.

While text is almost a standard in Haskell, it’s very likely you’ll need to pack and unpack this representation to interact with base functions, or switch representations often to interact with some libraries.

Note on Unicode: UTF-8 is an encoding format where unicode codepoints are encoded in sequences of 1 to 4 bytes (4 different cases). UTF-16 represents unicode codepoints with either 2 or 4 bytes (2 different cases).

Foundation’s String is packed UTF-8 data backed by an unboxed vector of bytes. This means we can offer a lightweight type based on UArray Word8:

newtype String = String (UArray Word8)

So by doing this, we inherit directly all the advantages of our vector types; namely we have a String type that is unpinned or pinned (depending on needs), and supports native pointers. It is extremely lightweight to convert between the two: provided UTF8 binary data, we only validate the data, without re-allocating anything.

There’s no perfect representation of unicode; each representation has its own advantages and disadvantages, and it really depends on what type of data you’re actually processing. One easy rule of thumb is that the more cases your representation has, the slower it will be to process the higher unicode codepoints.

By extension, it means that choosing a single representation leads to compromise. In early benchmarks against text we are consistently outperforming Text when the data is predominantly ASCII (i.e. 1-byte encoding). For other types of data, it really depends; sometimes we’re still faster, sometimes slower, and sometimes on par.

Caveat emptor: benchmarks are far from reliable, and have only been run on 2 machines with similar characteristics so far.

Other Types

We also support already:

  • Boxed Array. This is an array of any other Haskell type. Think of it as an array of pointers to other Haskell values.
  • Bitmap. 1 bit packed unboxed array

In the short term, we expect to add:

  • tree-like structures.
  • hash-based structures.
Unified Collection API

Many types of collections support the same kind of operations.

For example, commonly you have very similar functions defined with different types:

take :: Int -> UArray a -> UArray a
take :: Int -> [a] -> [a]
take :: Int -> String -> String

head :: UArray a -> a
head :: String -> Char
head :: [a] -> a

So we tried to avoid monomorphic versions of common Haskell functions and instead provide type family infused versions of those functions. In foundation we have:

take :: Int -> collection -> collection
head :: collection -> Element collection

Note: Element collection is a type family. This allows us to derive one type from a type “collection”. For example, the Element of [a] is a, and the Element of String is Char.

Note 2: head is not exactly defined this way in foundation; this was just the simplest example showing type families and overloading in action. foundation’s head is total and is defined as: head :: NonEmpty collection -> Element collection
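To illustrate the general technique in isolation, here is a minimal sketch of an overloaded collection interface using an associated type family. The class and method names are made up for the example and are not foundation’s actual API:

{-# LANGUAGE TypeFamilies #-}

-- A minimal overloaded-collection interface: each collection type picks
-- its own element type via an associated type family.
class Collection c where
  type Element c
  cTake :: Int -> c -> c
  cHead :: c -> Maybe (Element c)   -- total, unlike Prelude's head

-- Lists are one instance; a monomorphic packed string type would get its
-- own instance with Element fixed to Char.
instance Collection [a] where
  type Element [a] = a
  cTake = take
  cHead []      = Nothing
  cHead (x : _) = Just x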

The consequence is that the same head or take (or other generic functions) works the same way for many different collection types, even when they are monomorphic (e.g. String).

For another good example of this approach being taken, have a look at the mono-traversable package.

For other operations that are specific to a data structure, and hard to generalize, we still expose dedicated operations.

The question of dependencies

If you’re not convinced by how we provide a better foundation for the standard Haskell types, then it raises the question: why not depend on those high quality libraries doing the exact same thing?

Consistency. I think it’s easier to set a common direction, and have a consistent approach when working in a central place together, than having N maintainers working on M packages independently.

Common place. An example speaks better than words sometime: I have this X thing, that depends on the A package, and the B package. Should I add it to A, to B, or create a new C package ?

Atomic development. We don’t have to jump through hoops to improve our types or functions against other parts of foundation. Having more things defined in the same place means we can be more aggressive about improving things faster, while retaining an overall package that makes sense.

Versions and releases. It is far easier to depend on a small set of libraries than on hundreds of different versions. Particularly in an industrial setting, I will be much more confident tracking 1 package, watching 1 issue tracker, and dealing with a set of known people than having to deal with N packages, N issue trackers (possibly in different places), and N maintainers.

Some final notes

A fast, iterative release schedule. We plan to release early and release often, on a predictable schedule.

We’re still at an early stage. While we’re at an exciting place, don’t expect a finished product right now.

You don’t need to be an expert to help. Anyone can help us shape foundation.

Join us. If you want to get involved: all Foundation work takes place in the open, on the haskell-foundation organisation, with code, proposals, issues, voting, and questions.

Categories: Offsite Blogs

Theory Lunch (Institute of Cybernetics, Tallinn): Nonuniversality in computation: A proof by semantic shift?

Planet Haskell - Thu, 09/08/2016 - 8:59am

Today, the 8th of September 2016, we had a very interesting discussion about a theorem, due to Selim G. Akl, pointed out to me in a tweet by Andy Adamatzky. This theorem has, according to Akl, the consequence that the Church-Turing thesis, a basic tenet of theoretical computer science, is false. Of course, surprising statements require solid arguments: is Akl’s solid enough?

First of all, let us recall what the Church-Turing thesis is, and what it is not. Its statement, as reported by the Stanford Encyclopedia of Philosophy, goes as follows:

A function of positive integers is effectively calculable only if recursive.

Here, for a calculation procedure to be “effective” means the following:

  1. it has a finite description;
  2. it always returns the correct output, given any valid input;
  3. it can be “carried on by pencil and paper” by a human being; and
  4. it requires no insight or ingenuity on the human’s behalf.

One model of effective procedures is given by the recursive functions; another one, by the functions computable by Turing machines; a third one, by the functions which are representable in Church’s λ-calculus. Alan Turing and Stephen Cole Kleene proved that the three classes coincide: thus, in ordinary practice, the Church-Turing thesis is often stated with “Turing-computable” in place of “recursive”.

The class of Turing machines has the advantage of containing a universal element: a special Turing machine and an encoding from the set of Turing machines to the set of natural numbers exist such that, when the special Turing machine is provided with the encoding of an arbitrary Turing machine and a valid input for the latter, it will return the value of the encoded Turing machine on the provided input.
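In symbols, the universality property just described can be written as follows (a standard formulation, not a quotation from the sources discussed here):

\exists\, U \;\exists\, \langle\cdot\rangle \;\; \forall M \;\forall x:\quad U(\langle M \rangle, x) \simeq M(x)

where \langle M \rangle is the encoding of the machine M, and \simeq means that either both sides are undefined, or both are defined and equal.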

Now that we have written down what the Church-Turing thesis is, we can examine Akl’s theorem.

In his 2005 paper, Akl defines a universal computer as a system having the following features:

  1. It has means of communicating with the outside world, so to receive input, and where to send its output.
  2. It can perform all elementary arithmetic and logic operations.
  3. It can be programmed, according to the two previous rules.
  4. It has unlimited memory to use for input, output, and temporary values.
  5. It can only execute finitely many operations (evaluating input, producing output, performing an elementary operation, etc.) at each time step.
  6. It can simulate any computation performed by any other model of computation.

The statement of the theorem, which does not appear explicitly in the original paper but is written down in the one from 2015 which clarifies the idea and addresses criticism, is hereby reported verbatim:

Nonuniversality in Computation Theorem (NCT): No computer is universal if it is capable of exactly V(t) operations during time unit t of computation, where t is a positive integer, and V(t) is finite and fixed once and for all.

The main argument is that no such computer can perform a computation which requires more than V(t) operations at some time t. Explicit examples happen in parallel computation, a field Akl is a master of, where the number of operations that can be performed in a time unit grows linearly with the number of processors: for instance, reading n values in input can be done in a single time unit by a parallel machine with n processors, but not by any machine with fewer than n processors.

Such a requirement, however, does not appear in the notion of universality at the base of the original, and actual, Church-Turing thesis. There, to “simulate” a machine or algorithm means to be able to always reproduce the same output of the algorithm, given any valid input for it, up to an encoding of the input and the output. But no hypothesis on how the output is achieved from the input is made: a simulation in linear time, such that each step of the simulated algorithm is reproduced by a fixed number of operations of the Turing machine, is as good as one where the simulation of the n-th step takes a number of operations that grows with n, or where no such regularity appears.

Among the (counter)examples provided by Akl are:

  1. Computations with time-varying variables.
  2. Computations with time-varying computational complexity.
  3. Computations whose complexity depends on their placement on a schedule.
  4. Computations with interacting variables, e.g., states of entangled electrons.
  5. Computations with uncertain time constraints.

None of these, however, respect the definition of computation from the model of recursive functions: where the values of the variables are given once and for all, and can possibly change for recursive calls, but not for the original call. They can be seen as instances of unconventional models of computation: but by doing this, one changes the very notion of computation, which ceases to be the one at the basis of the Church-Turing thesis.

So my guess is that Akl’s statement about the falsity of the Church-Turing thesis actually falls in the following category, as reported in the humorous list by Dana Angluin:

Proof by semantic shift: Some standard but inconvenient definitions are changed for the statement of the result.

Actually, if we go back to Akl’s definition of a universal computer, it appears to be fine until the very last: the first two points agree with the definition of effective computation at the basis of the actual Church-Turing thesis, the next three are features of any universal Turing machine. The problem comes from the last point, which has at least two weak spots: the first one being that it does not define precisely what a model of computation is, which can be accepted as Akl is talking of unconventional computation, and it is wiser to be open to other possibilities. But there is a more serious one, in that it is not clear

what does the expression “to simulate” mean.

Note that the Stanford Encyclopedia of Philosophy reports the following variant of the Church-Turing thesis, attributed to David Deutsch:

Every finitely realizable physical system can be perfectly simulated by a universal model computing machine operating by finite means.

Deutsch’s thesis, however, does not coincide with the Church-Turing thesis! (This, notwithstanding Deutsch’s statement that “[t]his formulation is both better defined and more physical than Turing’s own way of expressing it”.) Plus, there is another serious ambiguity, which is of the same kind as the one in Akl’s definition:

what is “perfectly simulated” supposed to mean?

Does it mean that every single step performed by the system can be reproduced in real time? In this case, Akl is perfectly right in disproving it under the constraint of boundedly many operations at each time unit. Or does it mean that the simulation of each elementary step of the process (e.g., one performed in a quantum of time) ends with the correct result if the correct initial conditions are given? In this case, the requirement to reproduce exactly what happens between the reading of the input and the writing of the output is null and void.

Worse still, there is a vulgarized form of the Church-Turing thesis, which is reported by Akl himself (on page 172 of his 2005 paper!) and goes as follows:

Any computable function can be computed on a Turing machine.

If one calls that “the Church-Turing thesis”, then Akl’s NCT is absolutely correct in disproving it. But that is not the actual Church-Turing thesis! It is actually a rewording of what in the Stanford Encyclopedia of Philosophy is called “Thesis M”, and explicitly stated not to be equivalent to the original Church-Turing thesis—and also false. Again, the careful reader will have noticed that, in the statement above, being “computable by a Turing machine” is a well defined property, but “computable” tout court definitely not so.

At the end of this discussion, my thesis is that Akl’s proof is correct, but NCT’s consequences and interpretation might not be what Akl means, or (inclusive disjunction) what his critics understand. As for my personal interpretation of NCT, here it goes:

No computer which is able to perform a predefined, finite number of operations at each finite time step is universal across all the different models of computation, where the word “computation” may be taken in a different meaning than that of the Church-Turing thesis.

Is mine an interpretation by semantic shift? Discussion is welcome.

References:

  1. Selim G. Akl. The Myth of Universal Computation. Parallel Numerics ’05, 167–192.
  2. Selim G. Akl. Nonuniversality explained. International Journal of Parallel, Emergent and Distributed Systems 31:3, 201–219. doi:10.1080/17445760.2015.1079321
  3. The Church-Turing Thesis. Stanford Encyclopedia of Philosophy. First published January 8, 1997; substantive revision August 19, 2002. http://plato.stanford.edu/entries/church-turing/
  4. Dana Angluin’s List of Proof Techniques. http://www.cs.northwestern.edu/~riesbeck/proofs.html

Categories: Offsite Blogs

NOOL 2016

Lambda the Ultimate - Wed, 09/07/2016 - 2:54am

At this year's SPLASH there will be a follow-up to last year's NOOL 2015 workshop (discussed here on LtU). Objects may not be the trendiest thing in PL circles, but I think they have a bright future. (Example: session types are essentially concurrent objects in disguise, and they are quite trendy.)

Please consider submitting! (Disclaimer: I'm one of the organisers :)

Categories: Offsite Discussion

Edward Z. Yang: The Base of a String Theory for Haskell

Planet Haskell - Wed, 09/07/2016 - 12:22am

One of the early posts from this blog, from 2010, was on the subject of how to pick your string library in Haskell. Half a decade later, the Haskell ecosystem is still largely in the same situation as it was half a decade ago, where most of the boot libraries shipped with GHC (e.g., base) still use the String type, despite the existence of superior string types. The problem is twofold:

  1. No one wants to break all of the existing code, which means libraries like base have to keep String versions of all their code. You can't just search-replace every occurrence of String with Text.
  2. No one wants to be in the business of maintaining two copies of any piece of code, which are copy-pastes of each other but subtly different. In practice, we must: e.g., unix has ByteString variants of all of its functions (done by copy-paste); text provides some core IO functionality (also done by copy-paste). But it is terrible and scales poorly: every downstream library that wants to support two string types (or more) now has to publish two copies of themselves, and any new string implementation has the unenviable task of reimplementing the world to make themselves useful.

Backpack solves these problems, by allowing you to parametrize over a signature rather than a concrete implementation of a string type, and instantiate such an indefinite library whenever you want. This solves both problems:

  1. Because you are allowed to instantiate an indefinite library whenever you want, we can eagerly instantiate a posix-indef using String and ship it as posix, keeping backwards compatibility with all packages which are Backpack ignorant.
  2. At the same time, if packages depend directly on posix-indef, they themselves are parametrizable over a string type. Entire library ecosystems can defer the choice of string type to the end user, which on a sufficiently new version of GHC offers a backwards compatible way of adding support for new string types to a library. (I don't want to say, support multiple string types, because this is not necessarily a virtue in-and-of-itself.)

To this end, I would like to propose a string theory, for the base of GHC Haskell: namely the core boot libraries that are distributed with GHC today. These packages will set the tone for the eventual Backpackification of the rest of the ecosystem.

But first, what is it that we are parametrizing over? A string is not so simple...

A digression on file paths (and OS strings)

File paths (FilePath) are an important form of String which aren't really Unicode strings at all. POSIX specifies that file paths can be arbitrary C strings, thus, code which decodes a file path as Unicode must be cognizant of the fact that the underlying ByteString could contain arbitrary, undecodable nonsense. To make matters worse, even the encoding can vary: on Windows file paths are encoded in UTF-16 (with unpaired surrogates, eek!), while in modern Linux environments the encoding is dictated by the locale (base uses locale_charset to determine how to interpret file paths; the locale is often UTF-8, but not always).

Thus, the definition type FilePath = String is very questionable indeed. There is an existing proposal, the Abstract FilePath Proposal to turn FilePath into an abstract type, and not just a type synonym for String. Unfortunately, a change like this is a BC-breaking one, so it will take some time to implement, since GHC must first be taught to warn when FilePath is used as if it were a String, to help people find out that they are using it incorrectly.

Backpack offers a more decentralized way to move into the future: just define an abstract signature for FilePath to depend upon. The low level signature might look like this:

signature FilePath where

-- | File and directory names, whose precise
-- meaning is operating system dependent. Files can be opened, yielding a
-- handle which can then be used to operate on the contents of that file.
data FilePath

-- | A C string (pointer to an array of C characters terminated by NUL)
-- representing a file path, suitable for use with the operating system
-- C interface for file manipulation. This exact type is architecture
-- dependent.
type CFilePath =
#ifdef mingw32_HOST_OS
    CWString
#else
    CString
#endif

withFilePath :: FilePath -> (CFilePath -> IO a) -> IO a
newFilePath :: FilePath -> IO CFilePath
peekFilePath :: CFilePath -> IO FilePath
-- peekFilePath >=> newFilePath should be identity
-- (this is tricky to achieve if FilePath is a
-- Unicode-based type, like String)

And of course, you would want all of the FilePath manipulation functions that people use.

To maintain compatibility with the existing ecosystem, you would likely instantiate your library with type FilePath = String. But there is nothing stopping you from picking your own abstract FilePath type and using it instead.
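To make that concrete, here is a sketch of what a String-backed module filling this signature could look like. The module name and the direct use of withCString/newCString/peekCString are my own simplifications for illustration, not code from base or any shipped package; real base marshals file paths through the file system encoding rather than the plain CString helpers.

module FilePath.String
  ( FilePath
  , CFilePath
  , withFilePath
  , newFilePath
  , peekFilePath
  ) where

import Prelude hiding (FilePath)
import Foreign.C.String (CString, newCString, peekCString, withCString)

-- The legacy, backwards-compatible representation.
type FilePath = String

-- POSIX side only; on Windows this would be CWString.
type CFilePath = CString

withFilePath :: FilePath -> (CFilePath -> IO a) -> IO a
withFilePath = withCString

newFilePath :: FilePath -> IO CFilePath
newFilePath = newCString

peekFilePath :: CFilePath -> IO FilePath
peekFilePath = peekCString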

File paths are not unique in this sense; there are other strings (such as the values of environment variables) which have similar properties: I've taken to calling these OSStrings (as they are called in Rust.)

Axes of parametrization

With this in mind, there are three "string variants" over which any given library can be parametrized:

  1. They can be parametrized over FilePath, for modules which deal with the file system (e.g., System.Posix.Directory)
  2. They can be parametrized over an OSString, because they deal with various operating system specific APIs (e.g., System.Posix.Env)
  3. They can be parametrized over a String, because, well, sometimes a string is just a string. (e.g., Text.ParserCombinators.ReadP)

Some libraries may be parametrized in multiple ways: for example, readFile needs to be parametrized over both FilePath and String.

Split base (and friends) for Backpack

For technical reasons, Backpack cannot be used to parametrize specific modules; you have to parametrize over an entire library. So a side-effect of Backpack-ing the core libraries is that they will be split into a number of smaller libraries. Using module reexports, you can still keep the old libraries around as shims.

There are four GHC boot libraries which would most benefit from modularization on strings:

  • base
    • base-io (System.IO and submodules; parametrized over FilePath and String)
    • There are a few other modules which could be stringified, but the marginal benefit may not justify making a new package for each (Data.String, System.Console.GetOpt, Text.ParserCombinators.ReadP, Text.Printf). Each of these only needs to be parametrized over String.
    • Control.Exception, Text.Read and Text.Show are explicit non-goals, they are too deeply wired into GHC at present to muck about with.
  • unix
    • unix-env (System.Posix.Env, parametrized over OSString)
    • unix-fs (System.Posix.Directory, System.Posix.Files, System.Posix.Temp parametrized over FilePath)
    • unix-process (System.Posix.Process, parametrized over FilePath and OSString)
  • pretty (parametrized over String; then GHC could use it rather than roll its own copy!)
  • process (parametrized over String, OSString and FilePath)

The naming scheme I propose is that, e.g., the package unix continues to be the package instantiated with old-fashioned Strings. Then unix-indef is a package which is uninstantiated (the user can instantiate it to what they want, or pass on the decision to their users). Some packages may choose to also provide shims of their package instantiated with specific types, e.g., base-io-bytestring, which would be base-io instantiated with ByteString rather than String, though these names could get to be quite long, so it's uncertain how useful this would be.

Closing remarks

Of all the packages mentioned here, only base could make the bold step of using Backpack next GHC release (although it won't; at least, not for GHC 8.2); the rest need to maintain backwards compatibility with old versions of GHC and so would have to be forked to use Backpack.

The real test for Backpack will be whether or not string-using packages in the ecosystem decide to sign on, and parametrize themselves over signatures. I hope that eventually, you can use any library with ByteString or Text with the same ease that you can use libraries with String (and maybe even use your own, home-grown type.) The problem with module systems is that you rarely see the benefits until you use them for big systems, but that makes it difficult to evaluate them before hand. But the benefits seem tantalizing: let's boldly backpack forth to a brighter future!

Categories: Offsite Blogs

Darcs: darcs 2.12.1 release

Planet Haskell - Mon, 09/05/2016 - 2:17pm
The Darcs team is pleased to announce the release of Darcs 2.12.1!

The most important changes are:
  • fixes for building with GHC 8
  • drop support for GHC 7.6 and 7.8, i.e., require GHC 7.10
  • improvements in `darcs whatsnew` output with irrelevant files (Ben Franksen)
This release can be installed via cabal or (soon) stack. The 2.12 branch is also available as a darcs repository from http://darcs.net/releases/branch-2.12 .
Categories: Offsite Blogs

The GHC Team: Call for Nominations: GHC Steering Committee

Planet Haskell - Mon, 09/05/2016 - 12:56pm

Hello everyone,

As you likely know, over the last few months we have been discussing options for reforming the process for proposing language and compiler changes to GHC. After much discussion, we have a ​process which, while not perfect, is acceptable to a majority of our contributor base and will be an improvement over the status quo. While we invite suggestions for future improvements, we are at a point where we can move ahead with implementation.

Consequently, we are seeking nominations for the initial GHC steering committee. This body is responsible for overseeing the progression of proposals through the process, working with authors on refining their ideas, and evaluating proposals for acceptance. The committee will consist of five to eight members of diverse backgrounds.

We would like to offer the following as criteria for membership. Note that a candidate is not expected to satisfy all or even most of these criteria, but a good candidate should satisfy at least one:

  • A history of contributions to the design of new language features
  • Experience developing Haskell libraries and applications
  • A demonstrated track record of contributing code to GHC
  • A pedagogical background, with experience in either teaching or authoring educational materials
  • Experience in compiler development
  • Knowledge of functional programming language theory

I would like to emphasize that committee membership is as much a duty as it is a privilege. Membership is not intended to be a platform to be used by members to drive their own ideas; rather it is a way of serving the Haskell community by helping other community members refine and advance their proposals. This, of course, requires an investment of time and effort, which you should be willing and able to consistently put forth.

If you would like to be considered for committee membership then please write a statement describing why you feel you are well-qualified to serve, in terms of the criteria above and any others that you would like to offer. Please send your statements to ben at well-typed.com by September 30th. The initial committee selection will be made by the Simons soon thereafter.

Thanks to everyone for their feedback and cooperation so far!

Cheers,

~ Ben

Categories: Offsite Blogs

Joachim Breitner: The new CIS-194

Planet Haskell - Mon, 09/05/2016 - 12:09pm

The Haskell minicourse at the University of Pennsylvania, also known as CIS-194, has always had a reach beyond the students of Penn, at least since Brent Yorgey gave the course in 2013 and wrote extensive lecture notes, eventually putting the material on GitHub.

This year, it is my turn to give the course. I could not resist making some changes, at least to the first few weeks: instead of starting with a locally installed compiler and exercises that revolve mostly around arithmetic and lists, I send my students to CodeWorld, which is a web programming environment created by Chris Smith1.

This greatly lowers the initial hurdle of having to set up the local toolchain, and is inclusive towards those who have had little exposure to the command line before. Not that I do not expect my students to handle that, but it does not hurt to move it towards later in the course.

But more importantly: CodeWorld comes with a nicely designed simple API to create vector graphics, to animate these graphics and even create interactive programs. This means that instead of having to come up with yet another set of exercises revolving around lists and numbers, I can have the students create Haskell programs that are visual. I believe that this is more motivating and stimulating, and will nudge the students to spend more time programming and thus practicing.
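For a taste of that API, here is a tiny complete program using the codeworld-api package (a sketch based on my recollection of the package's function names, so treat it as illustrative rather than as course material):

import CodeWorld

-- A complete CodeWorld program: a solid circle drawn on top of a square outline.
main :: IO ()
main = drawingOf (solidCircle 1 & rectangle 4 4)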

[Embedded interactive CodeWorld program: https://code.world/run.html?mode=haskell&dhash=D7Y0vPUj4D2tE2iKoYnvZrg]

In fact, the goal is that in their third homework assignment, the students will implement a fully functional, interactive Sokoban game. And all that before talking about the built-in lists or tuples, just with higher order functions and custom datatypes. (Click on the picture above, which is part of the second week’s homework. You can use the arrow keys to move the figure around and press the escape key to reset the game. Boxes cannot be moved yet -- that will be part of homework 3.)

If this sounds interesting to you, and you always wanted to learn Haskell from scratch, feel free to tag along. The lecture notes should be elaborate enough to learn from, and with the homework problems, you should be able to tell whether you have solved them yourself. Just do not publish your solutions before the due date. Let me know if you have any comments about the course so far.

Eventually, I will move to local compilation, use of the interpreter and text-based IO and then start using more of the material of previous iterations of the course, which were held by Richard Eisenberg in 2014 and by Noam Zilberstein in 2015.

  1. Chris has been very helpful in making sure CodeWorld works in a way that suits my course, thanks for that!

Categories: Offsite Blogs

Ken T Takusagawa: [fltalwhq] integerLog2 and an introduction to unboxed types

Planet Haskell - Mon, 09/05/2016 - 1:18am

Some notes on using unboxed types in Haskell, and in particular, notes on creating a boxed wrapper for the integer-gmp library function integerLog2# :: Integer -> Int# which returns an unboxed type.

{-# LANGUAGE MagicHash #-}
import GHC.Exts(Int(I#));
import GHC.Integer.Logarithms(integerLog2#);

integerLog2 :: Integer -> Int;
integerLog2 i = if i < 1
then error "must be positive"
-- because integerLog2# does no bounds checking
else I# (integerLog2# i);

The pragma MagicHash prevents GHC from interpreting the hash symbol as an operator.  Without it, one gets error messages like this:

parse error on input `#'
not in scope: `#'

It would be nice if GHC emitted a suggestion of MagicHash on errors like this.

The constructor I# is findable using Hoogle, searching for Int#->Int.

One must use parentheses around the argument to I#.  The standard trick of removing parentheses with the dollar sign results in an error:

bad1 i = I# $ integerLog2# i;

Couldn't match kind `*' with `#'
When matching types
r0 :: *
GHC.Prim.Int# :: #

Using the composition operator in point-free style fails similarly:

bad2 = I# . integerLog2#;

Couldn't match kind `*' with `#'
When matching types
b0 :: *
GHC.Prim.Int# :: #
Expected type: b0 -> Int
Actual type: GHC.Prim.Int# -> Int
In the first argument of `(.)', namely `I#'

Unboxed types are a different "kind", the # kind, than boxed types, which are a * kind.
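With the wrapper in place, callers never see the unboxed type. A small usage sketch (the result is the floor of the base-2 logarithm, so this prints 9):

main :: IO ();
main = print (integerLog2 1000);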

Categories: Offsite Blogs

Yesod Web Framework: Better CI for the Yesod scaffoldings

Planet Haskell - Mon, 09/05/2016 - 1:00am

After having completely forgotten to do this for a long time, I finally set aside some time last night to fix up the Travis CI for the Yesod scaffoldings. We have a number of different flavors of scaffoldings (e.g., PostgreSQL, MySQL, simple, and minimal), and keep all of the different flavors as branches on a single repo so that improvements to the base scaffolding can easily be merged into all of the others. The goal of my changes was to:

  • Have Travis check against the different snapshots likely to be selected by the stack new command
  • Automate testing against live databases, so that I can be lazy and not set up those databases for local testing

Overall, this went pretty smoothly, and also serves as a nice example of a short Stack-based Travis configuration. You can see the latest PostgreSQL Travis configuration.

Beyond my desire to be lazy in the scaffolding release process, the other obvious benefit is much more confidence, when reviewing PRs against the scaffolding, that things will actually work.

Interesting discoveries

I discovered two things in this work that I hadn't realized previously:

  • Due to a dependency on a relatively recent yesod-auth version, only LTS Haskell 6 and up and Stackage Nightly are supported by the scaffolding. Therefore, for the build matrix, I haven't included LTS 5 and lower.
  • I was unaware of a bug in GHC 8.0.1 which prevents the scaffolding from compiling. This bug has already been resolved upstream and the fix will be included in GHC 8.0.2. In the meanwhile, I've blocked GHC 8.0.1 from usage in the scaffolding.

Upon reflection, this is probably a good thing: we are all but guaranteed that a new user will start off with LTS 6, which is a well tested set of packages and a very stable GHC version, leading to hopefully little user friction when getting started.

A user could in theory do something like stack new foo yesod-simple --resolver lts-5 --solver, but I think it's fair to assume that someone passing in such specific flags knows what he/she is doing and we should just stay out of his/her way.

Contributing

Now that CI is in better shape, this is a good time to remind everyone of how to contribute to the Yesod scaffoldings. The PostgreSQL scaffolding is considered the base, with the other flavors merging in changes from there. If you want to make a change, please submit a PR to the postgres branch of the yesod-scaffold repo. If you have a patch which is specific to one of the scaffoldings instead (like changing MySQL config settings), please submit it to that branch.

Note that it is not supported to send pull requests against the .hsfiles files in the stack-templates repo, as such changes can't be properly tested, and will be overwritten the next time a change from the yesod-scaffold repo is merged in.

Categories: Offsite Blogs

Edward Z. Yang: The Edit-Recompile Manager

Planet Haskell - Fri, 09/02/2016 - 6:40pm

A common claim I keep seeing repeated is that there are too many language-specific package managers, and that we should use a distribution's package manager instead. As an example, I opened the most recent HN discussion related to package managers, and sure enough the third comment was on this (very) dead horse. (But wait! There's more.) But it rarely feels like there is any forward progress on these threads. Why?

Here is my hypothesis: these two camps of people are talking past each other, because the term "package manager" has been overloaded to mean two things:

  1. For end-users, it denotes an install manager, primarily responsible for installing some useful software so that they can use it. Software here usually gets installed once, and then used for a long time.
  2. For developers, it denotes an edit-recompile manager: a piece of software for letting you take a software project under development and (re)build it, as quickly as possible. The installation of packages is a means, but it is not the end.

It should be clear that while these two use-cases have some shared mechanism, the priorities are overwhelmingly different:

  • End-users don't care about how a package is built, just that the things they want to install have been built. For developers, speed on rebuild is an overriding concern. To achieve this performance, a deep understanding of the structure of the programming language is needed.
  • End-users usually just want one version of any piece software. Developers use multiple versions, because that is the cost of doing business with a diverse, rapidly updated, decentralized package ecosystem.
  • End-users care about it "just working": thus, a distribution package manager emphasizes control over the full stack (usually requiring root.) Developers care about flexibility for the software they are rebuilding and don't mind if a little setup is needed.

So the next time someone says that there are too many language-specific package managers, mentally replace "package manager" with "edit-recompile manager". Does the complaint still make sense? Maybe it does, but not in the usual sense: what they may actually be advocating for is an interface between these two worlds. And that seems like a project that is both tractable and worth doing.

Categories: Offsite Blogs

Christopher Allen: The Hashrocket websocket shootout in Haskell

Planet Haskell - Fri, 09/02/2016 - 6:00pm

I recently PR’d a Haskell entry to Hashrocket’s websocket shootout. Haskell seemed to do a lot better than C++, Rust, Golang, Elixir, Erlang, NodeJS, Ruby MRI, and JRuby. The Haskell version has since been fixed, though, and I can no longer run the benchmark reliably on my machine, so any final results will have to come from Hashrocket running the unagi-chan variant.

How the benchmark works

The idea is to test how many concurrent clients a single websocket server (process?) can serve and how efficiently it can broadcast messages to all the clients.

The constraints of the benchmark are that your 95th percentile round-trip time cannot exceed 250ms. This is a better measurement/restriction for concurrency benchmarks than the usual “how many can it handle before it crashes” or throughput metrics, so props to Hashrocket on that point.

The client as-designed will increase the number of clients connected in the step-size specified and send test events at each step. If the 95th percentile round trip time exceeds 250ms, the benchmark client disconnects all client connections and halts. So, the last “line” of output you see from the client is essentially where you peaked before failing the SLA constraint.

What follows is the flawed Broadcast implementation I wrote that drops messages, so caveat lector.

Everything below is retracted for now as Broadcast was dropping messages, which wasn’t explicitly permitted in the benchmark. I’m currently kicking around an unagi-chan based variant PR’d by Sebastian Graf, but I don’t think unagi-chan was designed for broadcasting across many thousands of channels.

For some context, this benchmark using Tsung is roughly what I expected in terms of results modulo hardware differences, which is why I wasn’t that surprised when I saw the initial results. Currently the Go websocket client seems to behave very differently from Tsung’s, so I don’t have a reproduction of what Tom Hunger’s benchmark did.

Before I retracted the results, I was peaking at 45,000 concurrent clients and a very low/flat latency with this version that uses Broadcast. However, Broadcast was dropping messages, so it’s not a valid comparison when the other benchmark servers weren’t dropping any messages. Incidentally, load-shedding is a great strategy for consistent server performance when it’s permissible ;)

Here’s the source to the Haskell version at time of writing:

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}

module Main where

import qualified Control.Concurrent as C
import qualified Control.Concurrent.Broadcast as BC
import Control.Lens hiding ((.=))
import Control.Monad (forever)
import Data.Aeson
import Data.Aeson.Lens
import Data.Aeson.Types
import Data.ByteString.Lazy (ByteString, toStrict)
import qualified Data.Char as DC
import Data.Functor (void)
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8)
import GHC.Generics
import Network.HTTP.Types (status400)
import Network.Wai
import Network.Wai.Handler.Warp
import Network.Wai.Handler.WebSockets
import Network.WebSockets
import Text.RawString.QQ

The above is just the usual preamble/noise. I had quasiquotes for a test/example I didn’t use in the actual server.

type Broadcaster = BC.Broadcast ByteString

Hedging my bets in case I switched again after changing the broadcast type from Text to a lazy ByteString.

amendTest :: Maybe Value
amendTest = decode $ [r| {"type":"broadcast","payload":{"foo": "bar"}} |]

amendBroadcast :: Value -> Value
amendBroadcast v = v & key "type" . _String .~ "broadcastResult"

Above was just test code.

broadcastThread :: Broadcaster -> Connection -> IO ()
broadcastThread bc conn = forever $ do
  t <- BC.listen bc
  sendTextData conn t

That’s all I do to relay broadcasted data to the listeners. Under the hood, Broadcast is:

MVar (Either [MVar a] a)

I used broadcast from concurrent-extra because I knew I wanted the propagation/thread wake to happen via the MVar machinery in the GHC runtime system.
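For intuition, here is a rough sketch of how listen and signal can be built on that representation. This is my own illustration, not the actual concurrent-extra source, so take it only as a picture of the mechanism:

import Control.Concurrent.MVar

newtype Broadcast a = Broadcast (MVar (Either [MVar a] a))

new :: IO (Broadcast a)
new = Broadcast <$> newMVar (Left [])

-- Each listener parks on its own private MVar until a value arrives.
listen :: Broadcast a -> IO a
listen (Broadcast mv) = do
  box <- newEmptyMVar
  wait <- modifyMVar mv $ \st -> case st of
    Left waiters -> pure (Left (box : waiters), takeMVar box)  -- block until signalled
    Right x      -> pure (Right x, pure x)                     -- a value is already being broadcast
  wait

-- Wake every current listener with a value, then fall back to the silent state.
signal :: Broadcast a -> a -> IO ()
signal (Broadcast mv) x = do
  waiters <- modifyMVar mv $ \st -> case st of
    Left waiters -> pure (Left [], waiters)
    Right _      -> pure (Left [], [])
  mapM_ (\w -> putMVar w x) waiters

Because every listener blocks on a plain MVar, signalling N listeners is just N putMVar calls, and all of the thread wake-up work is done by the ordinary MVar machinery in the runtime system, which is exactly the property the server above relies on.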

wtf conn = sendTextData conn ("<img src=\"http://bit.ly/1kmRC7Q\" />" :: Text)

Error return method borrowed from ocharles.

mkPayload :: Text -> Value -> ByteString
mkPayload type_ payload = encode $
  object [ "type" .= String type_
         , "payload" .= payload
         ]

Constructing a JSON value fitting the format expected by the test client and then encode-ing it into a ByteString.
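As a quick sanity check of that format, something like the following could be added (the example values are mine, not part of the benchmark client, and aeson does not guarantee key order in its output):

checkPayload :: IO ()
checkPayload = do
  -- Relies on mkPayload and the OverloadedStrings pragma from the module above.
  print (mkPayload "echo" (object ["foo" .= String "bar"]))
  -- expected, up to key order: "{\"type\":\"echo\",\"payload\":{\"foo\":\"bar\"}}"
  print (mkPayload "broadcastResult" (object ["foo" .= String "bar"]))
  -- expected, up to key order: "{\"type\":\"broadcastResult\",\"payload\":{\"foo\":\"bar\"}}"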

bidiHandler :: Broadcaster -> Connection -> IO ()
bidiHandler bc conn = do
  _ <- C.forkIO (broadcastThread bc conn)                                      -- [1]
  forever $ do                                                                 -- [2]
    msg <- receiveDataMessage conn                                             -- [3]
    case msg of
      Text t -> do
        let Just payload = t ^? key "payload"                                  -- [4]
        case t ^? key "type" . _String of                                      -- [5]
          Just "echo" -> sendTextData conn (mkPayload "echo" payload)          -- [6]
          Just "broadcast" -> BC.signal bc (mkPayload "broadcastResult" payload) -- [7]
          _ -> wtf conn
      _ -> do
        wtf conn

I hate reading overly chopped-up code, so I annotated this one in the mode of the haskell book.

  1. We run the broadcast listener that relays data to the websocket client in a separate thread

  2. Running the client listener that (potentially) broadcasts data or just echoes back to the client in a Control.Monad.forever block.

  3. Block on receiving a data message (a sum type, Text or Binary)

  4. Pluck the payload value out of the JSON body because I’m too lazy to make a datatype for this.

  5. Get the event type out of the JSON body to dispatch on. We’re going to either echo or broadcast.

  6. If the event type was echo, kick the JSON data back to the client, but with the event type amended to echo.

  7. If the event type was broadcast, signal the broadcast handle to propagate the new JSON body with the payload and a broadcastResult event type.

wsApp :: Broadcaster -> ServerApp
wsApp bc pending = do
  conn <- acceptRequest pending
  bidiHandler bc conn

Passing on the Broadcast handle and Connection to the handler.

main :: IO ()
main = do
  bc <- BC.new
  runServer "127.0.0.1" 3000 (wsApp bc)

Spawn a Broadcast, pass the handle on to wsApp, run it with the provided server from the wai-websockets library. That’s it.

Some thoughts

Erlang is the only runtime competitive on per-thread (process in their lingo) overhead, but it bites the dust on message send: an MVar take/put pairing is ~25–40 ns, while in Erlang you're eating at least 1,000 ns per send. It's possible a custom Erlang implementation (Cowboy?) could do a better job here, but I'm not sure how to do broadcast especially efficiently in Erlang.
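If you want to sanity-check that MVar figure yourself, a minimal criterion benchmark along these lines should do; this is my own sketch, not code from the original post:

import Control.Concurrent.MVar
import Criterion.Main

main :: IO ()
main = do
  mv <- newMVar ()
  defaultMain
    [ bench "MVar take/put pair" $ whnfIO (takeMVar mv >>= putMVar mv) ]

Expect a figure in the tens of nanoseconds on a recent machine; the exact number depends on GHC version and hardware.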

Asking how to efficiently broadcast to many Erlang processes on the mailing list gets you smarmy answers.

I was initially disappointed I didn't get an excuse to optimize any of the Haskell code. It was limited only by the number of TCP connections I could bind; I still had two-thirds of my 95th percentile RTT budget to burn. I messed with ulimit and the like a bit, but to really uncap it I'd need to change the client to connect to multiple IP addresses so it could use more TCP connections. Now I know that was because Broadcast was dropping messages and so not tickling the slow parts as much as an implementation that forces broadcasts out to all clients.

I know this site is a bit of a disaster zone, but if you like my writing or think you could learn something useful from me, please take a look at the book I've been writing with my coauthor Julie. There's a free sample available too!

Posted on September 3, 2016

Categories: Offsite Blogs

Brent Yorgey: Deep work and email habits

Planet Haskell - Fri, 09/02/2016 - 1:54pm

Lately I have been enjoying Cal Newport’s writing on work, and particularly his new book Deep Work which I am in the middle of reading (definitely recommended). His basic thesis is about the power of sustained, focused, distraction-free work on cognitively demanding tasks—what he calls deep work. It takes intentional effort to make the time and space for this kind of work, but Newport argues cogently that doing so can have enormous benefits.

Newport’s ideas have really resonated with me—I think I was already converging (albeit slowly, with little clarity) on similar ideas and practices over the last few years—and I’ve begun trying to put some of them more deliberately into practice. First, I have scheduled two large (4 hour) blocks of time for deep work each week. These blocks are sacrosanct: I won’t meet with students, schedule committee meetings, or do anything else during those times. I physically go somewhere other than my office—usually the library, occasionally my favorite coffee shop, somewhere relatively quiet and interruption-free where students and colleagues won’t find me. I first do as much as possible without turning on my laptop: course planning, reading, brainstorming, a lot of longhand writing (blog posts, papers, grant proposals, whatever—for example, I wrote this blog post itself longhand during my deep work session this morning). Sometimes if I need to write a longer, thoughtful email response, I will even print out the message beforehand and write my response longhand. Only towards the end of the session will I pull out my laptop, if I have specific projects to work on deeply that require a computer, like some sort of coding project.

Anecdotally at least, so far this feels incredibly successful—I get a lot done during these deep work sessions and always come away feeling accomplished and energized. The thing that feels especially good is that I’m not just getting a large amount of stuff done, but I’m getting important, difficult stuff done.

Another related practice I have recently adopted is that I do not read or write any email before 4pm. I have literally blocked myself from accessing email on my computers and phone between midnight and 4pm. Perhaps this sounds heretical, but that’s just the point—“because doing otherwise would be heresy” is a terrible reason for doing anything, and the fact is that not many of us really stop to consider and consciously choose the way we make use of technologies like email and social media. It’s taken some getting used to, but by now I don’t think I am ever going back. At 4pm I triage my inbox—respond to things that need a quick response, archive or delete stuff I don’t care about, and forward other things to my personal bug tracker for dealing with later. I am typically able to totally clear out my inbox before going home for the day. Over the course of the day I keep a list of emails I want to write later, and I write those at the same time that I triage my inbox, or sometimes later in the evening before going to bed. It feels way more efficient to batch most of my email processing into a focused session like this, and freeing to not be distracted by it the rest of the day. But do I ever miss it? Yes, all the time—and that’s exactly the point! Left to my natural tendencies I distract myself silly checking my email constantly.

Time will tell how much of this sticks and how my approach might change over time—I’ve scheduled a reminder for myself to write a followup post six months from now. As always, I’m happy to hear and respond to thoughts, reactions, questions, etc. in the comments.


Categories: Offsite Blogs

Dan Burton: GPG signing for github & mac

Planet Haskell - Fri, 09/02/2016 - 1:18pm
I just went through a few steps to get gpg signing to work on my mac and show up on github. I wanted to quickly document the process since the instructions are a little bit scattered. All of it basically came … Continue reading →
Categories: Offsite Blogs

Well-Typed.Com: Haskell eXchange, Hackathon, and Courses

Planet Haskell - Thu, 09/01/2016 - 7:04am

In October 2016, we are co-organizing various events in London. Now is the time to register for:

  • the Haskell eXchange, a two-day three-track conference with a large number of Haskell-related talks and workshops on a wide variety of topics, including keynotes by Simon Peyton Jones, Don Stewart, Conor McBride and Graham Hutton;

  • the Haskell eXchange Hackathon, a two-day event for everyone who wants to get involved coding on projects related to the Haskell infrastructure, such as Hackage and Cabal;

  • our Haskell courses, including a two-day introductory course, a one-day course on type-level programming in GHC, and a two-day course on lazy evaluation and performance.

Haskell eXchange

Thursday, October 6 – Friday, October 7

The Haskell eXchange is a general Haskell conference aimed at Haskell enthusiasts of all skill levels. The Haskell eXchange is organized annually, and 2016 is its fifth year. For the second year in a row, the venue will be Skills Matter’s CodeNode, where we have space for three parallel tracks. New this year: a large number of beginner-focused talks. At all times, at least one track will be available with a talk aimed at (relative) newcomers to Haskell. Of course, there are also plenty of talks on more advanced topics. The four keynote speakers are Simon Peyton Jones, Don Stewart, Conor McBride and Graham Hutton.

Registration is open; you can buy tickets via Skills Matter.

Haskell eXchange Hackathon

Saturday, October 8 – Sunday, October 9

We are going to repeat the successful Haskell Infrastructure Hackathon that we organized last year directly after the Haskell eXchange. Once again, everyone who is already contributing to Haskell projects related to the Haskell infrastructure as well as everyone who wants to get involved and talk to active contributors is invited to spend two days hacking on various projects, such as Hackage and Cabal.

Registration is open. This event is free to attend (and you can attend independently of the Haskell eXchange), but there is limited space, so you have to register.

Haskell courses

Fast Track to Haskell

Monday, October 3 – Tuesday, October 4

This is a two-day general introduction to Haskell, aimed at developers who have experience with other (usually non-functional) programming languages, and want to learn about Haskell. Topics include defining basic datatypes and functions, the importance of type-driven design, abstraction via higher-order functions, handling effects (such as input/output) explicitly, and general programming patterns such as applicative functors and monads. This hands-on course includes several small exercises and programming assignments that allow you to practice and get feedback from the instructor during the course.

Registration is open; you can buy tickets via Skills Matter.

Guide to the Haskell Type System

Wednesday, October 5

This one-day course focuses on several of the type-system-oriented language extensions that GHC offers and shows how to put them to good use. Topics include the kind system and promoting datatypes, GADTs, type families, and moving even more towards dependent types via the new TypeInType. The extensions will be explained and illustrated with examples, and we provide advice on how and when to best use them.

Registration is open; you can buy tickets via Skills Matter.

Guide to Haskell Performance

Monday, October 10 – Tuesday, October 11

In this two-day course, we focus on how to write performant Haskell code that scales. We systematically explain how lazy evaluation works, and how one can reason about the time and space performance of code that is evaluated lazily. We look at various common pitfalls and explain them. We look at data structures and their performance characteristics and discuss their suitability for various tasks. We also discuss how one can best debug the performance of Haskell code, and look at existing high-performance Haskell libraries and their implementation to learn general techniques that can be reused.

Registration is open; you can buy tickets via Skills Matter.

Other courses and events

Well-Typed also offers on-demand on-site training and consulting. Please contact us if you are interested in consulting, or in events or courses that are not listed here.

We also have a low-volume mailing list where we occasionally announce events that we organize or participate in (subscribe here).

Categories: Offsite Blogs

Douglas M. Auclair (geophf): August 2016 1HaskellADay Problems and Solutions

Planet Haskell - Thu, 09/01/2016 - 6:57am
August 2016

  • August 25th, 2016: Today's #haskell exercise looks at historical prices of #bitcoin. Today's #haskell solution is worth $180k ... five years ago. I wonder what it will be worth 5 years hence?
  • August 23rd, 2016: Enough diving into the node's data, let's look at the structure of the related nodes for today's #haskell problem. The structure of tweets and related data for today's #haskell solution 
  • August 22nd, 2016: Today's #haskell problem is parsing twitter hashtags and a bit of data fingerprinting/exploration of same. BOOM! Today's #haskell solution analyzes hashtags twitter-users ('tweeps') use
  • August 19th, 2016: For today's #haskell exercise we look at unique users in a set of twitter graph-JSON. Today's #haskell solution gives us a list of users, then their tweets, from twitter graph-JSON data 
  • August 18th, 2016: For today's #haskell problem we extract and reify URLs from twitter graph-JSON. Today's #haskell solution extract URLs from twitter data as easily as looking up the URLs in a JSON map.
  • August 17th, 2016: For today's #haskell problem we explore the relationships from and to tweets and their related data. Today's #haskell solution relates data to tweets extracted from graph-JSON 
  • August 16th, 2016: For today's #haskell exercise we begin converting nodes in a graph to more specific types (Tweets are up first). We create some JSON Value-extractors and with those find the tweets in graph JSON in today's #Haskell solution 
  • August 15th, 2016: Today's #haskell exercise looks at twitter data as labeled/typed nodes and relations in JSON  
    Okay! For today's #haskell solution we discover our node and relation types in twitter data-as-graphs JSON! 
  • August 10th, 2016: Today's #Haskell problem we look at the big data-problem: getting a grasp of large indices of tweets in graph JSON. Today's #Haskell solution time-stamps and gives 'small-data' indices to tweets from graph JSON 
  • August 9th, 2016: For today's #haskell problem we extract the tweets from rows of graph data encoded in JSON. Today's #Haskell solution extracts the tweets from graph JSON and does some simple queries
  • August 8th, 2016: For today's #haskell problem we look at reading in the graph of a twitter-feed as JSON and just a bit of parsing. We leverage the Cypher library for today's #haskell solution to look at 100 rows of tweets encoded as JSON 
  • August 5th, 2016: For today's #Haskell problem we go for the Big Kahuna: solving a Kakuro puzzle. Okay, we have a #Haskell solution ... finally ... maybe. The solver took too long, so I solved it myself faster :/
  • August 4th, 2016: Today's #Haskell exercise looks at (simple) constraints of unknown values for a sum-solver. Today's #Haskell solution also uses QBits to solve constrained unknowns 
  • August 3rd, 2016: Today's #haskell problem provides the cheatsheet: "What are the unique 4-number sums to 27?" We round-trip the Set category for today's #haskell solution
  • August 2nd, 2016: Today's #haskell exercise looks at solving our sums when we know some of the numbers already. QBits actually work nicely for today's #Haskell solution
  • August 1st, 2016: For today's #Haskell exercise we play the 'Numbers Game.' The #haskell solution is a guarded combine >>= permute in the [Int]-domain. I like the Kleisli category; ICYMI.
Categories: Offsite Blogs

Edward Z. Yang: Backpack and separate compilation

Planet Haskell - Thu, 09/01/2016 - 12:26am

When building a module system which supports parametrizing code over multiple implementations (i.e., functors), you run into an important implementation question: how do you compile said parametric code? Among existing language implementations, there are three major schools of thought:

  1. The separate compilation school says that you should compile your functors independently of their implementations. This school values compilation time over performance: once a functor is built, you can freely swap out implementations of its parameters without needing to rebuild the functor, leading to fast compile times. Pre-Flambda OCaml works this way. The downside is that it's not possible to optimize the functor body based on implementation knowledge (unless, perhaps, you have a just-in-time compiler handy).
  2. The specialize at use school says, well, you can get performance by inlining functors at their use-sites, where the implementations are known. If the functor body is not too large, you can transparently get good performance benefits without needing to change the architecture of your compiler in major ways. Post-Flambda OCaml and C++ templates in the Borland model both work this way. The downside is that the code must be re-optimized at each use site, and there may end up being substantial duplication of code (this can be reduced at link time).
  3. The repository of specializations school says that it's dumb to keep recompiling the instantiations: instead, the compiled code for each instantiation should be cached somewhere globally; the next time the same instance is needed, it should be reused. C++ templates in the Cfront model and Backpack work this way.

The repository perspective sounds nice, until you realize that it requires major architectural changes to the way your compiler works: most compilers don't try to write intermediate results into some shared cache, and adding support for this can be quite complex and error-prone.

Backpack sidesteps the issue by offloading the work of caching instantiations to the package manager, which does know how to cache intermediate products. The trade-off is that Backpack is not as integrated into Haskell itself as some might like (it's extremely not first-class.)
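To make this concrete, here is roughly what parametrizing code over an implementation looks like with Backpack: a signature file describing the hole, and an ordinary module compiled against it. The names and details below are my own sketch of the design as it stood at the time, and may differ from what finally shipped:

-- Str.hsig: an abstract interface the library is parametrized over.
signature Str where

data Str
empty  :: Str
append :: Str -> Str -> Str

-- MyLogic.hs: type-checked against the signature alone; the package manager
-- later instantiates it (and caches the result) once per concrete Str.
module MyLogic (twice) where

import Str

twice :: Str -> Str
twice s = s `append` s

Any package that provides a real Str module (say, one backed by Text and another backed by ByteString) can then be mixed in, and the cached build of MyLogic against that particular implementation is what gets reused.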

Categories: Offsite Blogs

Michael Snoyman: Using AppVeyor for Haskell+Windows CI

Planet Haskell - Tue, 08/30/2016 - 6:00pm

I don't think I ever documented this before, so just a quick post to get this out there. Many of us working on open source Haskell libraries already use Travis CI for doing continuous integration builds of our software. Some time ago they added support for OS X, making it possible to cover Linux and OS X with multiple configurations on their systems. For any project with a stack.yaml file, this can be easily achieved using the Stack recommended Travis configuration.

Unfortunately, this leaves Windows testing out, and Windows is likely to be the most common build to fail. Fortunately, AppVeyor provides a similar experience to Travis, but for Windows. In order to get set up, just:

  1. Sign in with their web interface and add your Github repo
  2. Add an appveyor.yaml to your project

Here's a simple file I've used on a few projects with success:

build: off

before_test:
- curl -sS -ostack.zip -L --insecure http://www.stackage.org/stack/windows-i386
- 7z x stack.zip stack.exe

clone_folder: "c:\\stack"

environment:
  global:
    STACK_ROOT: "c:\\sr"

test_script:
- stack setup > nul
# The ugly echo "" hack is to avoid complaints about 0 being an invalid file
# descriptor
- echo "" | stack --no-terminal test

All this does is:

  • Downloads the Stack zip file
  • Unpacks the stack.exe executable
  • Changes the STACK_ROOT to deal with Windows long path issues
  • Runs stack setup to get a toolchain
  • Runs stack --no-terminal test to build your package and run the test suites

You're free to modify this in any way you want, e.g., add in --bench to build benchmarks, add --pedantic to fail on warnings, etc. If you have more system library dependencies, you'll need to consult the AppVeyor docs to see how to install them. And in our use cases for Stack, we found that using the AppVeyor caching functionality made builds unreliable (due to the large size of the cache). You may want to experiment with turning it back on, since this setup is slow (it downloads and installs a full GHC toolchain and builds all library dependencies each time).

Categories: Offsite Blogs

Joachim Breitner: Explicit vertical alignment in Haskell

Planet Haskell - Tue, 08/30/2016 - 7:35am

Chris Done’s automatic Haskell formatter hindent has been released in a new version and is getting quite a bit of deserved attention. He is polling Haskell programmers on whether two or four spaces are the right indentation. But that is just cosmetics…

I am in principle very much in favor of automatic formatting, and I hope that a tool like hindent will eventually be better at formatting code than a human.

But it is not there yet. Code is literature meant to be read; good code goes to great lengths to be easily readable, and formatting can carry semantic information.

The Haskell syntax was (at least I get that impression) designed to allow authors to write nice-looking, easy-to-understand code. One important tool here is the vertical alignment of corresponding concepts on different lines. Compare

maze :: Integer -> Integer -> Integer
maze x y
  | abs x >  4 || abs y >  4 = 0
  | abs x == 4 || abs y == 4 = 1
  | x ==  2    && y <= 0     = 1
  | x ==  3    && y <= 0     = 3
  | x >= -2    && y == 0     = 4
  | otherwise                = 2

with

maze :: Integer -> Integer -> Integer
maze x y
  | abs x > 4 || abs y > 4 = 0
  | abs x == 4 || abs y == 4 = 1
  | x == 2 && y <= 0 = 1
  | x == 3 && y <= 0 = 3
  | x >= -2 && y == 0 = 4
  | otherwise = 2

The former is a quick-to-grasp specification; the latter (the output of hindent at the moment) is a desert of numbers and operators.

I see two ways forward:

  • Tools like hindent get improved to the point that they are able to detect such patterns and indent them properly (which would be great, but very tricky, and probably never complete), or
  • We give the user a way to indicate intentional alignment in a non-obtrusive way that gets detected and preserved by the tool.

What could such ways be?

  • For guards, it could simply detect that within one function definition there are multiple | in the same column, and keep them aligned.
  • More generally, one could take the approach lhs2TeX takes (which, IMHO, with careful input, a proportional font and the great polytable LaTeX backend, produces the most pleasing code listings). There, two spaces or more indicate an alignment point, and if two such alignment points are in the same column, their alignment is preserved – even if there are lines in between!

With the latter approach, the code up there would be written

maze :: Integer -> Integer -> Integer
maze x y
  | abs x > 4    ||  abs y > 4   = 0
  | abs x == 4   ||  abs y == 4  = 1
  | x == 2       &&  y <= 0      = 1
  | x == 3       &&  y <= 0      = 3
  | x >= -2      &&  y == 0      = 4
  | otherwise                    = 2

And now the intended alignment is explicit.
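As a toy illustration of how a formatter might detect alignment that is already present and marked with two or more spaces, here is a small sketch of my own (not hindent or lhs2TeX code):

import Data.Char (isSpace)
import Data.List (foldl', intersect)

-- Columns at which a token starts after a run of two or more spaces;
-- these are the candidate alignment points of a single line.
alignmentPoints :: String -> [Int]
alignmentPoints = go 0 0
  where
    go _ _ [] = []
    go col spaces (c:cs)
      | isSpace c   = go (col + 1) (spaces + 1) cs
      | spaces >= 2 = col : go (col + 1) 0 cs
      | otherwise   = go (col + 1) 0 cs

-- Alignment points shared by every line of a block; a formatter could keep
-- exactly these columns fixed while reflowing everything else.
sharedAlignment :: [String] -> [Int]
sharedAlignment []     = []
sharedAlignment (l:ls) = foldl' intersect (alignmentPoints l) (map alignmentPoints ls)

A real tool would have to track alignment points by their position within a line rather than by absolute column to fully match the lhs2TeX behaviour, but the sketch shows that detecting intentional alignment is cheap.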

(This post is cross-posted on reddit.)

Update (2016-09-05) Shortly after this post, the Haskell formatter brittany was released, which supports vertical alignment. Yay!

Categories: Offsite Blogs