News aggregator

A new yaml module

haskell-cafe - Mon, 02/03/2014 - 10:57am
The yaml package[1] currently provides two modules. Text.Libyaml is a lower-level, streaming API which is essentially a raw exposure of the underlying libyaml C library. Data.Yaml uses the aeson Value data type and ToJSON/FromJSON type classes for higher-level serialization. For many cases, this approach works well, though there are problems:

  • There are problems with roundtripping, since YAML allows for ambiguity about the data type of values[2]. For example, in the yaml snippet `foo: 1234`, is 1234 intended to be numeric or a string? Either is valid.
  • YAML is intended to be human-readable output, but Data.Yaml provides no control over the encoded representation, e.g. whether to use single or double quotes for a string (or no quotes at all), or the order of values in a mapping[3].

For other examples, just look at the issue tracker for yaml[4]. I don't want to drop the current aeson-based functionality, since I think it's still valid and useful in many cases. But I do think it's worthwhile to add in an al
Categories: Offsite Discussion

Brandon Simmons: Thoughts on FIFO-ness

Planet Haskell - Mon, 02/03/2014 - 10:28am

I’ve been doing a lot of experimenting with concurrent operations in Haskell, and in particular playing with and thinking about the design of concurrent FIFO queues. These structures are difficult to make both efficient and correct, due to the effects of contention on the parts of the structure tasked with coordinating reads and writes from multiple threads.

These are my thoughts so far on FIFO semantics.

FIFO? And how!

In the interesting paper “How FIFO is your concurrent FIFO queue?” (PDF), A. Haas et al. propose that an ideal FIFO queue has operations that are instantaneous (think of each write as having an infinitely accurate timestamp, and each read taking the corresponding element in timestamp order). They then measure the degree to which real queues of various designs deviate from this platonic FIFO semantics in their message ordering, using a metric they call “element-fairness”. They experimentally measure the element-fairness of both so-called “strict FIFO” and “relaxed FIFO” designs, the latter being queues in which elements are read in more or less the order they were written (some providing guarantees on the degree of re-ordering, others not).

The first interesting observation they make is that no queue actually exhibits FIFO semantics by their metric; this is because of the realities of the way atomic memory operations like CAS may arbitrarily reorder a set of contentious writes.

The second interesting result is that the efficient-but-relaxed-FIFO queues which avoid contention by making fewer guarantees about message ordering often perform closer to ideal FIFO semantics (by their metric) than the “strict” but slower queues!

Observable FIFO Semantics

As an outsider reading papers on FIFO queue designs, I get the impression that what authors mean by “the usual FIFO semantics” is often ill-defined. Clearly they don’t mean the platonic zero-time semantics of the “How FIFO… ” paper, since their queues can’t be called FIFO by that measure.

I suspect what makes a queue “strict FIFO” (by the paper’s categorization) might simply be

If write x returns at time T, then x will be read before the elements of any writes that have not yet started by time T.

The idea is difficult to express, but is essentially that FIFO semantics is only observable by way of actions taken by a thread after returning from a write (think: thread A writes x, then tells B which writes y, where our program’s correctness depends on the queue returning y after x). Note that since a queue starts empty this is also sufficient to ensure writes don’t “jump ahead” of writes already in the queue.
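
To make this concrete, here is a minimal sketch of the scenario (my own illustration, not from any paper), using TQueue from the stm package as a stand-in for a queue claiming strict FIFO semantics:

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (atomically)
import Control.Concurrent.STM.TQueue (newTQueueIO, readTQueue, writeTQueue)

main :: IO ()
main = do
    q    <- newTQueueIO
    done <- newEmptyMVar
    _ <- forkIO $ do
        atomically (writeTQueue q "x")   -- A's write returns at some time T...
        putMVar done ()                  -- ...and only then does A signal B
    _ <- forkIO $ do
        takeMVar done                    -- so B's write starts strictly after T
        atomically (writeTQueue q "y")
    a <- atomically (readTQueue q)       -- blocks until an element is available
    b <- atomically (readTQueue q)
    print (a, b)                         -- observable FIFO demands ("x","y")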

Imagine an absurd queue whose write never returns; there’s very little one can say for certain about the “correct” FIFO ordering of writes in that case, especially when designing a program with a preempting scheduler that’s meant to be portable. Indeed the correctness criterion above is actually probably a lot stricter than many programs require; e.g. when there is no coordination between writers, an observably-FIFO queue need only ensure that no reader thread sees two messages from the same writer thread out of order (I think).

The platonic zero-time FIFO ordering criterion used in the paper is quite different from this observable, correctness-preserving FIFO criterion; I can imagine it being useful for people designing “realtime” software.

Conclusion

At a certain level of abstraction, correct observable FIFO semantics shouldn’t be hard to make efficient; after all, the moments during which we have contention (and horrible performance) are also the moments during which we don’t care about (or have no way of observing) correct ordering. In other words (although we have to be careful of the details), a thread-coordination scheme that breaks down (w/r/t element-fairness) under contention isn’t necessarily a problem. Compare-and-swap does just that; unfortunately, it breaks down in a way that is slower rather than faster.

Categories: Offsite Blogs

ANNOUNCE: fpnla - A library for NLA operations

General haskell list - Mon, 02/03/2014 - 3:51am
Hi. Today we are releasing fpnla <http://hackage.haskell.org/package/fpnla>, which defines a framework for the linear algebra operations of BLAS and LAPACK. As its main features it allows:

  • Definition of multiple representations of vectors and matrices.
  • Arbitrary combination of strategies and structure representations.
  • Type-safe manipulation of context information associated to each strategy.
  • Definition of specialized strategies for a given representation.

And also fpnla-examples <http://hackage.haskell.org/package/fpnla-examples>, which contains many example implementations of the operations defined in fpnla using various data structures, algorithms and parallelism libraries. Regards.
Categories: Incoming News

ANNOUNCE: fpnla - A library for NLA operations

haskell-cafe - Mon, 02/03/2014 - 1:55am
Hi. Today we are releasing fpnla <http://hackage.haskell.org/package/fpnla>, which defines a framework for the linear algebra operations of BLAS and LAPACK. As its main features it allows:

  • Definition of multiple representations of vectors and matrices.
  • Arbitrary combination of strategies and structure representations.
  • Type-safe manipulation of context information associated to each strategy.
  • Definition of specialized strategies for a given representation.

And also fpnla-examples <http://hackage.haskell.org/package/fpnla-examples>, which contains many example implementations of the operations defined in fpnla using various data structures and algorithms. Regards.
Categories: Offsite Discussion

Yesod Web Framework: Improving the performance of Warp again

Planet Haskell - Mon, 02/03/2014 - 12:41am
Improving the performance of Warp again

As you may remember, I improved the performance of Warp in 2012 and wrote some blog articles on this web site. Based on these articles, Michael and I wrote an article "Warp" for POSA (Performance of Open Source Applications).

After working with Andreas last year, I came up with some new ideas to improve Warp's performance yet again, and have implemented them in Warp 2.0. In this article, I will explain these new techniques. If you have not read the POSA article, I recommend taking a look at it beforehand.

Here are the high level ideas to be covered:

  • Better thread scheduling
  • Buffer allocation to receive HTTP requests
  • Buffer allocation to send HTTP responses
  • HTTP request parser

witty

When I was testing the multicore IO manager with Mighty (on Warp), I noticed bottlenecks in network programs compiled by GHC. To show that such bottlenecks actually exist, I wrote a server program called witty, based on Andreas's "SimpleServer".

"witty" provides seven command line options to switch standard methods to alternatives. My experiment with "witty" disclosed the following bottlenecks:

  • Thread scheduling (the "-y" option)
  • Network.Socket.ByteString.recv (the "-r" option)

Better thread scheduling

GHC's I/O functions are optimistic. For instance, recv tries to read incoming data immediately. If there is data available, recv succeeds. Otherwise, EAGAIN is returned. In this case, recv asks the IO manager to notify it when data is available.

If a network server repeats receive/send actions, a recv issued just after send will usually fail, because there is a time lag before the next request arrives from the client. Thus the IO manager is often required to do the notification work. Here is a log of "strace":

recvfrom(13, )              -- Haskell thread A
sendto(13, )                -- Haskell thread A
recvfrom(13, ) = -1 EAGAIN  -- Haskell thread A
epoll_ctl(3, )              -- Haskell thread A (a job for the IO manager)
recvfrom(14, )              -- Haskell thread B
sendto(14, )                -- Haskell thread B
recvfrom(14, ) = -1 EAGAIN  -- Haskell thread B
epoll_ctl(3, )              -- Haskell thread B (a job for the IO manager)

To achieve better scheduling, we need to call yield after send. yield pushes its Haskell thread onto the end of the run queue; in the meantime, other threads are able to work. While they do, a new request message can arrive, and therefore the next call to recv will succeed. With the yield hack, the "strace" log becomes as follows:

recvfrom(13, )  -- Haskell thread A
sendto(13, )    -- Haskell thread A
recvfrom(14, )  -- Haskell thread B
sendto(14, )    -- Haskell thread B
recvfrom(13, )  -- Haskell thread A
sendto(13, )    -- Haskell thread A

In other words, yield makes the IO manager work less frequently. This magically improves throughput! It means that even the multicore IO manager still has non-negligible overhead: it uses an MVar to notify Haskell threads of data availability. Since MVar is a lock, it may be slow. Or perhaps allocation of the MVar is slow.
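
Here is a minimal sketch of the yield hack (not Warp's actual code; respondTo is a hypothetical handler standing in for real request processing):

import Control.Concurrent (yield)
import Control.Monad (forever)
import qualified Data.ByteString as B
import Network.Socket (Socket)
import Network.Socket.ByteString (recv, sendAll)

serveLoop :: (B.ByteString -> B.ByteString) -> Socket -> IO ()
serveLoop respondTo sock = forever $ do
    req <- recv sock 4096        -- optimistic: succeeds at once if data is waiting
    sendAll sock (respondTo req)
    yield                        -- go to the back of the run queue; by the time we
                                 -- run again the next request has likely arrived,
                                 -- so recv succeeds without involving the IO manager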

Unfortunately, when I first added yield to Warp, no performance improvement was gained. It seemed to me that the monad stack (ResourceT) was handcuffing the yield hack. So, Michael removed ResourceT from WAI. This is why the type of Application changed from:

type Application = Request -> ResourceT IO Response

to:

type Application = Request -> IO Response
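
For concreteness, a minimal application against the new signature looks like this (a sketch against wai/warp 2.0; the handler body and port are just illustrative):

{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Types (status200)
import Network.Wai (Application, responseLBS)
import Network.Wai.Handler.Warp (run)

app :: Application   -- i.e. Request -> IO Response
app _req = return $ responseLBS status200 [("Content-Type", "text/plain")] "Hello"

main :: IO ()
main = run 3000 app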

This change by itself improved performance, and it also enabled the yield hack, resulting in a drastic performance improvement for Warp, at least on small numbers of cores.

Michael has added an explanation of the removal of ResourceT to the end of this post.

Buffer allocation to receive HTTP requests

Warp used Network.Socket.ByteString.recv to receive HTTP requests. It appeared that this function has significant overhead. Let's dig into its definition:

recv :: Socket -> Int -> IO ByteString
recv sock nbytes = createAndTrim nbytes $ recvInner sock nbytes

As you can see, recv calls createAndTrim to create a ByteString. Its definition is:

createAndTrim :: Int -> (Ptr Word8 -> IO Int) -> IO ByteString
createAndTrim l f = do
    fp <- mallocByteString l
    withForeignPtr fp $ \p -> do
        l' <- f p
        if assert (l' <= l) $ l' >= l
            then return $! PS fp 0 l
            else create l' $ \p' -> memcpy p' p l'

Suppose we specify 4,096 as a buffer size to recv. First a ByteString of 4,096 bytes is created by mallocByteString. If the size of a received request is not 4,096, another ByteString is created by create and memcpy. create also calls mallocByteString as follows:

create :: Int -> (Ptr Word8 -> IO ()) -> IO ByteString
create l f = do
    fp <- mallocByteString l
    withForeignPtr fp $ \p -> f p
    return $! PS fp 0 l

So, let's understand what happens when mallocByteString is called. Its definition is as follows:

mallocByteString :: Int -> IO (ForeignPtr a)
mallocByteString = mallocPlainForeignPtrBytes

GHC.ForeignPtr provides mallocPlainForeignPtrBytes:

mallocPlainForeignPtrBytes :: Int -> IO (ForeignPtr a)
mallocPlainForeignPtrBytes (I# size) = IO $ \s ->
    case newPinnedByteArray# size s of { (# s', mbarr# #) ->
        (# s', ForeignPtr (byteArrayContents# (unsafeCoerce# mbarr#))
                          (PlainPtr mbarr#) #)
    }

As you can see, mallocPlainForeignPtrBytes calls newPinnedByteArray#, which allocates a pinned object. Pinned objects are not moved by GHC's copying GC; that's why they are called pinned.

The substance of newPinnedByteArray# is stg_newPinnedByteArrayzh in rts/PrimOps.cmm, which calls allocatePinned. allocatePinned is implemented in C in rts/sm/Storage.c:

StgPtr
allocatePinned (Capability *cap, W_ n)
{
    StgPtr p;
    bdescr *bd;

    // If the request is for a large object, then allocate()
    // will give us a pinned object anyway.
    if (n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)) {
        p = allocate(cap, n);
        Bdescr(p)->flags |= BF_PINNED;
        return p;
    }
    ...

allocatePinned calls allocate when "n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)" holds. Here is part of the definition of allocate:

StgPtr
allocate (Capability *cap, W_ n)
{
    ...
    ACQUIRE_SM_LOCK;
    bd = allocGroup(req_blocks);
    dbl_link_onto(bd, &g0->large_objects);
    g0->n_large_blocks += bd->blocks; // might be larger than req_blocks
    g0->n_new_large_words += n;
    RELEASE_SM_LOCK;
    ...

So, allocate takes a global lock. By my calculation, LARGE_OBJECT_THRESHOLD/sizeof(W_) is:

  • 3,276 bytes (819 words) on 32 bit machines
  • 3,272 bytes (409 words) on 64 bit machines

Caution: the numbers above were changed according to a comment from Simon Marlow.
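
For reference, here is the arithmetic behind those numbers, assuming the RTS definition LARGE_OBJECT_THRESHOLD = BLOCK_SIZE * 8 / 10 with BLOCK_SIZE = 4096 (the definition may differ in your GHC version):

LARGE_OBJECT_THRESHOLD = 4096 * 8 / 10 = 3276 bytes
32-bit: 3276 / 4 = 819 words -> 819 * 4 = 3,276 bytes
64-bit: 3276 / 8 = 409 words -> 409 * 8 = 3,272 bytes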

Since old Warp specified 4,096 to recv, a global lock was acquired for every HTTP request. And if the size of an HTTP request was somewhere between 3,272 (3,276 on 32-bit machines) and 4,095 bytes, a second global lock was acquired for the copy.

To avoid this contention, I modified Warp so that a buffer of 4,096 bytes is allocated by malloc() for every HTTP connection. Sophisticated malloc() implementations have arenas to avoid global contention. Also, since we repeatedly malloc() and free() the same size, we can take advantage of malloc()'s free list.

The buffer is passed to the recv() system call. After an HTTP request is received, a ByteString is allocated by mallocByteString and data is copied by memcpy. This tuning also improved the throughput of Warp drastically.
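
The following is a minimal sketch of this idea (not Warp's actual code; handleRequests and handler are hypothetical names):

import Control.Exception (finally)
import qualified Data.ByteString as B
import qualified Data.ByteString.Internal as BI
import Foreign.Marshal.Alloc (free, mallocBytes)
import Network.Socket (Socket, recvBuf)

handleRequests :: (B.ByteString -> IO ()) -> Socket -> IO ()
handleRequests handler sock = do
    buf <- mallocBytes 4096               -- one buffer for the whole connection
    let loop = do
            n <- recvBuf sock buf 4096    -- recv() straight into the reused buffer
            if n <= 0
                then return ()
                else do
                    -- copy only the bytes actually received into an
                    -- exact-sized ByteString for the parser
                    bs <- BI.create n $ \dst -> BI.memcpy dst buf n
                    handler bs
                    loop
    loop `finally` free buf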

Buffer allocation to send HTTP responses

Michael and I noticed that the buffer for receiving can also be used for sending. Let's recall that Response has three constructors:

data Response
    = ResponseFile H.Status H.ResponseHeaders FilePath (Maybe FilePart)
    | ResponseBuilder H.Status H.ResponseHeaders Builder
    | ResponseSource H.Status H.ResponseHeaders
                     (forall b. WithSource IO (C.Flush Builder) b)

We changed Warp so that the buffer is also used for ResponseBuilder and ResponseSource, avoiding extra buffer allocations. (In the case of ResponseFile, the zero-copy system call sendfile() ensures no extra buffer is allocated.)
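
As a rough approximation of the sending side (again, not Warp's actual code), blaze-builder's toByteStringIOWith renders a Builder through chunks of a fixed size rather than materializing the whole response at once. Warp goes further by reusing the connection's own buffer, but the idea of bounding allocation to a fixed buffer size is the same:

import Blaze.ByteString.Builder (Builder, toByteStringIOWith)
import Network.Socket (Socket)
import Network.Socket.ByteString (sendAll)

-- Render at most 4,096 bytes at a time, handing each full chunk to the socket.
sendBuilder :: Socket -> Builder -> IO ()
sendBuilder sock = toByteStringIOWith 4096 (sendAll sock)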

HTTP request parser

At this stage, I took profiles of Mighty. Here is a result:

COST CENTRE                   MODULE                                  %time %alloc

sendfileloop                  Network.Sendfile.Linux                    7.5    0.0
parseRequestLine              Network.Wai.Handler.Warp.RequestHeader    3.6    5.8
sendloop                      Network.Sendfile.Linux                    3.6    0.0
serveConnection.recvSendLoop  Network.Wai.Handler.Warp.Run              3.1    1.9
>>=                           Data.Conduit.Internal                     2.9    3.9

I persuaded myself that I/O functions are simply slow. But I could not be satisfied with the poor performance of the HTTP request parser. parseRequestLine was implemented using the utility functions of ByteString. Since they have overhead, I re-wrote it with Ptr-related functions (a sketch of this style appears after the next profile). After writing the low-level parser, the profile became:

COST CENTRE                   MODULE                                  %time %alloc

sendfileloop                  Network.Sendfile.Linux                    8.3    0.0
sendResponse                  Network.Wai.Handler.Warp.Response         3.7    3.1
sendloop                      Network.Sendfile.Linux                    3.2    0.0
>>=                           Data.Conduit.Internal                     2.9    4.0
serveConnection.recvSendLoop  Network.Wai.Handler.Warp.Run              2.6    2.0

I was happy because parseRequestLine disappeared from the profile. One piece of homework for me is to understand why sendfileloop is so slow. Probably I need to check whether locks are used in sendfile(). If you have any ideas, please let me know.
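
To give a flavor of what "Ptr-related functions" means here, this hypothetical helper (not Warp's actual parser) finds the first space in a request line by scanning raw memory, avoiding the intermediate ByteStrings that the higher-level utility functions create:

{-# LANGUAGE BangPatterns #-}
import Data.ByteString.Internal (ByteString (PS))
import Data.Word (Word8)
import Foreign.ForeignPtr (withForeignPtr)
import Foreign.Ptr (plusPtr)
import Foreign.Storable (peekByteOff)

findSpace :: ByteString -> IO (Maybe Int)
findSpace (PS fp off len) = withForeignPtr fp $ \p0 -> do
    let p = p0 `plusPtr` off
        go !i
            | i >= len  = return Nothing
            | otherwise = do
                w <- peekByteOff p i :: IO Word8
                if w == 32 then return (Just i) else go (i + 1)   -- 32 is ASCII space
    go 0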

Performance improvement

So, how fast did Warp actually become? Here is a chart comparing throughput among Mighty 2 compiled by GHC 7.6.3, Mighty 3 compiled by the coming GHC 7.8, and nginx 1.4.0. Note that only one core is used. I have two reasons for this: 1) since my company changed data centers, I cannot use the environment described in the POSA article at this moment, so I had to draw this chart based on my old memo; 2) nginx does not scale at all in my environment, even if the deep sleep mode is disabled.

Anyway, here is the result measured by weighttp -n 100000 -c 1000 -k:

[Chart: throughput of Mighty 2 (GHC 7.6.3), Mighty 3 (GHC 7.8), and nginx 1.4.0]

Removing ResourceT from WAI

Before we can understand how we removed ResourceT from WAI, we need to understand its purpose. The goal is to allow an Application to acquire a scarce resource, such as a database connection. But let's have a closer look at the three response constructors used in WAI (in the 1.4 series):

data Response
    = ResponseFile H.Status H.ResponseHeaders FilePath (Maybe FilePart)
    | ResponseBuilder H.Status H.ResponseHeaders Builder
    | ResponseSource H.Status H.ResponseHeaders
                     (C.Source (C.ResourceT IO) (C.Flush Builder))

Both ResponseFile and ResponseBuilder contain pure values, which are unable to take advantage of the ResourceT environment to allocate resources. In other words, the only constructor which takes advantage of ResourceT is ResponseSource. Our goal, therefore, was to limit the effect of resource allocation to just that constructor, and, even better, to make the resource environment overhead optional in that case.

The first step is what Kazu already mentioned above: changing Application to live in IO instead of ResourceT IO. We need to make the same change to the ResponseSource constructor. But now we're left with a question: how do we acquire a resource inside a streaming response in an exception-safe manner?

Let's remember that an alternative to ResourceT is to use the bracket pattern. And let's look at the structure of how a streaming response is processed in Warp. (A similar process applies to CGI and other backends.)

  1. Parse the incoming request, generating a Request value.
  2. Pass the Request value to the application, getting a Response value.
  3. Run the Source provided by the application to get individual chunks from the Response and send them to the client.

In order to get exception safety, we need to account for three different issues:

  • Catch any exceptions thrown in the application itself while generating the Response. This can already be handled by the application, since it simply lives in the IO monad.

  • Mask asynchronous exceptions between steps 2 and 3. This second point required a minor change to Warp to add async masking.

  • Allow the application to install an exception handler around step 3.

That third point is the bit that required a change in WAI. The idea is to emulate the same kind of bracket API provided by functions like withFile. Let's consider the signature for that function:

withFile :: FilePath -> IOMode -> (Handle -> IO a) -> IO a

-- or partially applied
withFile "foo" ReadMode :: (Handle -> IO a) -> IO a

In our case, we don't want to get a Handle, but instead a Source of bytes. Let's make a type signature or two to represent this idea:

type ByteSource = Source IO (Flush Builder)
type WithByteSource a = (ByteSource -> IO a) -> IO a

And the resulting ResponseSource constructor is:

ResponseSource H.Status H.ResponseHeaders (forall b. WithByteSource b)

To deal with the common case of installing a cleanup function, WAI provides the responseSourceBracket function. At a higher level, Yesod is able to build on top of this to provide the same ResourceT functionality we had previously, so that a Yesod user can simply use allocate and friends and get full exception safety.
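
As a hypothetical example (not part of WAI itself), here is how an application could build a WithByteSource with the classic bracket pattern; the type aliases are repeated so the snippet stands alone:

import Blaze.ByteString.Builder (Builder, fromByteString)
import Control.Exception (bracket)
import Data.Conduit (Flush (Chunk), Source, ($=))
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL
import System.IO (IOMode (ReadMode), hClose, openFile)

type ByteSource = Source IO (Flush Builder)
type WithByteSource a = (ByteSource -> IO a) -> IO a

-- The handle is closed whether the consumer finishes normally or throws.
fileByteSource :: FilePath -> WithByteSource a
fileByteSource path k =
    bracket (openFile path ReadMode) hClose $ \h ->
        k (CB.sourceHandle h $= CL.map (Chunk . fromByteString))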

Acknowledgment

Michael and I thank Joey Hess for getting Warp to work well on Windows and Gregory Collins for his discussions on performance.

Categories: Offsite Blogs

problem with happy and type identity

haskell-cafe - Sun, 02/02/2014 - 11:32pm
I'm working on a small project that involves an Alex scanner and a Happy parser, and I'm getting an error from the type-checker that I do not understand. Can anyone help shed some light on what's going on? I'm running Haskell Platform 2013.2.0.0, on MacOS 10.8.5 with XCode 4.6.3. I've reduced the problem to a very small example, which I've attached as a tar file. It's a cabal package, and it contains a library with some test cases. If I run "cabal configure && cabal build" then the library builds with no problems whatsoever. But if I run "cabal clean && cabal configure --enable-tests && cabal build" then I get the following error message:

RunTests.hs:16:27:
    Couldn't match expected type `sample-0.4:Ast.Entry'
                with actual type `Entry'
    In the return type of a call of `Entry'
    In the second argument of `(~?=)', namely `Entry "mumble"'
    In the expression: parse "entry" ~?= Entry "mumble"

And this doesn't make any sense to me, because the two types
Categories: Offsite Discussion

Beginning Haskell

del.icio.us/haskell - Sun, 02/02/2014 - 10:04pm
Categories: Offsite Blogs

New book: "Beginning Haskell"

haskell-cafe - Sun, 02/02/2014 - 6:49pm
[Disclaimer: I'm the author of the book] Dear Haskell-cafe, I would like to introduce a new book discussing our favourite language from beginner to upper-intermediate level: "Beginning Haskell". The book starts by assuming zero knowledge of functional programming and builds step by step to reach the realm of web applications, type-level programming, domain-specific languages, distributed computing, unit testing and much more! You can look at the Table of Contents on Amazon [ http://www.amazon.com/Beginning-Haskell-A-Project-Based-Approach/dp/1430262508/ ]. The book revolves around the idea of a Time Machine Store: in each part some functionality is developed. The first part serves as an introduction and shows how to model the data with Haskell data types and functions; part 2 discusses many concepts around monads while developing two data-mining algorithms; in the third part, storing and interfacing with clients serves as an excuse to introduce input/output, database access and web applications; part 4 is d
Categories: Offsite Discussion

chromaticleaves.com

del.icio.us/haskell - Sun, 02/02/2014 - 6:45pm
Categories: Offsite Blogs

Why I prefer Scheme to Haskell

del.icio.us/haskell - Sun, 02/02/2014 - 3:12pm
Categories: Offsite Blogs

Robin KAY: HsQML 0.2.0.3 released

Planet Haskell - Sun, 02/02/2014 - 12:46pm
Yesterday, I made a new minor release of HsQML in order to address two issues with using it interactively via GHCi. As usual, it's available for download from Hackage. One little mistake did slip in, however: I forgot to change the darcs repository listed in the package's cabal file to the Qt 4 branch. The main trunk is now being used for porting to Qt 5.

An explanation of the GHCi problems follows:

GHCi has traditionally had a number of limitations owing to the built-in linker it uses to load static object files dynamically. The linker is capable enough to load the output of GHC and any simple FFI C code that might be included in a library, but it can't cope with some of the relocations emitted by a C++ compiler. Originally, it wasn't even capable of reading the same archive libraries used by the GHC compiler for linking, and required that Cabal produce special compounded object files for it to use.

The C++ limitation was an issue for HsQML because Qt is a C++ library, and hence HsQML needs to include some C++ code as part of its binding layer. I made use of the fact that GHCi depended on special object files to incorporate a workaround especially for GHCi. HsQML's build script modifies the build process so that the objects containing C++ code are excluded from the special compounded object file and placed into a separate shared library instead, which is then referenced by the package's extra-ghci-libraries field. GHCi will hence load the shared library, and the compiled C++ code within, using the system linker, thereby avoiding the problems with its own.

However, it came to my attention recently* that this strategy had run into trouble, as GHCi can now load regular archive libraries directly, supplanting the need for special object files. I discovered that Fedora Linux had modified their distribution of GHC to disable generating the GHCi objects by default, and furthermore that this behaviour would become the new standard default with Cabal 1.18. This broke HsQML with GHCi because the aforementioned workaround didn't apply to the regular archive libraries, and so GHCi's linker couldn't handle the C++ object files contained within.

I didn't want to simply apply the same workaround to the archive libraries as to the GHCi ones, because that would force users who simply wanted to compile their applications to deal with an additional magic shared library. The modification I've applied for this release is therefore to add code to Setup.hs that force-(re-)enables generating the special GHCi object files under certain circumstances.

The impact of this issue is likely to decrease over time, as GHC now also supports producing shared libraries from Haskell code in addition to static ones. This means that, going forward, the entirety of HsQML can be built as a shared library and GHCi can load it using the system linker without difficulty. My understanding is that this behaviour will become the default with GHC 7.8 for platforms other than Windows.

Hence, the rule is that generating GHCi object files is only force enabled if shared libraries are not enabled. The forcing behaviour can be disabled by passing -f-ForceGHCiLib to cabal-install.

The other issue I found that's fixed with this release is that GHCi had problems finding the workaround shared library on Windows. Unlike other platforms, the extra-ghci-libraries field needed to include the "lib" prefix to the referenced library name in order for Windows GHCi to find it without the library being on the PATH. With that fixed, HsQML should now work with GHCi out of the box on all platforms.

Now, back to working on the Qt 5 port!

release-0.2.0.3 - 2014.02.01

* Added mechanism to force enable GHCi workaround library.
* Fixed reference name of extra GHCi library on Windows.
* Thanks to rnons.
Categories: Offsite Blogs

Robin KAY: HsQML 0.2.0.2 released

Planet Haskell - Sun, 02/02/2014 - 12:42pm
I've been meaning to start blogging again for ages, and this seemed as good a time as any. I've just released HsQML 0.2.0.2 and uploaded it to Hackage. This minor point release fixes two issues:

Firstly, API changes in Cabal 1.18 broke HsQML's rather involved Setup.hs. I didn't want to mandate that users have the latest Cabal library, so I investigated a couple of different options for supporting both the old and the new. There are no really pretty solutions here, and I ended up using Template Haskell to select between two different sets of utility functions.

Secondly, the text package has recently gone through two major version changes under the PVP. I've widened the Cabal version constraint to permit both text-1.0 and text-1.1.

Hope you enjoy!

release-0.2.0.2 - 2014.01.18

* Added support for Cabal 1.18 API.
* Relaxed Cabal dependency constraint on 'text'.
Categories: Offsite Blogs