Saturday night I had a fainting spell. Sunday my eyes were burning, I was feverish, weak, and had the beginnings of a migraine. Monday was completely lost in the blaze of a migraine. Tuesday I was starting to feel better— and then, nope; threw up that night. Wednesday I awoke with what felt like a raging sinus infection; spent the whole day in a haze of sudafed and ibuprofen, and went through literally an entire box of tissues.
Starting to feel a little better this morning, so I figure this is the end. It was a nice life. Y'might want to bar your doors today, just in case it's locusts.
Junfeng Yang, Heming Cui, Jingyue Wu, Yang Tang, and Gang Hu, "Determinism Is Not Enough: Making Parallel Programs Reliable with Stable Multithreading", Communications of the ACM, Vol. 57 No. 3, Pages 58-69.
We believe what makes multithreading hard is rather quantitative: multithreaded programs have too many schedules. The number of schedules for each input is already enormous because the parallel threads may interleave in many ways, depending on such factors as hardware timing and operating system scheduling. Aggregated over all inputs, the number is even greater. Finding a few schedules that trigger concurrency errors out of all enormously many schedules (so developers can prevent them) is like finding needles in a haystack. Although Deterministic Multi-Threading reduces schedules for each input, it may map each input to a different schedule, so the total set of schedules for all inputs remains enormous.
We attacked this root cause by asking: are all the enormously many schedules necessary? Our study reveals that many real-world programs can use a small set of schedules to efficiently process a wide range of inputs. Leveraging this insight, we envision a new approach we call stable multithreading (StableMT) that reuses each schedule on a wide range of inputs, mapping all inputs to a dramatically reduced set of schedules. By vastly shrinking the haystack, it makes the needles much easier to find. By mapping many inputs to the same schedule, it stabilizes program behaviors against small input perturbations.
The link above is to a publicly available pre-print of the article that appeared in the most recent CACM. The CACM article is a summary of work by Junfeng Yang's research group. Additional papers related to this research can be found at http://www.cs.columbia.edu/~junfeng/
The wiki page on monad performance claims that mtl monad transformers are quite inefficient. Unfortunately, when I'm writing state-heavy Haskell code, I typically have a stack of Writer, State, Reader, ErrorT, etc. I'm trying to benchmark some relatively idiomatic Haskell vs a C++ implementation, and I want to write Haskell code which is as competitive as possible. However, the alternatives shown on that wiki page look awful to me. I'd rather take a performance hit than write code I won't understand in a few months, or that I have to overhaul every time I change my monad stack.
Are these claims about monad transformer performance still valid? If so, are there any alternatives which still allow for relatively readable code? I'm not trying to match the performance of my C++ implementation, but I want it to at least be competitive. I'm already reading up on unboxing, strictness annotations, etc, it's just this higher-level stuff which has so far eluded me.
Thanks!submitted by trolox
[link] [21 comments]
For some upcoming improvements to FP Haskell Center, I recently added a new feature to Stackage: the ability to detect module name conflicts. This is where two different packages both export a module of the same name.
You can see the full module name conflict list for my most recent build. The file format is fairly dumb: one line lists all of the packages using a common module name, and the following line contains all of the module names shared. (JSON, YAML, or CSV would have been better file formats for this, but one of the goals of the Stackage codebase is to avoid extra package dependencies wherever possible.)
Most of these conflicts don't seem problematic at all. The fact that base, haskell98, haskell2010, and base-compat share a lot of the same module names, for example, should be expected, and users really do need to choose just one of those packages to depend on.
Some other cases, on the other hand, might cause issues. For example, both hashmap and unordered-containers export the Data.HashSet module. This can negatively affect users of GHCi who have both packages installed and try to import Data.HashSet. Also, if for some reason a cabal package depended on both, you'd need to use package imports to disambiguate. There can also be an issue of confusion: if I see Data.HashSet at the top of a module, it would be nice to know which package it comes from without having to check a cabal file or running ghc-pkg.
I'm mostly writing this blog post as I think it's the first time we've had any kind of collection of this information, and I don't think we've had a community discussion about conflicting module names. I don't know if the problem is significant enough to even warrant further analysis, or how have thoughts on how to proceed if we do want to try and disambiguate module names.
Here are some of the conflicting module names, and the packages using them:
I was able to release a pretty nice piece of software today, courtesy of my employer, ZipRecruiter. If you have a family of web pages, and whenever you are looking at one you want to know when someone else is looking at the same page, you can use my package. The package is called 2banner, because it pops up a banner on a page whenever two people are looking at it. With permission from ZipRecruiter, I have put it on github, and you can download and use it for free.
A typical use case would be a customer service organization. Say your users create requests for help, and the the customer service reps have to answer the requests. There is a web page with a list of all the unserviced requests, and each one has a link to a page with details about what is requested and how to contact the person who made the request. But it might sometimes happes that Fred and Mary independently decide to service the same request, which is at best a waste of effort, and at worst confusing for the customer who gets email from both Fred and Mary and doesn't know how to respond. With 2banner, when Mary arrives at the customer's page, she sees a banner in the corner that says Fred is already looking at this page!, and at the same time a banner pops up in Fred's browser that says Mary has started looking at this page! Then Mary knows that Fred is already on the case, and she can take over a different one, or Fred and Mary can communicate and decide which of them will deal with this particular request.
You can similarly trick out the menu page itself, to hide the menu items that someone is already looking out.
I wanted to use someone else's package for this, but I was not able to find one, so I wrote one myself. It was not as hard as I had expected. The system comprises three components:
The back-end database for recording who started looking at which pages and when. I assumed a SQL database and wrote a component that uses Perl's DBIx::Class module to access it, but it would be pretty easy throw this away and use something else instead.
An API server that can propagate notifications like “user X is now looking at page Y” and “user X is no longer looking at page Y” into the database, and which can answer the question “who else is looking at page Y right now?”. I used Perl's Catalyst framework for this, because our web app already runs under it. It would not be too hard to throw this away and use something else instead. You could even write a standalone server using HTTP::Server, and borrow most of the existing code, and probably have it working in under an hour.
Often a project seems easy but the more I think about it the harder it seems. This project was the opposite. I thought it sounded hard, and I didn't want to do it. It had been an outstanding request of the CS people for some time, but I guess everyone else thought it sounded hard too, because nobody did anything about it. But the longer I let it percolate around in my head, the simpler it seemed. As I considered it, one difficulty after another evaporated.
Other than the jQuery stuff, which I had never touched before, the hardest part was deciding what to do about the API server. I could easily have written a standalone, special-purpose API server, but I felt like it was the wrong thing, and anyway, I didn't want to administer one. But eventually I remembered that our main web site is driven by Catalyst, which is itself a system for replying to HTTP requests, which already has access to our database, and which already knows which user is making each request.
So it was natural to say that the API was to send HTTP requests to certain URLs on our web site, and easy-peasy to add a couple of handlers to the existing Catalyst application to handle the API requests, query the database, and send the right replies.
I don't know why it took me so long to think of doing the API server with Catalyst. If it had been someone else's suggestion I would probably feel dumb fror not having thought of it myself, because in hindsight it is so obvious. Happily, I did think of it, because it is clearly the right solution for us.
It was not too hard to debug. The three components are largely independent of one another. The draft version of the API server responded to GET requests, which are easier to generate from the browser than the POST requests that it should be getting. Since the responses are in JSON, it was easy to see if the API server was replying correctly.
The package is in ZipRecruiter's GitHub repository, and is available under a very lenient free license. Check it out.
(By the way, I really like working for ZipRecruiter, and we are hiring Perl and Python developers. Please send me email if you want to ask what it is like to work for them.)
It turns out that there was a better way to do this, please see this new post.Rationale
I am currently experimenting with the operational package. This post provides a rough outline on how I moved from an IO based monad stack to an isomorphic pure representation of the computation. I am unfortunately not well endowed on the theoretical side, so this will be a very practical post. It might contain some glaring mistakes, as I just spend a few hours acquainting myself with the concepts and migrating everything, and didn’t test it extensively. I marked the places where I am unsure on how to do something with a number, such as (0).
Here is the type of the main monad, before and after the change :<figure class="code"> 1 2 3 4 -- Before type InterpreterMonad = ErrorT Doc (RSST InterpreterReader InterpreterWriter InterpreterState IO) -- After type InterpreterMonad = ProgramT InterpreterInstr (State InterpreterState) </figure>
I first tried a simple Program InterpreterInstr for the main monad, but I could not write the MonadState instance, as there was a conflicting instance (1). This is the reason why the State monad is there, at the base of the transformer stack.
The goal is to build a representation of the catalog compilation process, in a pure monad, and then transform it into another representation that will actually be executed. In order to do so, all the “effects” need to be encoded as a single type (designated as instr in the operational haddocks). In this case, this is the InterpreterInstr type, detailed here.
You might observe that commands of type m a -> m b become constructors of type m a -> instr b, and not instr a -> instr b (which makes sense if you think about what you are doing, but was not immediately obvious to me when I started writing the types).Implementing the Program monad
First of all, all the effects given by the original transformer stacks have their own instructions : ErrorT has the ErrorThrow and ErrorCatch instructions, and a similar treatment has been realized on the MonadWriter part of the original RSST transformer (it’s like RWST, except faster and not leaky). The MonadState doesn’t need special instructions, as InterpreterMonad is already an instance of MonadState.
The MonadWriter has been dropped, in favor of more specific instructions (the original reader structure can be directly observed in the new instruction set, along with the exposed PuppetDBAPI). Finally, some additional utility functions have been thrown in, as they rely on IO.
With all of this in place, it becomes trivial to write the following instances :<figure class="code"> 1 2 3 4 5 6 7 8 instance MonadError Doc InterpreterMonad where throwError = singleton . ErrorThrow catchError a c = singleton (ErrorCatch a c) instance MonadWriter InterpreterWriter InterpreterMonad where tell = singleton . WriterTell pass = singleton . WriterPass listen = singleton . WriterListen </figure>
Now the refactoring becomes mechanical, and surprisingly non invasive. As can be seen in the patch, it’s mostly about replacing every use of the view and liftIO with the corresponding “singleton” command. I have seen that people write short functions for commonly used instructions, such as :<figure class="code"> 1 pdbGetResources = singleton . PDBGetResources </figure>
I didn’t go for this, as most functions are used at most a couple times.Running the computation
The interpreter is right here, and is pretty painful to read. Its type is however pretty straightforward : given the “Reader” data and a ProgramT, it will create an equivalent (or not) computation represented by another monad. It is exactly(2) as writing in a DSL, and running it through an interpreter.
I was surprised that I had to write the explicit type signatures for the functions that are in the where section of the interpretIO function, but other than that this was a straightforward exercise. As a reaction to a recent popular reddit thread, the “Overview” given in operational’s documentation was invaluable to get started quickly.Conclusion
I have seen this kind of design mentioned at several places, as a common way to keep things pure and easy to reason about. I however thought it was better to think about it earlier in the design process, as changing the base monad of all computations would require a significant rewrite.
The first pleasant surprise was that it only took me a few hours to go from “reading the haddocks” to “refactoring done”.
The second, in some sense, even more pleasant surprise was that there doesn’t seem to be any performance penalty whatsoever.