Summary: I wrote an extra library, which contains lots of my commonly used functions.
When starting to write Bake, my first step was to copy a lot of utility functions from Shake - things like fst3 (select the first element of a triple), concatMapM (monadic version of concatMap), withCurrentDirectory (setCurrentDirectory under bracket). None of these functions are specific to either Bake or Shake, and indeed, many of the ones in Shake had originally come from HLint. Copying and pasting code is horrible, so I extracted the best functions into a common library which I named extra. Unlike the copy/paste versions in each package, I then wrote plenty of tests, made sure the functions worked in the presence of exceptions, did basic performance optimisation and filled in some obvious gaps in functionality.
I'm now using the extra library in all the packages above, plus things like ghcid and Hoogle. Interestingly, I'm finding my one-off scripts are making particularly heavy use of the extra functions. I wrote this package to reduce my maintenance burden, but welcome other users of extra.
My goal for the extra library is simple additions to the standard Haskell libraries, just filling out missing functionality, not inventing new concepts. In some cases, later versions of the standard libraries provide the functions I want; in those cases extra makes them available all the way back to GHC 7.2, reducing the amount of CPP in my projects. A few examples:
- Control.Monad.Extra.concatMapM provides a monadic version of concatMap, in the same way that mapM is a monadic version of map.
- Data.Tuple.Extra.fst3 provides a function to get the first element of a triple.
- Control.Exception.Extra.retry provides a function that retries an IO action a number of times.
- System.Environment.Extra.lookupEnv is a function available in GHC 7.6 and above. On GHC 7.6 and above this package reexports the version from System.Environment while on GHC 7.4 and below it defines an equivalent version.
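To make the examples above concrete, here is a minimal sketch of how three of these functions could be written using only base. These are illustrative definitions under my own assumptions, not the implementations shipped in extra; the helper tryAny is a hypothetical name introduced just for this sketch.

```haskell
import Control.Exception (SomeException, try)
import Control.Monad (liftM)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Select the first element of a triple.
fst3 :: (a, b, c) -> a
fst3 (a, _, _) = a

-- Monadic version of concatMap, in the same way mapM is a monadic map.
concatMapM :: Monad m => (a -> m [b]) -> [a] -> m [b]
concatMapM f xs = liftM concat (mapM f xs)

-- Hypothetical helper: 'try' fixed to catch any exception.
tryAny :: IO a -> IO (Either SomeException a)
tryAny = try

-- Run an IO action up to n times, rethrowing the failure of the last attempt.
retry :: Int -> IO a -> IO a
retry n act
  | n <= 1 = act
  | otherwise = do
      r <- tryAny act
      case r of
        Right v -> return v
        Left _ -> retry (n - 1) act

main :: IO ()
main = do
  print (fst3 (1 :: Int, "two", 'c'))
  xs <- concatMapM (\x -> return [x, x * 10]) [1, 2, 3 :: Int]
  print xs
  -- An action that fails twice and succeeds on the third attempt.
  counter <- newIORef (0 :: Int)
  r <- retry 3 $ do
    modifyIORef counter (+ 1)
    n <- readIORef counter
    if n < 3 then ioError (userError "not yet") else return n
  print r
```

Running main prints 1, then [1,10,2,20,3,30], then 3.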
The module Extra documents all functions provided by the library, so is a good place to go to see what is on offer. Modules such as Data.List.Extra provide extra functions over Data.List and also reexport Data.List. Users are recommended to replace Data.List imports with Data.List.Extra if they need the extra functionality.

Which functions?
When selecting functions I have been guided by a few principles.
- I have been using most of these functions in my packages - they have proved useful enough to be worth copying/pasting into each project.
- The functions follow the spirit of the original Prelude/base libraries. I am happy to provide partial functions (e.g. fromRight), and functions which are specialisations of more generic functions (whenJust).
- Most of the functions have trivial implementations. If a beginner couldn't write the function, it probably doesn't belong here.
- I have defined only a few new data types or type aliases. It's a package for defining new utilities on existing types, not new types or concepts.
One feature I particularly like about this library is that the documentation comments are tests. A few examples:

Just True ||^ undefined == Just True
retry 3 (fail "die") == fail "die"
whenJust (Just 1) print == print 1
\x -> fromRight (Right x) == x
\x -> fromRight (Left x) == undefined
These equalities are closer to semantic equality than to Haskell's value equality. Things with lambdas are run through QuickCheck. Things which print to stdout are captured, so the print 1 test really does a print, which is scraped and compared to the LHS. I run these tests by passing them through a preprocessor, which spits out test code that I then run with some specialised testing functions.
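Some of the equalities above can also be checked by hand with ordinary Haskell equality. The sketch below defines plausible versions of ||^, whenJust and the partial fromRight (assumed definitions, not copied from extra) and evaluates three of the documented properties directly, without the preprocessor, QuickCheck or output capture.

```haskell
-- Short-circuiting monadic "or": the second action is skipped
-- when the first returns True.
(||^) :: Monad m => m Bool -> m Bool -> m Bool
a ||^ b = a >>= \x -> if x then return True else b

-- Run an action on the contents of a Just, do nothing for Nothing.
whenJust :: Monad m => Maybe a -> (a -> m ()) -> m ()
whenJust Nothing _ = return ()
whenJust (Just x) f = f x

-- Partial function: only defined on Right values.
fromRight :: Either l r -> r
fromRight (Right x) = x
fromRight (Left _) = error "fromRight: Left"

main :: IO ()
main = do
  -- The undefined is never forced, because the left side is already True.
  print ((Just True ||^ undefined) == Just True)
  -- Behaves like print 1.
  whenJust (Just (1 :: Int)) print
  print (fromRight (Right 'x' :: Either () Char) == 'x')
```

Running main prints True, 1 and True.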
I am trying to use Servant to create some REST APIs. I have used Yesod before and I liked the fact the routes automatically got converted to Key types in my handler.
Reviewing servant's documentation (http://haskell-servant.github.io/tutorial/server.html), it looks like there is a way to convert compound types so servant can parse the incoming route and convert it to the right type. However, I cannot figure out how to make this work with Persistent's Entity Key. I have this line of code:

<|> "users" :> Capture "uid" UserId :> Get '[JSON] User

and I get this error:

No instance for (FromText (Key User)) arising from a use of ‘serve’
In the expression: serve userAPI (readerServer cfg)
In an equation for ‘app’:

I am not sure how to write the instance for FromText.
I tried something like the following, but it did not really work:

instance FromText UserId where fromText = fmap UserId fromText
I am not very proficient in Haskell. I would appreciate it if someone could help me convert the string value in the route to a Persistent Key (such as UserId) so I can use it directly in my handler.
Thanks!
submitted by lleksah15
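One likely problem with the attempted instance is that fmap UserId fromText maps over the function fromText itself, rather than over the Maybe it returns; composing with (.) addresses that. The sketch below shows the pattern in a self-contained form. The FromText class and the UserId type here are local stand-ins, not the real servant class (which works on Text) or the Persistent-generated key. With the real libraries, the wrapping function would probably be something like Persistent's toSqlKey applied to a parsed Int64, but that is an assumption I have not checked against this setup.

```haskell
import Text.Read (readMaybe)

-- Stand-in for the FromText class from the question (String, not Text).
class FromText a where
  fromText :: String -> Maybe a

-- Stand-in instance for a numeric type that can already be parsed.
instance FromText Int where
  fromText = readMaybe

-- Stand-in for the Persistent-generated key type.
newtype UserId = UserId Int
  deriving (Eq, Show)

-- Parse the underlying number first, then wrap the result in the key.
instance FromText UserId where
  fromText = fmap UserId . fromText

main :: IO ()
main = do
  print (fromText "42" :: Maybe UserId)
  print (fromText "abc" :: Maybe UserId)
```

Running main prints Just (UserId 42) and then Nothing.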
What are the most important extensions used while writing Haskell code? Or in other words, which extensions are sure to become part of the core GHC compiler in some future version?
submitted by desijays
I wanted to implement a functional heap. The goal was to use it for lazy heap sorting (so that head . sort is fast) so it's important that as much as possible is kept lazy.
I implemented it but it turned out to be really slow. It takes 13 seconds to sort 1 million numbers, and 8 seconds of that are eaten by the GC because it consumes nearly 3 GB of RAM during execution. Please help me optimize it.
submitted by Darwin226
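For comparison, here is a minimal sketch of one well-known functional heap, a pairing heap, used for a lazy heap sort. This is not the poster's implementation, which is not shown, and it makes no claim about fixing the reported numbers. The idea is that insertion is cheap and elements are only extracted as the output list is consumed, so head . heapSort should need roughly linear work.

```haskell
import Data.List (foldl')

-- A pairing heap: the root holds the minimum, children are sub-heaps.
data Heap a = Empty | Node a [Heap a]

-- Merge two heaps by making the larger root a child of the smaller.
merge :: Ord a => Heap a -> Heap a -> Heap a
merge Empty h = h
merge h Empty = h
merge h1@(Node x hs1) h2@(Node y hs2)
  | x <= y = Node x (h2 : hs1)
  | otherwise = Node y (h1 : hs2)

insert :: Ord a => a -> Heap a -> Heap a
insert x = merge (Node x [])

-- Combine the children of a removed root, two at a time.
mergePairs :: Ord a => [Heap a] -> Heap a
mergePairs [] = Empty
mergePairs [h] = h
mergePairs (h1 : h2 : hs) = merge (merge h1 h2) (mergePairs hs)

-- Lazily produce elements in ascending order; the tail is only
-- computed when the consumer asks for it.
toSortedList :: Ord a => Heap a -> [a]
toSortedList Empty = []
toSortedList (Node x hs) = x : toSortedList (mergePairs hs)

heapSort :: Ord a => [a] -> [a]
heapSort = toSortedList . foldl' (flip insert) Empty

main :: IO ()
main = do
  print (heapSort [5, 3, 8, 1, 9, 2 :: Int])
  -- Only the minimum is demanded here, so the rest is never sorted.
  print (head (heapSort [100000, 99999 .. 1 :: Int]))
```

Running main prints [1,2,3,5,8,9] and then 1.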
Summary: Writing a fast continuous integration server is tricky.
When I started writing the continuous integration server Bake, I thought I had a good idea of what would go fast. It turned out I was wrong. The problem Bake is trying to solve is:
- You have a set of tests that must pass, which take several hours to run.
- You have a current state of the world, which passes all the tests.
- You have a stream of hundreds of incoming patches per day that can be applied to the current state.
- You want to advance the state by applying patches, ensuring the current state always passes all tests.
- You want to reject bad patches and apply good patches as fast as possible.
I assume that tests can express dependencies, such as you must compile before running any tests. The test that performs compilation is special because it can go much faster if only a few things have changed, benefiting from incremental compilation.
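The dependency assumption can be sketched as a small data model. The Test record and the ready function below are hypothetical names for illustration and are not taken from Bake: a test becomes runnable once every test it depends on has passed.

```haskell
-- A test with a name and the names of the tests it depends on.
data Test = Test
  { testName :: String
  , testDeps :: [String]
  }

-- Tests that have not yet passed and whose dependencies have all passed.
ready :: [String] -> [Test] -> [String]
ready passed tests =
  [ testName t
  | t <- tests
  , testName t `notElem` passed
  , all (`elem` passed) (testDeps t)
  ]

-- Example test suite: everything depends on compilation.
suite :: [Test]
suite =
  [ Test "compile" []
  , Test "unit" ["compile"]
  , Test "integration" ["compile", "unit"]
  ]

main :: IO ()
main = do
  print (ready [] suite)
  print (ready ["compile"] suite)
  print (ready ["compile", "unit"] suite)
```

Running main prints ["compile"], then ["unit"], then ["integration"].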
Both my wrong solution, and my subsequent better solution, are based on the idea of a candidate - a sequence of patches applied to the current state that is the focus of testing. The solutions differ in when patches are added/removed from the candidate and how the candidates are compiled.

A Wrong Solution
My initial solution compiled and ran each candidate in a separate directory. When the directory was first created, it copied a nearby candidate to try and benefit from incremental compilation.
Each incoming patch was immediately included in the candidate, compiled, and run on all tests. I would always run the test that had not passed for the longest time, to increase confidence in more patches. Concretely, if I have run test T1 on patch P1, and P2 comes in, I start testing T2 on the combination of P1+P2. After that passes I can be somewhat confident that P1 passes both T1 and T2, despite not having run T2 on just P1.
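The rule of always running the test that has gone longest without passing can be sketched as a simple selection over timestamps. The representation below (a test name paired with the time of its last pass, or Nothing if it has never passed) is an assumption made for illustration, not how Bake stores this information.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Pick the test whose last pass is oldest. Tests that have never
-- passed (Nothing) sort before any test that has, so they go first.
nextTest :: [(String, Maybe Int)] -> Maybe String
nextTest [] = Nothing
nextTest xs = Just (fst (minimumBy (comparing snd) xs))

main :: IO ()
main = do
  -- T1 last passed at time 10, T2 at time 3, so T2 is chosen.
  print (nextTest [("T1", Just 10), ("T2", Just 3)])
  -- T3 has never passed, so it is chosen ahead of the others.
  print (nextTest [("T1", Just 10), ("T2", Just 3), ("T3", Nothing)])
```

Running main prints Just "T2" and then Just "T3".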
If a test fails, I bisect to find the patch that broke it, reject the patch, and immediately throw it out of the candidate.

The Problems
There are four main problems with this approach:
- Every compilation starts with a copy of a nearby candidate. Copying a directory of lots of small files (the typical output of a compiler) is fantastically expensive on Windows.
- When bisecting, I have to compile at lots of prefixes of the candidate, the cost of which varies significantly based on the directory it starts from.
- I'm regularly throwing patches out of the candidate, which requires a significant amount of compilation, as it has to recompile all patches that were after the rejected patch.
- I'm regularly adding patches to the candidate, each of which requires an incremental compilation, but tends to be dominated by the directory copy.
This solution spent all the time copying and compiling, and relatively little time testing.

A Better Solution
To benefit from incremental compilation and avoid copying costs, I always compile in the same directory. Given a candidate, I compile each patch in the series one after another, and after each compilation I zip up the interesting files (the test executables and test data). To bisect, I unzip the relevant files to a different directory. On Windows, unzipping is much more expensive than zipping, but that only needs to be done when bisecting is required. I also only need to zip the stuff required for testing, not for building, which is often much smaller.
When testing a candidate, I run all tests without extending the candidate. If all the tests pass I update the state and create a new candidate containing all the new patches.
If any test fails I bisect to figure out who should be rejected, but don't reject until I've completed all tests. After identifying all failing tests, and the patch that caused each of them to fail, I throw those patches out of the candidate. I then rebuild with the revised candidate and run only those tests that failed last time around, trying to seek out tests where two patches in a candidate both broke them. I keep repeating with only the tests that failed last time, until no tests fail. Once there are no failing tests, I extend the candidate with all new patches, but do not update the state.
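The bisection step can be sketched as a binary search over prefixes of the candidate. This is a simplified model rather than Bake's code: the passes function is a hypothetical stand-in for "build this prefix and run the failing test", and the search assumes that once a prefix fails, every longer prefix fails too.

```haskell
-- Find the patch whose addition first makes the test fail.
-- Assumes the empty prefix passes and failure is monotonic in the prefix.
bisect :: ([p] -> Bool) -> [p] -> Maybe p
bisect passes ps
  | null ps || passes ps = Nothing   -- nothing to blame
  | otherwise = Just (go 0 (length ps))
  where
    -- Invariant: the prefix of length lo passes, the prefix of length hi fails.
    go lo hi
      | hi - lo <= 1 = ps !! (hi - 1)
      | passes (take mid ps) = go mid hi
      | otherwise = go lo mid
      where
        mid = (lo + hi) `div` 2

main :: IO ()
main = do
  -- Pretend patch 5 breaks the test: any prefix containing it fails.
  let passes prefix = 5 `notElem` prefix
  print (bisect passes [1 .. 8 :: Int])
  print (bisect passes [1 .. 4 :: Int])
```

Running main prints Just 5 and then Nothing. Each call to passes corresponds to testing one prefix, so a candidate of n patches needs about log2 n prefixes per failing test.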
As a small tweak, if there are two patches in the queue from the same person, where one is a superset of the other, I ignore the subset. The idea is that if the base commit has an error I don't want to track it down twice, once to the first failing commit and then again to the second one.

Using this approach in Bake
First, the standard disclaimer: Bake may not meet your needs - it is a lot less developed than other continuous integration systems. If you do decide to use Bake, you should run from the git repo, as the Hackage release is far behind. That said, Bake is now in a reasonable shape, and might be suitable for early adopters.
In Bake this approach is implemented in the StepGit module, with the ovenStepGit function. Since Bake doesn't have the notion of building patches in series it pretends (to the rest of Bake) that it's building the final result, but secretly caches the intermediate steps. If there is a failure when compiling, it caches that failure, and reports it to each step in the bisection, so Bake tracks down the correct root cause.
I am currently recommending ovenStepGit as the "best" approach for combining git and an incremental build system with Bake. While any incremental build system works, I can't help but plug Shake, because it's the best build system I've ever written.