Summary: Shake can now be configured to check file hashes/digests instead of modification times, which is great if you frequently switch git branches.
Build systems run actions on files, skipping the actions if the files have not changed. An important part of that process involves determining if a file has changed. The Make build system uses modification times to impose an ordering on files, but more modern build systems tend to use the modification time as a proxy for the file contents, where any change indicates the contents have changed (e.g. Shake, Ninja). The alternative approach is to compute a hash/digest of the file contents (e.g. SCons, Redo). As of version 0.13, Shake supports both methods, along with three combinations of them - in this post I'll go through the alternatives, and their advantages/disadvantages.
Modification times rely on the file-system updating a timestamp whenever the file contents are written. Modification time is cheap to query. Saving a file afresh will cause the modification time to change, even if the contents do not - as a result touch causes rebuilds. Unfortunately, working with git branches sometimes modifies a file but leaves it with the same contents, which can result in unnecessary rebuilds (see the bottom of this post for one problematic git workflow).
File digests are computed from the file contents, and accurately reflect if the file contents have changed. There is a remote risk that the file will change without its digest changing, but unless your build system users are actively hostile attackers, that is unlikely. The disadvantage of digests is that they are expensive to compute, requiring a full scan of the file. In particular, after every rule finishes it must scan the file it just built, and on startup the build system must scan all the files. Scanning all the files can cause empty rebuilds to take minutes. When using digests, Shake also records file sizes, since if a file size changes, we know the digest will not match - making most changed digests cheap to detect.
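The size shortcut can be sketched in a few lines of Haskell. This is an illustration, not Shake's internals: the checksum is a stand-in for a real cryptographic digest, and the function names are invented.

```haskell
import qualified Data.ByteString as BS
import System.Directory (getFileSize)

-- Illustrative stand-in for a real digest function.
checksum :: BS.ByteString -> Int
checksum = BS.foldl' (\acc w -> acc * 31 + fromIntegral w) 0

-- Check a file against a stored (size, digest) pair. If the size differs,
-- the digest cannot possibly match, so the expensive content scan is skipped.
fileUnchanged :: FilePath -> Integer -> Int -> IO Bool
fileUnchanged path oldSize oldSum = do
    size <- getFileSize path
    if size /= oldSize
        then return False
        else do
            contents <- BS.readFile path
            return (checksum contents == oldSum)
```

The size comparison is a single metadata query, so most changed files are rejected without reading their contents at all.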
Modification time and file digests combines the two methods so that a file only rebuilds if both the modification time and digest have changed. The advantage is that for files that have not changed, the modification time cheaply detects that without ever computing the file hash. If the modification time has changed, a digest check may still save an expensive rebuild, and even if it doesn't, the cost is likely to be small compared to rerunning the rule.
Modification time and file digests on inputs takes the previous method, but only computes digests for input files. Generated files (e.g. compiled binaries) tend to be large (so digests are expensive to compute) and are regenerated rather than edited (so they rarely end up with identical contents), making them poor candidates for digests. The file size check means this restriction is unlikely to make a difference when checking all files, but may have some limited impact when building.
Modification time or file digests combines the two methods so that a file rebuilds if either modification time or file digest have changed. I can't think of a sensible reason for using this setting, but maybe someone else can?
Suggestions for Shake users
All these options can be set with the shakeChange field of shakeOptions, or using command line flags such as --digest or --digest-and-input. Switching between some change modes will cause all files to rebuild, so I recommend finding a suitable mode and sticking to it.
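As a sketch, a minimal build file selecting one of these modes might look like the following (the file names are invented for illustration):

```haskell
import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions{shakeChange = ChangeModtimeAndDigestInput} $ do
    want ["out.txt"]
    -- A trivial rule: out.txt is rebuilt only when in.txt is considered
    -- changed under the selected change mode.
    "out.txt" %> \out -> do
        need ["in.txt"]
        copyFile' "in.txt" out
```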
- If you can't trust the modification times to change, use ChangeDigest.
- If you are using git and multiple branches, use ChangeModtimeAndDigestInput.
- If you have generated files that rewrite themselves but do not change, I recommend using writeFileChanged when generating the file, but otherwise use ChangeModtimeAndDigest.
- Otherwise, I currently recommend using ChangeModtime, but some users may prefer ChangeModtimeAndDigest.
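The effect of writeFileChanged can be approximated with plain Haskell. This is a sketch of the idea, not Shake's actual implementation:

```haskell
import Control.Monad (unless)
import System.Directory (doesFileExist)
import qualified Data.ByteString.Char8 as BS

-- Only write the file when the new contents differ from what is on disk,
-- leaving the modification time alone for no-op regenerations.
writeFileChangedSketch :: FilePath -> String -> IO ()
writeFileChangedSketch path new = do
    exists <- doesFileExist path
    same <- if exists
        then (== BS.pack new) <$> BS.readFile path
        else return False
    unless same (BS.writeFile path (BS.pack new))
```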
Appendix: The git anti-build-system workflow
Certain common git workflows change files from the current version, to an old version, then back again - causing modification-time checks to run redundant rebuilds. As an example, imagine we have two branches foo and bar, based on remote branches origin/foo and origin/bar, both of which themselves are regularly synced to a common origin/master branch. The difference between origin/foo and origin/bar is likely to be small. To switch from an up-to-date bar to an up-to-date foo we can run git checkout foo && git pull. These commands switch to an out-of-date foo, then update it. As a result, any file that has changed since we last updated foo will change to an old version, then change to a new version, likely the same as it was before we started. This workflow requires build systems to support file digests.
I am new to Haskell, and I am trying to solve a question about fibonacci numbers. Here is the question:
Fibonacci numbers are composed of other, repeated Fibonacci numbers. For example, to find the 10th Fibonacci number, the 8th number is calculated 2 times, the 6th number 5 times, and the 5th number 8 times.
fib(10) = fib(9) + fib(8) = 2*fib(8) + fib(7) = 3*fib(7) + 2*fib(6) = 5*fib(6) + 3*fib(5) = 8*fib(5) + 5*fib(4)
How many times will the nth number be calculated in finding the mth number? For example:
Main> fibN 10 5 = 8, fibN 15 6 = 55
Type of the function: fibN :: Int -> Int -> Int
And this is my code. What's wrong with it?
fibN :: Int -> Int -> Int
fibN m n
    | n == m    = 1
    | otherwise = fibN (m-1) n + fibN (m-2) n

submitted by shevchenko7
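For reference, the recursion in the question never reaches the n == m base case once m drops below n, so fibN 10 5 recurses forever. One possible fix (a sketch) adds a base case for m < n:

```haskell
fibN :: Int -> Int -> Int
fibN m n
    | m <  n    = 0  -- fib of a smaller index never computes fib n
    | m == n    = 1  -- this call is itself one calculation of fib n
    | otherwise = fibN (m-1) n + fibN (m-2) n
```

With this base case, fibN 10 5 evaluates to 8 and fibN 15 6 to 55, matching the expected outputs.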
re2 is a really interesting regex library. It's based on finite-state automata instead of a backtracking parser, and guarantees linear-time execution with bounded memory consumption. The author, Russ Cox, wrote an article about it at http://swtch.com/~rsc/regexp/regexp3.html
My binding is a departure from the standard Haskell regex API in that there are no giant typeclasses, no fundeps, and no operators. It's still pretty basic and a few features are missing, but it should be usable for most common regex use cases.
The re2 syntax is documented at re2/wiki/Syntax. Here are a few examples:

$ ghci -XOverloadedStrings
ghci> import Regex.RE2
ghci> find "\\w+$" "hello world"
Just (Match [Just "world"])
ghci> replaceAll "([a-z]+)" "123 456 abc def" "[\\1]"
("123 456 [abc] [def]",2)

submitted by jmillikin
I was slightly frustrated and irritated with a situation at work today, which caused me to think about the word “gumption” as it’s used in Pirsig’s Zen and the Art of Motorcycle Maintenance. That led me to Wikipedia’s article on gumption trap which in turn led me to learn about the concept of learned helplessness.
So, what was the situation and how is it connected to learned helplessness?
The rest is just slightly tongue-in-cheek.

What to standardise on
I’m in a situation where the powers-that-be have standardised on applications. Not on open formats or open protocols, but on specific applications that use proprietary formats and proprietary protocols. Of course these applications suck. That’s what a lack of competition does: it removes any will for a company to actually improve its applications! Some of these applications have captured such a large market share that reverse engineering of the formats was inevitable. Yay! That means I can use a sane OS and vastly better applications. However, one protocol has not been reverse engineered yet, and I’m forced to use the standard application. This application is painful to use and only runs on a crap OS.
How bad can it be? you ask. The application is Outlook, the OS is Windows! Yes! It’s that bad. Hence the thoughts of gumption, or rather the loss of it, which is exactly what starting Outlook causes. Every time!

Connection to learned helplessness
It continues to amaze me that companies standardise on Windows and applications that only run on Windows. There are better alternatives, especially in this day and age with fast networks and powerful, fast execution environments that completely sidestep the whole question of which OS to run. Still, there seems to be very little will to upgrade to Linux, or to standardise on web-based applications. Why is that? In the past I’ve thought it might be the network effect. Most often I’ve come to the conclusion that it is most likely simple inertia. What’s the explanation for the inertia, though?
This is where learned helplessness can offer an explanation. People have been conditioned and have grown so used to Windows and other Microsoft products that they simply don’t recognise that there now is a way out. No matter how many escape routes become available, people simply won’t see them.

What to do about it
As the experiments on dogs showed, there is hope (from the Wikipedia page):
To change their expectation and to recover the dogs from helplessness, experimenters had to physically pick up the dogs and move the legs in a close replication of the physical actions the dogs needed to take to remove themselves from the electrified grid. This had to be replicated at least 2 times before the dogs would exhibit the functional response of jumping over the barrier to get away from the electrified grid. Threats, rewards, and observed demonstrations had no observed effect in helping the dogs to independently move away from the shocks.
Oh how I wish I could pull off the direct translation to my workplace: re-install my co-workers’ computers and replace servers and services. Too bad that’s not a realistic plan. What I can do, though, is civil disobedience (or maybe it should be called something like civil disobedience in the workplace instead). By simply not conforming, and at the same time showing that there are better ways of getting the job done, others will hopefully notice and either adopt my way or come up with something that suits them better (which I can then learn from). Even if that doesn’t happen, at least I’ll keep my gumption at healthy levels.

What I’m doing at the moment
This is what I’m doing at work right now to avoid loss of gumption:
Finally, for Outlook. The decision of the powers-that-be to disable IMAP forces me to:
- Limit my mail reading to twice per day.
- Be logged into Skype to make up for not reading mail more often.
I've noticed over the years that build and test times are perhaps the biggest factor in my coding performance (even though I do spend most of my time in the REPL). Is there a guide out there for keeping your build times down with GHC? I'm trying to build an intuition for how to structure my code and how to pick language extensions in order to keep my process fast.
(P.S. One example I've noticed is files with a lot of records (schema definitions). These seem particularly slow; I suspect that -XGenerics could be one possible culprit, although that particular extension is hard to give up!)

submitted by rehno-lindeque
Each month, my company has a hackday where everyone works on a project and at the end, we vote on the winner. For this month's hackday, I made the voting app we use with Yesod. Broad overview of the functionality:
- You can create a new hackday, naming it something like "May 30th Hackday"
- You can add people's projects to it
- Each person has 3 votes. Your remaining votes are just stored in a session—a user login model is too cumbersome for a room full of people on their phones.
- There's no security model right now, which is nice for showing people the site. Eventually it'd be nice to have HTTP basic auth, which we already use for things like company dashboards
I'm pretty new to both Yesod and Haskell, so any criticism is welcome. Some things I wasn't sure of:
- Where do people put files that aren't handlers, i.e. app logic type files? In /app?
- Where do people put model related code? I just added things to Model.hs; I guess for a bigger app I would break things out into other files and have Model.hs import them.
- Is there a way to use the database's default values when inserting rows (for things like current time) rather than specifying them?
- (I have a lot more questions, but I'll probably move those into separate StackOverflow posts)