Apparently, there’s a small civil war going on between frequentists and Bayesians. Nate Silver’s otherwise enjoyable book, The Signal and the Noise, contains a whole chapter devoted to back-talking Fisher and the frequentist viewpoint. I have generally failed to quite appreciate this schism; to my (admittedly limited) understanding, these are just mathematical methods, applicable or not depending on the circumstances. There is one thing that bothers me, though.

Frequentists and science
The Scientific Method, in the positivist voice of Karl Popper, starts out with the formulation of a hypothesis: a model of how some aspect of the world works. This model is then used to make predictions, and the predictions are compared to observations. The important thing for Popper is that observations confirming a theory aren’t particularly valuable; one should always try to disprove hypotheses, and only after failing to do so in every way imaginable is the model accepted.
This is derived from logic, where it takes the form of an implication: H ⇒ D. Observing that D (our expected observation) is true doesn’t really tell us anything about the truth of H (our hypothesis). But on the other hand, if we observe that D is false, we know that H must be false as well.
This is closely tied to the frequentist viewpoint, which is in a sense a probabilistic phrasing of the above. You formulate the conventional view, the null hypothesis (H0), and compute a P-value: the probability of making your observation given that the null hypothesis is true. Generally, you have an alternative hypothesis (H) in mind, and ideally your observations would be much more likely under H - but that isn’t strictly necessary, and you can’t really claim that H is likely to be right just because H0 is likely wrong.

Bayes formula
Bayes formula, on the other hand, says something about the probability of a hypothesis being true, which is usually what you are interested in. It looks like this:
P(H∣D) = P(D∣H)P(H) / P(D)
It reads as: the probability of a hypothesis, H, given some observation D, is equal to the probability of observing the data if the hypothesis is true, multiplied by the prior probability of the hypothesis (how likely we thought it was to be true before making our observation), and divided by the (also prior) probability of making our observation, regardless of whether H is true or not.
One important difference is that, unlike in Popper’s case, observations confirming the theory add value: an observation D modifies the probability that H is true by the ratio P(D∣H) / P(D).

The brain
What has puzzled me is that the scientific method is allegedly an effective and efficient way of gaining and improving knowledge. After all, that is the point of science, isn’t it? Yet our minds clearly don’t work that way at all.
Even putting aside well-known logical fallacies like confirmation bias, our terrible estimates of risk and probability, and the way we form opinions from feelings rather than rational analysis, it is clear that we are (or at least I am) not using anything even remotely resembling Popper’s scientific method in day-to-day life.
For instance, I occasionally go orienteering: running around in the wilderness equipped with a map and a compass, trying to work my way to a sequence of control points as fast as possible. As it happens, I’m not terribly good at it, and consequently, I get lost. How do I work out where I am on the map and get back on track? I formulate a hypothesis: perhaps I am here. Then, I try to confirm this hypothesis: if I am here, there should be a boulder over there. Check if there is a boulder. If yes, I’m usually satisfied, and chart my way using this hypothesis. If not, Popper would say that I should discard my hypothesis, but instead I look for a different confirmation: there should also be a gully over there. I’m just not letting go that easily.
Another example: I like to solve crosswords. Often, I have a candidate word that fits (i.e. a hypothetical answer), but too few letters to confirm it. If I can find a perpendicular word that supports my hypothesis by confirming an agreeing letter, I accept it - almost immediately. I never bother to look for words that would falsify my hypothetical answer.
I think it is clear that these are examples of Bayesian reasoning. In the first example, I have a hypothesis with some - well, let’s be frank: a lot of - uncertainty. The prior P(H) is rather low. Now, I observe a boulder where I expected it from the hypothesis. I can then strengthen my hypothesis by P(D∣H) / P(D): the probability of observing the boulder given that I am where I thought, divided by the probability of observing a boulder in some random place. Unless the terrain is generally very rocky, this improves my confidence a lot.
It is important that P(D∣H) isn’t one: even if I am right about my whereabouts, it is still possible that I don’t observe the boulder. The boulder might be hidden by bushes, or the map might be wrong, for instance.
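To make this concrete, here is a minimal sketch of the update as code, with made-up numbers: a 30% prior that I am where I think, a 90% chance of spotting the boulder if I am right, and a 10% chance of there being some boulder anyway.

```haskell
-- A minimal sketch of the boulder update. All three probabilities are
-- invented for illustration; only the formula comes from the text.
posterior :: Double -> Double -> Double -> Double
posterior pH pDgivenH pDgivenNotH = pDgivenH * pH / pD
  where
    -- total probability of the observation, marginalizing over H
    pD = pDgivenH * pH + pDgivenNotH * (1 - pH)

main :: IO ()
main = print (posterior 0.3 0.9 0.1)  -- roughly 0.79: one boulder helps a lot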
The argument is similar for crosswords; I leave that as an exercise for the reader.

The effectiveness and efficiency of science
What bothers me about this is that it looks as if we are wired to do our decision-making using methods that resemble Bayes’, and we are clearly not using falsification as a means of deriving knowledge. Now, nature isn’t perfect, but it is usually pretty good at optimizing things. If the scientific method of falsification really is an effective and efficient way of extracting knowledge from observations, why don’t we use it?
One possible reason for our innate Bayesian preference is that in the real world, we rarely deal in absolutes. As in the case with the boulder, our information is quite noisy, and it can be dangerous to change our opinion haphazardly, based on potentially unreliable information.
But even if our brains may be more Bayesian than Popperian, we’re not very good at it. We routinely make all kinds of crazy illogical errors, and it’s hard to imagine that this is somehow advantageous in evolution.
And perhaps that is the reason why Bayes works for us: we are intuitively sloppy, and in reality just have to make do with rough approximations. And the reason science sticks to positivist-frequentist methods is that science really wants to deal with absolutes.

Afterword
The astute reader may have noticed that my argumentation in this text is Bayesian. What can I say? I can’t help it; it’s in my nature. And if you found it convincing, chances are it’s in yours, too.

References
While writing this, I ran across this link. Although I can’t claim to have reasoned anywhere near as clearly or eloquently about it, it sums up my sentiment pretty well.
Tom Moertel posted an interesting exercise which can be solved using Bayes’ theorem.
Probability Theory: the Logic of Science is an interesting book that talks about probabilistic logic.
I couldn’t find any links off-hand, but I am aware that some people appear to think that intelligence evolved not to improve our understanding of our environment, but as a social tool. Or: evolution made us dumb enough to get into all kinds of trouble, but compensated by making us smart enough to talk ourselves out of it. Sometimes. I don’t buy it.
Many environmentalists seem happy to fight climate change by switching off their lights for one hour per year, unplugging TVs, or using non-disposable shopping bags. Highly visible as these actions are, they also share the interesting property of not having any noticeable effect on the environment.
But one thing you can’t avoid noticing if you look into the numbers is that beef (and lamb) has a dramatically higher impact on the environment than almost any other kind of food. Although there are some CO2 emissions, the real impact comes from emissions of methane (from the digestive systems of ruminants) and nitrous oxide (from the fertilizer needed to grow feed). The emissions amount to the equivalent of 16-18 kg CO2 per kg of meat. We also eat a large amount of meat; in Norway, the average consumption is 71 kg of meat per person, split about 40:60 between red (beef, lamb) and white (pork, chicken). Taken together, this means that beef alone is responsible for about half a ton of CO2 per person. If everybody replaced beef with the next worst alternative, pork, we’d save about 340 kg per person, or over 3% of the national emissions. If we replace beef with fish or vegetables, we save even more.
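To sanity-check the arithmetic, here is the calculation as a small sketch. The beef factor, consumption, and red/white split are the figures quoted above; the pork factor is my own assumption, chosen only so the illustration matches the quoted saving.

```haskell
-- Back-of-the-envelope check of the figures above.
beefFactor, porkFactor, consumption, redShare :: Double
beefFactor  = 17    -- midpoint of the quoted 16-18 kg CO2e per kg of beef/lamb
porkFactor  = 5     -- assumed, not given in the text
consumption = 71    -- kg of meat per person per year (Norway)
redShare    = 0.4   -- the 40:60 red:white split

redMeat, beefEmissions, savingVsPork :: Double
redMeat       = redShare * consumption               -- about 28 kg
beefEmissions = redMeat * beefFactor                 -- about 480 kg CO2e: "half a ton"
savingVsPork  = redMeat * (beefFactor - porkFactor)  -- about 340 kg CO2e

main :: IO ()
main = mapM_ print [redMeat, beefEmissions, savingVsPork]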
Of course, cattle are also used to produce dairy products, and these have their own accounts in the CO2 budget. Milk clocks in at 1.1 kg CO2 per kg, cheese at about 10 times that. Multiplied by Norwegian consumption, I find that dairy products are responsible for about 300 kg CO2 per capita.

Other reasons
Even if you don’t care about climate change (or believe in it), there are good reasons to pass the beef by. Maybe you want to preserve the rainforests? Soy and grain production for cattle fodder is a major driver of deforestation.
Or perhaps you care about your health? Saturated animal fat is considered a major risk factor for colon cancer, one of the most common and deadly cancers. The official recommendation is for Norwegians to reduce their consumption of red meat. And although its use is limited here, the use of antibiotics in agriculture breeds resistance genes which, thanks to lateral gene transfer, make their way into human pathogens.
And if you don’t mind dying young, perhaps you care about the economy? Western agriculture is often hopelessly subsidized; Norway, for example, subsidizes overproduction of milk, and then adds export subsidies to dairy products - in other words, taxpayers pay the bill twice. Total subsidies and import protections amount to more than the farmers’ entire income.
There’s also eutrophication of lakes and rivers from fertilizer, and pesticides that make their way into ecosystems. And, of course, animal welfare issues. Or the questionable morality of using arable land for feeding animals: growing food instead of feed could sustain at least five times the population on the same land.
I guess it is fair to remark that not all alternatives are better than beef on every count; pork and chicken, for instance, also depend heavily on grain and soy feeds, as well as antibiotics. Fish is generally better on all counts (wild fish caught with nets or lines has negligible impact, and farmed or trawled fish is still about twice as climate-friendly as chicken), and vegetables are usually best.

Does it matter?
Yes, it does! If we all cut out beef and lamb entirely, we could drop total greenhouse gas emissions by 5%, and if we include dairy products, by close to 7-8%. This is about the same as our total CO2 emissions from cars. At least to me, cutting out beef is no big deal; there are plenty of good alternatives. So from now on, no beef on my table. Consider doing the same.
In the previous two posts, we've built up a whole range of applicatives, out of Const, Identity, Reader, Compose, Product, Sum, and Fix (and some higher-order analogues). Sum has given us the most trouble, but in some sense has been the most powerful, letting us write things like possibly eventually terminating lists, or trees, or in fact any sort of structure with branching alternatives. In this post, I want to think a bit more about why it is that Sum is the trickiest of the bunch, and more generally, what we can say about when two applicative structures are the "same". In the process of doing so, we'll invent something a lot like Traversable en passant.
Let's do some counting exercises. Product Identity Identity holds exactly two things. It is therefore isomorphic to ((->) Bool), or if we prefer, ((->) Either () ()). That is to say that a pair that holds two values of type a is the same as a function that takes a two-valued type and yields a value of type a. A product of more functors in turn is isomorphic to the reader of the sum of each of the datatypes that "represent" them. E.g. Product (Product Identity Identity) (Product (Const ()) Identity) is iso to ((->) (Either (Either () ()) ()), i.e. a data type with three possible inhabitants. In making this move we took Product to Either -- multiplication to sum. We can pull a similar trick with Compose. Compose (Product Identity Identity) (Product Identity Identity) goes to ((->) (Either () (),Either () ())). So again we took Product to a sum type, but now we took Compose to a pair -- a product type! The intuition is that composition multiplies the possibilities of spaces in each nested functor.
Hmm... products go to sums, composition goes to multiplication, etc. This should remind us of something -- these rules are exactly the rules for working with exponentials: x^n * x^m = x^(n + m), (x^n)^m = x^(n*m), x^0 = 1, x^1 = x.
Seen from the right standpoint, this isn't surprising at all, but almost inevitable. The functors we're describing are known as "representable," a term which derives from category theory. (See appendix on representable functors below).
In Haskell-land, a "representable functor" is just any functor isomorphic to the reader functor ((->) a) for some appropriate a. Now if we think back to our algebraic representations of data types, we call the arrow type constructor an exponential. We can "count" a -> x as x^a, since e.g. there are 3^2 distinct functions that inhabit the type 2 -> 3. The intuition for this is that for each input we pick one of the possible results, so as the number of inputs goes up by one, the number of functions goes up by multiplying through by the set of possible results. 1 -> 3 = 3, 2 -> 3 = 3 * 3, (n + 1) -> 3 = 3 * (n -> 3).
Hence, if we "represent" our functors by exponentials, then we can work with them directly as exponentials as well, with all the usual rules. Edward Kmett has a library encoding representable functors in Haskell.
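As a rough standalone sketch of the idea (deliberately not that library’s exact API), a representable functor is one equipped with a mutually inverse tabulate/index pair:

```haskell
{-# LANGUAGE TypeFamilies #-}

-- A minimal sketch of representability; the real thing lives in Kmett's
-- adjunctions package and differs in detail.
class Functor f => Representable f where
  type Rep f
  tabulate :: (Rep f -> a) -> f a  -- build the structure from an indexing function
  index    :: f a -> Rep f -> a    -- look a value up by position

data Pair a = Pair a a

instance Functor Pair where
  fmap f (Pair x y) = Pair (f x) (f y)

instance Representable Pair where
  type Rep Pair = Bool
  tabulate f = Pair (f False) (f True)
  index (Pair x _) False = x
  index (Pair _ y) True  = y
```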
Meanwhile, Peter Hancock prefers to call such functors "Naperian" after John Napier, inventor of the logarithm (see also here). Why Naperian? Because if our functors are isomorphic to exponentials, then we can take their logs! And that brings us back to the initial discussion of type mathematics. We have some functor F, and claim that it is isomorphic to -^R for some concrete data type R. Well, this means that R is the logarithm of F. E.g. (R -> a, S -> a) =~ (Either R S -> a), which is to say that if log F =~ R and log G =~ S, then log (F * G) =~ log F + log G. Similarly, for any other data type n, again with log F =~ R, we have n -> F a =~ n -> R -> a =~ (n * R) -> a, which is to say that log (F^n) =~ n * log F.
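The reader-pair identity used above is easy to witness directly (a small sketch):

```haskell
-- Witnessing (R -> a, S -> a) =~ (Either R S -> a): case analysis on one
-- side, composition with the injections on the other.
toEither :: (r -> a, s -> a) -> (Either r s -> a)
toEither (f, g) = either f g

fromEither :: (Either r s -> a) -> (r -> a, s -> a)
fromEither h = (h . Left, h . Right)
```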
This gives us one intuition for why the sum functor is not generally representable -- it is very difficult to decompose log (F + G) into some simpler compound expression of logs.
So what functors are Representable? Anything that can be seen as a fixed shape with some index. Pairs, fixed-size vectors, fixed-size matrices, any nesting of fixed vectors and matrices. But also infinite structures of regular shape! However, not things whose shape can vary -- not lists, not sums. Trees of fixed depth or infinite binary trees, therefore, but not trees of arbitrary depth or with ragged structure, etc.
Representable functors turn out to be extremely powerful tools. Once we know a functor is representable, we know exactly what its applicative instance must be, and that its applicative instance will be "zippy" -- i.e. acting pointwise across the structure. We also know that it has a monad instance! And, unfortunately, that this monad instance is typically fairly useless (in that it is also "zippy" -- i.e. the monad instance on a pair just acts on the two elements pointwise, without ever allowing anything in the first slot to affect anything in the second slot, etc.). But we know more than that. We know that a representable functor, by virtue of being a reader in disguise, cannot have effects that migrate outwards. So any two actions in a representable functor are commutative. And more than that, they are entirely independent.
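Here is a sketch of why the instances are forced (restating the minimal class from the sketch above so the block stands alone): with tabulate and index in hand, pure, (<*>), and (>>=) have essentially nowhere to go but pointwise.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Minimal class, repeated from the earlier sketch.
class Functor f => Representable f where
  type Rep f
  tabulate :: (Rep f -> a) -> f a
  index    :: f a -> Rep f -> a

pureRep :: Representable f => a -> f a
pureRep = tabulate . const            -- the same value at every position

apRep :: Representable f => f (a -> b) -> f a -> f b
apRep fs xs = tabulate (\r -> index fs r (index xs r))  -- combine slot by slot: "zippy"

bindRep :: Representable f => f a -> (a -> f b) -> f b
bindRep m k = tabulate (\r -> index (k (index m r)) r)  -- still pointwise: slot r only ever sees slot r
```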
This means that all representable functors are "distributive"! Given any functor f, and any data type r, we have:

```haskell
distributeReader :: Functor f => f (r -> a) -> (r -> f a)
distributeReader fra = \r -> fmap ($ r) fra
```
That is to say, given an arrow "inside" a functor, we can always pull the arrow out, and "distribute" application across the contents of the functor. A list of functions from Int -> Int becomes a single function from Int to a list of Int, etc. More generally, since all representable functors are isomorphic to reader, given g representable and f any functor, we have:

```haskell
distribute :: (Functor f, Representable g) => f (g a) -> g (f a)
```
This is pretty powerful sauce! And if f and g are both representable, then we get the transposition isomorphism, witnessed by flip! That's just the beginning of the good stuff. If we take functions and "unrepresent" them back to functors (i.e. take their logs), then we can do things like move from ((->) Bool) to pairs, etc. Since we're in a pervasively lazy language, we've just created a library for memoization! This is because we've gone from a function to a data structure we can index into, representing each possible argument to this function as a "slot" in the structure. And the laziness pays off because we only need to evaluate the contents of each slot on demand (otherwise we'd have a precomputed lookup table rather than a dynamically-evaluated memo table).
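Here is a minimal sketch of that memoization trick for functions out of the naturals, using an infinite stream as the memo table:

```haskell
-- Memoization by "unrepresenting": a function out of the naturals becomes
-- an infinite stream we can index into, and laziness fills slots on demand.
data Stream a = Cons a (Stream a)

tabulateNat :: (Integer -> a) -> Stream a
tabulateNat f = go 0 where go n = Cons (f n) (go (n + 1))

indexNat :: Stream a -> Integer -> a
indexNat (Cons x _)  0 = x
indexNat (Cons _ xs) n = indexNat xs (n - 1)

-- Naive fibonacci, but with recursive calls routed through the memo table,
-- so each slot is computed at most once.
fib :: Integer -> Integer
fib = indexNat table
  where
    table = tabulateNat go
    go 0 = 0
    go 1 = 1
    go n = fib (n - 1) + fib (n - 2)
```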
And now suppose we take our representable functor in the form s -> a and pair it with an "index" into that function, in the form of a concrete s. Then we'd be able to step that s forward or backwards and navigate around our structure of as. And this is precisely the Store comonad! And this in turn gives a characterization of the lens laws.
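A minimal sketch of Store and its comonadic operations (the real versions live in the comonad package):

```haskell
-- A representable functor in the form s -> a, paired with a current index s.
data Store s a = Store (s -> a) s

-- Minimal comonad class, just enough for the example.
class Functor w => Comonad w where
  extract   :: w a -> a
  duplicate :: w a -> w (w a)

instance Functor (Store s) where
  fmap f (Store g s) = Store (f . g) s

instance Comonad (Store s) where
  extract   (Store g s) = g s                  -- read off the value at the current index
  duplicate (Store g s) = Store (Store g) s    -- at each index, the whole store refocused there

-- Moving the index around is how we "navigate" the structure.
seek :: s -> Store s a -> Store s a
seek s (Store g _) = Store g s
```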
What this all gives us a tiny taste of, in fact, is the tremendous power of the Yoneda lemma, which, in Haskell, is all about going between values and functions, and in fact captures the important universality and uniqueness properties that make working with representable functors tractable. A further tiny taste of Yoneda comes from a nice blog post by Conal Elliott on memoization.
Extra Credit on Sum Functors
There in fact is a log identity on sums. It goes like this:

log (a + c) = log a + log (1 + c/a)

(This is just factoring: a + c = a * (1 + c/a).)
Do you have a useful computational interpretation of this? I've got the inklings of one, but not much else.
Appendix: Notes on Representable Functors in Hask.
The way to think about this is to take some arbitrary category C, and some category that's basically Set (in our case, Hask. In fact, in our case, C is Hask too, and we're just talking about endofunctors on Hask). Now, we take some functor F : C -> Set, and some A which is an element of C. The set of morphisms originating at A (denoted by Hom(A,-)) constitutes a functor called the "hom functor." For any object X in C, we can "plug it in" to Hom(A,-), to then get the set of all arrows from A to X. And for any morphism X -> Y in C, we can derive a morphism from Hom(A,X) to Hom(A,Y), by composition. This is equivalent, in Haskell-land, to using a function f :: x -> y to send g :: a -> x to a -> y by writing "functionAToY = f . g".
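In Haskell terms this hom functor is just the reader functor, and the "derive a morphism by composition" step is its fmap (base already defines instance Functor ((->) r) with fmap = (.)); spelled out:

```haskell
-- Hom(A,-) in Haskell clothing: ((->) a). Its functorial action sends an
-- arrow x -> y to a function from (a -> x) to (a -> y), by composition.
sendAlong :: (x -> y) -> (a -> x) -> (a -> y)
sendAlong f g = f . g
```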
So, for any A in C, we have a hom functor on C, which is C -> Set, where the elements of the resulting Set are morphisms in C. Now, we have this other arbitrary functor F, which is also C -> Set. If there is an isomorphism of functors between F and Hom(A,-), then we say F is "representable". A representable functor is thus one that can be worked with entirely as an appropriate hom functor.
The Yesod team is pleased to announce the release of Yesod 1.2. You can get it with:

```
cabal install yesod-platform yesod-bin
```
The yesod binary is now a separate package, which helps manage dependencies, but it does mean you need to remember to install two separate packages.
- Esqueleto was released shortly after Yesod 1.1. It gives SQL users the full power of raw SQL with fully type-checked queries.
- yesod-pure was released for those who want less Template Haskell, at the cost of some type safety.
- Conduit 1.0 and WAI 1.4 were released
- TypeScript template support was added
- CSS templates support mixins
- Persistent 1.2 was released
- The Warp web server has been continually sped up. These kinds of efforts make Yesod look pretty good in a recently created benchmark suite. There is now a WAI entry in addition to the Yesod entries.
Previously discussed on the blog: a better representation system, cleaner internals, and the request-local cache. Providing different representation types (JSON or HTML) used to be cumbersome at times, but now it is simple using selectRep:

```haskell
getResource :: Handler TypedContent
getResource = do
    selectRep $ do
        provideRep $ [hamlet|<div>|]
        provideRep $ object ["result" .= "ok"]
```

Request local type-based caching
See the previously mentioned blog post; in short, you create a newtype wrapper around some data, and then you can cache it with the cached function.
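A hypothetical sketch of how that looks (CachedUser, User, and loadUserFromDb are made-up names for illustration; cached comes from yesod-core, and the linked post is authoritative):

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Control.Applicative ((<$>))
import Data.Typeable (Typeable)

-- Hypothetical example. cached keys its per-request cache on the wrapper's
-- type, so the expensive lookup runs at most once per request.
newtype CachedUser = CachedUser { unCachedUser :: User }
    deriving Typeable

getCurrentUser :: Handler User
getCurrentUser = unCachedUser <$> cached (CachedUser <$> loadUserFromDb)
```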
Subsite overhaul

yesod-test was completely overhauled, making it easier to use and providing cleaner integration with hspec. It is easy in Haskell to just lean on the type system for most things and skip testing, particularly for things that are hard to test with QuickCheck. But yesod-test (and wai-test) are there to prevent the bugs that the type system cannot.

Even more
- More efficient session handling.
- yesod-auth email plugin now supports logging in via username in addition to email address.
- probably more stuff we forgot to mention
FPComplete's development of the School of Haskell has been great for spreading knowledge in the Haskell community. The School of Haskell has also been running with the 1.2 changes for quite a while, which should contribute to making 1.2 a more stable release.
The [high-level changelog](https://github.com/yesodweb/yesod/wiki/Changelog#yesod-12-not-yet-released) is on the wiki, and [detailed changes are listed here](https://github.com/yesodweb/yesod/wiki/Detailed-change-list#not-yet-released-yesod-12).
The book documentation for 1.2 has been started, but still needs more work to be fully up to date.
Most of the changes needed to upgrade your site to 1.2 should be fairly mechanical. I started a wiki page for the upgrade. If you have any issues, please note them there or on the mailing list.
We hope you enjoy using Yesod 1.2.