News aggregator

Ketil Malde: Can you trust science?

Planet Haskell - Wed, 03/25/2015 - 2:00pm

Hardly a week goes by without newspapers writing about new and exciting results from science. Perhaps scientists have discovered a wonderful new drug for cancer treatment, or maybe they have found a physiological cause for CFS. Or perhaps this time they finally proved that homeopathy works? And yet, in spite of these bold announcements, we still don't seem to have cured cancer. Science is supposed to be the method that enables us to answer questions about how the world works, but one could be forgiven for wondering whether it, in fact, works at all.

As my latest contribution to my local journal club, I presented a paper by Ioannidis, titled Why most published research findings are false 1. This created something of a stir when it was published in 2005, because it points out some simple mathematical reasons why science isn't as accurate as we would like to believe.

The ubiquitous p-value

Science is about finding out what is true. For instance, is there a relationship between treatment with some drug and the progress of some disease - or is there not? There are several ways to go about finding out, but in essence, it boils down to making some measurements, and doing some statistical calculations. Usually, the result will be reported along with a p-value, which is a by-product of the statistical calculations saying something about how certain we are of the results.

Specifically, if we claim there is a relationship, the associated p-value is the probability we would make such a claim even if there is no relationship in reality.

We would like this probability to be low, of course. Since we are free to select the p-value threshold, it is usually chosen to be 0.05 (or 0.01), meaning that if the claim is false, we will only accept it 5% (or 1%) of the time.

The positive predictive value

Now, the p-value is often interpreted as the probability of our (positive) claim being wrong. This is incorrect! There is a subtle difference here which it is important to be aware of. What you must realize is that the probability α (the p-value threshold) relies on the assumption that the hypothesis is wrong - which may or may not be true; we don't know (which is precisely why we want to find out).

The probability of a claim being wrong after the fact is called the positive predictive value (PPV). In order to say something about this, we also need to take into account the probability of claiming there exists a relationship when the claim is true. Our methods aren't perfect, and even if a claim is true, we might not have sufficient evidence to say for sure.

So, let's take one step back and look at our options. Our hypothesis (e.g., drug X works against disease Y) can be true or false. In either case, our experiment and analysis can lead us to accept or reject it with some probability. This gives us the following 2-by-2 table:

           True    False
  Accept   1-β     α
  Reject   β       1-α

Here, α is the probability of accepting a false relationship by accident (i.e., the p-value threshold), and β is the probability of missing a true relationship -- rejecting the hypothesis even when it is true.

To see why β matters, consider a hypothetical really, really poor method, which has no chance of identifying a true relationship; in other words, β = 1. Then every accepted hypothesis must come from the False column, as long as α is at all positive. Even if the p-value threshold only accepts 1 in 20 false relationships, that's all you will get, and as such, they constitute 100% of the accepted relationships.

But looking at β is not sufficient either. What if a team of researchers tests hundreds of hypotheses, all of which happen to be false? Then again, some of them will get accepted anyway (sneaking in under the p-value threshold α), and since there are no hypotheses in the True column, once more every positive claim is false.

A β of 1 or a field of research with 100% false hypotheses are extreme cases2, and in reality, things are not quite so terrible. The Economist had a good article with a nice illustration showing how this might work in practice with more reasonable numbers. It should still be clear that the ratio of true to false hypotheses being tested, as well as the power of the analysis to identify true hypotheses, are important factors. And if these numbers approach their limits, things can get quite bad enough.
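To make this dependence concrete, here is a small Haskell sketch of Ioannidis' positive predictive value in terms of the threshold α, the power 1-β, and the prior odds R that a tested relationship is true (the function name and the example numbers are mine, not from the paper):

-- PPV = (1 - beta) * R / ((1 - beta) * R + alpha)
ppv :: Double  -- ^ alpha: the p-value threshold
    -> Double  -- ^ beta: probability of missing a true relationship
    -> Double  -- ^ r: prior odds that a tested relationship is true
    -> Double
ppv alpha beta r = (1 - beta) * r / ((1 - beta) * r + alpha)

main :: IO ()
main = do
  print (ppv 0.05 0.2 0.1)   -- reasonable numbers: about 0.62, so nearly 4 in 10 accepted claims are still false
  print (ppv 0.05 1.0 0.1)   -- beta = 1: every accepted claim is false
  print (ppv 0.05 0.2 0.0)   -- no true hypotheses tested: likewise 0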

More elaborate models

Other factors also influence the PPV. Try as we might to be objective, scientists often try hard to find a relationship -- that's what you can publish, after all3. Perhaps in combination with a less firm grasp of statistics than one could wish for (and scientists who think they know enough statistics are few and far between - I'm certainly no exception there), this introduces bias towards acceptance.

Multiple teams pursuing the same challenges in a hot and rapidly developing field also decrease the chance of results being correct, and there's a whole cottage industry of scientists reporting spectacular and surprising results in high-ranking journals, followed by a trickle of failures to replicate.

Solving this

One option is to be stricter - this is the default when you do multiple hypothesis testing: you require a lower p-value threshold in order to reduce α. The problem is that if you are stricter about what you accept as true, you will also reject more actually true hypotheses. In other words, you can reduce α, but only at the cost of increasing β.
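As a concrete illustration (not from the original text), the simplest such correction, the Bonferroni correction, just divides the threshold by the number of tests:

-- Bonferroni correction: a toy helper to illustrate the trade-off above.
-- Lowering alpha this way cuts false positives, but every true hypothesis
-- now has to clear a much stricter bar, so beta goes up.
bonferroni :: Double -> Int -> Double
bonferroni alpha nTests = alpha / fromIntegral nTests

-- e.g. bonferroni 0.05 20 == 2.5e-3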

On the other hand, you can reduce β by running a larger experiment. One obvious problem with this is cost: for many problems, a cohort of a hundred thousand or more is necessary, and not everybody can afford to run studies of that size. Perhaps even worse, a large cohort means that almost any systematic difference will be found significant. Biases that are normally negligible will show up as glowing bonfires in your data.

In practice?

Modern biology has changed a lot in recent years, and today we are routinely using high-throughput methods to test the expression of tens of thousands of genes, or the value of hundreds of thousands of genetic markers.

In other words, we simultaneously test an extreme number of hypotheses, where we expect a vast majority of them to be false, and in many cases, the effect size and the cohort are both small. It's often a new and exciting field, and we usually strive to use the latest version of the latest technology, always looking for new and improved analysis tools.

To put it bluntly, it is extremely unlikely that any result from this kind of study will be correct. Some people will claim these methods are still good for "hypothesis generation", but Ioannidis shows a hypothetical example where a positive result increases the likelihood that a hypothesis is correct by 50%. This doesn't sound so bad, perhaps, but in reality the likelihood is only improved from 1 in 10000 to about 1 in 7000 (a 50% relative increase: 1.5 × 1/10000 ≈ 1/6700). I guess three thousand fewer trials to run in the lab is something, but you're still going to spend the rest of your life running the remaining ones.

You might expect scientists to be on guard against this kind of thing, and I think most scientists will claim they desire to publish correct results. But what counts for your career is publications and citations, and incorrect results are no less publishable than correct ones - and might even get cited more, as people fail to replicate them. And as you climb the academic ladder, publications in high-ranking journals are what count, and for that you need spectacular results. And it is much easier to get spectacular incorrect results than spectacular correct ones. So the academic system rewards and encourages bad science.

Consequences

The bottom line is to be skeptical of any reported scientific results. The ability of the experiment and analysis to discover true relationships is critical, and one should always ask what the effect size is, and what the statistical power -- the probability of detecting a real effect -- is.

In addition, the prior probability of the hypothesis being true is crucial. Apparently solid empirical evidence of people getting cancer from cell phone radiation, or of homeopathic treatment curing disease, can almost be dismissed out of hand - there simply is no plausible explanation for how that would work.

A third thing to look out for, is how well studied a problem is, and how the results add up. For health effects of GMO foods, there is a large body of scientific publications, and an overwhelming majority of them find no ill effects. If this was really dangerous, wouldn't some of these investigations show it conclusively? For other things, like the decline of honey bees, or the cause of CFS, there is a large body of contradictory material. Again - if there was a simple explanation, wouldn't we know it by now?

  1. And since you ask: No, the irony of substantiating this claim with a scientific paper is not lost on me.

  2. Actually, I would suggest that research in paranormal phenomena is such a field. They still manage to publish rigorous scientific works, see this Less Wrong article for a really interesting take.

  3. I think the problem is not so much that you can't publish a result claiming no effect, but that you can rarely claim it with any confidence. Most likely, you just didn't design your study well enough to tell.

Categories: Offsite Blogs

Confusion regarding the differences between ByteString types

Haskell on Reddit - Wed, 03/25/2015 - 12:34pm

Hello /r/haskell. Currently I'm working on a server program that needs to parse incoming data from sockets. I'm a bit confused as to which data types I should be using. I believe I need to use strict ByteStrings, and some of the incoming data might be Unicode.

Basically, I parse the first N bytes following an established protocol, and a payload that might represent valid unicode follows. From what I understand, it makes sense to interpret the socket data as a ByteString, and convert the relevant unicode portion to Text.

When I hGet the data from my socket handle, which ByteString should I be choosing?

Data.ByteString (ByteString)
Data.ByteString.Lazy (ByteString)
Data.ByteString.Char8 (ByteString)
... etc.

In addition, what difference will there be between strict and lazy bytestrings, and under what situation should I choose one or the other? Some libraries return lazy bytestrings when I am using strict ones, and vice-versa.
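For concreteness, here is a minimal sketch of the strict-ByteString-plus-Text approach described above (the function name and the fixed sizes are illustrative only, not a recommendation):

import qualified Data.ByteString as BS           -- strict ByteString
import           Data.Text (Text)
import qualified Data.Text.Encoding as TE
import           System.IO (Handle)

-- Read a fixed-size header and a payload of known length from a socket
-- handle, decoding the payload as UTF-8 Text. In a real server the payload
-- length would be parsed out of the header rather than passed in.
readMessage :: Handle -> Int -> Int -> IO (BS.ByteString, Either String Text)
readMessage h headerLen payloadLen = do
  header  <- BS.hGet h headerLen       -- strict read; may return fewer bytes at EOF
  payload <- BS.hGet h payloadLen
  -- decodeUtf8' returns Left on invalid UTF-8 instead of throwing
  return (header, either (Left . show) Right (TE.decodeUtf8' payload))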

Thank you in advance for helping clear my confusion.

submitted by pythonista_barista
[link] [10 comments]
Categories: Incoming News

Where does GHC spend most of its time during compilation?

Haskell on Reddit - Wed, 03/25/2015 - 8:04am

I'm just wondering what contributing factors result in long compile times for GHC (with the exception of Template Haskell). Is it things like type checking and analysis, the codegen process, or something else?

submitted by gaymenonaboat
[link] [15 comments]
Categories: Incoming News

FP Complete: FP Complete's Hackage mirror

Planet Haskell - Wed, 03/25/2015 - 8:00am

We have been running a mirror of the Hackage package repository, which we use internally for the FP Complete Haskell Centre's IDE, building Stackage, and other purposes. This has been an open secret, but now we're making it official.

To use it, replace the remote-repo line in your ~/.cabal/config with the following:

remote-repo: hackage.fpcomplete.com:http://hackage.fpcomplete.com/

Then run cabal update, and you're all set.

This mirror is updated every half-hour. It is statically hosted on Amazon S3 so downtime should be very rare (Amazon claims 99.99% availability).

The mirror does not include the HTML documentation. However, Stackage hosts documentation for a large set of packages.

We have also released our hackage-mirror tool. It takes care of efficiently updating a static mirror of Hackage on S3, should anyone wish to host their own. While the official hackage-server has its own support for mirroring, our tool differs in that it does not require running a hackage-server process to host the mirror.

HTTPS for Stackage

On a tangentially related note, we have enabled TLS for www.stackage.org. Since cabal-install does not support TLS at this time, we have not set up an automatic redirect from insecure connections to the https:// URL.

Categories: Offsite Blogs

What are your most persuasive examples of using Quickcheck?

Haskell on Reddit - Wed, 03/25/2015 - 5:32am

I'm writing documentation for my Python Quickcheck-like library, Hypothesis, and I'm looking for examples of using Quickcheck that are a little more persuasive than reversing a list or checking commutativity of numbers.

In particular I'm looking for examples that make people go "Oh, I could totally use this in $DAYJOB". I find most quickcheck examples seem to start from the assumption that you're writing a library, which most people aren't.

The examples I have so far are:

But I'd really like more, particularly ones from domains where you wouldn't necessarily think to have used Quickcheck, or ones with a certain "wow!" factor.
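For reference, the shape such "dayjob" examples often take is a round-trip property over an encode/decode pair; here is a tiny self-contained sketch (the encoder and decoder are made up purely for illustration):

import Test.QuickCheck

-- A made-up encoder/decoder pair standing in for whatever serialisation
-- code a real project has: escape backslashes on the way out, unescape
-- them on the way back in.
encode :: String -> String
encode = concatMap (\c -> if c == '\\' then "\\\\" else [c])

decode :: String -> String
decode ('\\':'\\':rest) = '\\' : decode rest
decode (c:rest)         = c : decode rest
decode []               = []

-- The property: decoding an encoded value gives back the original.
prop_roundTrip :: String -> Bool
prop_roundTrip s = decode (encode s) == s

main :: IO ()
main = quickCheck prop_roundTrip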

submitted by DRMacIver
[link] [45 comments]
Categories: Incoming News

Command Line Args passed to wxHaskell

haskell-cafe - Wed, 03/25/2015 - 4:20am
I am seeing strange behavior with a wxHaskell app compiled on Windows 7. On Linux, all is well. I can call my app like:

app +RTS -N4 -RTS myArg

And in the app I can process the myArg and start a wxHaskell frame. When I compile the same application on Windows 7, I get an error dialog box that says "Unexpected parameter `+RTS`", and a second Usage dialog that looks like it comes from wxHaskell.

I am not sure why Windows is different, but perhaps it is the fact that on Windows 7 I compiled up wxHaskell 0.92, and on Linux I used 0.91 from a cabal update. I used 0.92 on Windows because I could not get 0.91 to compile due to some type problems where the wxPack version was incompatible with a header file and the Haskell compiler related to 64-bit types (long long). There is some noise about this on the web, but no solutions.

Nonetheless, I assume that args are grabbed directly by wxHaskell and Environment.getArgs does not consume them such that they are still available to wxHaskell. Is there some way to
Categories: Offsite Discussion

attoparsec and (binary|cereal)

Haskell on Reddit - Wed, 03/25/2015 - 3:09am

Hi there, so I'm a beginner (don't blame me :D) trying to write a parser for Apache Kafka log files. Those files are written in a binary format containing messages with the following structure:

8 byte message offset number
4 byte size containing an integer N
4 byte CRC32 of the message
1 byte "magic" identifier to allow format changes, value is 2 currently
1 byte "attributes" identifier
4 byte key length, containing length K (K byte key) -- doesn't exist in our case for the moment
4 byte payload length, containing length V
V byte payload

Since performance matters, I made a first attempt with attoparsec library which seems suitable at this place to me. Now I'm facing 2 problems: laziness and error handling.

For my questions, showing there the first two fields will be enough. Here is my first attempt with the take N function:

entryParser :: Parser LogEntry
entryParser = do
  o <- offsetParser
  l <- lengthParser

lengthParser :: Parser Length
lengthParser = do
  l <- take 4

offsetParser :: Parser Offset
offsetParser = do
  o <- take 8

So first of all I have to decode the taken bytestrings into Int64 for the offset field and Int32 for the length field, which relates to my first question. I tried to use cereal for lazy decoding. This didn't work because o <- take won't return a lazy bytestring. Is there any way to take those X bytes lazily? Is it even relevant whether it's lazy or not...?

In a second step I tried to decode with binary library which seems to work fine. But now I'm facing the problem with error handling since decode returns Either String a.

lengthParser :: Parser Length
lengthParser = do
  l <- take 4
  case (decode l) of
    Left l -> return $ ?
    Right r -> return $ r

offsetParser :: Parser Offset
offsetParser = do
  o <- take 8
  let offset = decode o
  case offset of
    Left l -> return ?
    Right r -> return r

What would be the appropriate way to handle errors in this scenario? Would it be appropriate to create some type, let's say Either String Offset, and pass this one layer up to my entryParser? Then again distinguish Left/Right and so forth? This doesn't seem really intuitive to me...
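For reference, one possible way out (a sketch only, with Int64/Int32 standing in for Offset and Length) is to turn the Left case into an ordinary parse failure with fail, so the error flows through attoparsec's normal failure handling instead of a custom Either type:

import           Data.Attoparsec.ByteString (Parser)
import qualified Data.Attoparsec.ByteString as A
import           Data.Int (Int32, Int64)
import           Data.Serialize (decode)   -- cereal; works on strict ByteStrings

offsetParser :: Parser Int64
offsetParser = do
  bs <- A.take 8
  -- a decoding error becomes an attoparsec parse failure
  either (fail . ("offset: " ++)) return (decode bs)

lengthParser :: Parser Int32
lengthParser = do
  bs <- A.take 4
  either (fail . ("length: " ++)) return (decode bs)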

So any hints are appreciated a lot :)

submitted by mjnet
[link] [6 comments]
Categories: Incoming News

Ken T Takusagawa: [zyxbhqnd] Defining monads with do notation

Planet Haskell - Wed, 03/25/2015 - 2:33am

If one happens to be most comfortable with "do" notation for monads ("What are monads? They are the things for which "do" notation works well."), that is, with monads implicitly defined in terms of bind (>>=) and return, here are the definitions of map and join, the "other" way of defining monads:

join :: (Monad m) => m (m a) -> m a;
join xs = do { x <- xs ; x }

map :: (Monad m) => (a -> b) -> m a -> m b;
map f xs = do { x <- xs ; return $ f x }

map is identical to liftM and slightly narrower than fmap which requires only the Functor typeclass instead of Monad.  This redundancy is one of the motivations for the AMP proposal.  Incidentally, map (as defined above) would work as well as the famous Prelude map function which operates only on lists, because a list is a Monad (and a Functor).

Just for completeness, here is bind defined in do notation:

(>>=) :: (Monad m) => m a -> (a -> m b) -> m b;
(>>=) xs f = do { x <- xs ; f x }

I sometimes like explicitly using the bind operator instead of do notation because the syntax, specifying xs then f, lines up well with the thought process "first prepare the input xs to a function, then call the function f on it".  It also works well for longer chains.  For example, the expression xs >>= f >>= g >>= h is equivalent to

do {
x <- xs;
y <- f x;
z <- g y;
h z;
}

but not having to name the intermediate results.

Inspired by the tutorial Monads as containers.

Update: added type signature for (>>=).

Categories: Offsite Blogs

cabal test and tests that overlap in coverage

haskell-cafe - Wed, 03/25/2015 - 2:32am
is this a cabal error or hpc error? i see this when running 'cabal test' with tests that overlap in coverage:

5 of 5 test suites (5 of 5 test cases) passed.
hpc: found 2 instances of RBM.List in ["./.hpc","./dist/hpc/vanilla/mix/proto","./dist/hpc/vanilla/mix/perf-repa-RBM","./dist/hpc/vanilla/mix/test-repa-RBM","./dist/hpc/vanilla/mix/perf-list-RBM","./dist/hpc/vanilla/mix/test-list-RBM","./dist/hpc/vanilla/mix/rbm-0.0"]

Thanks, Anatoly
Categories: Offsite Discussion

GHC Weekly News - 2015/03/24

Haskell on Reddit - Wed, 03/25/2015 - 1:23am
Categories: Incoming News

CFP: Erlang Workshop 2015

General haskell list - Tue, 03/24/2015 - 10:26pm
Hello all, Please find below the Call for Papers for the Fourteenth ACM SIGPLAN Erlang Workshop. Apologies for any duplicates you may receive.

CALL FOR PAPERS
===============
Fourteenth ACM SIGPLAN Erlang Workshop
-----------------------------------------------------------
Vancouver, Canada, September 4, 2015
Satellite event of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015)
August 30 - September 5, 2015
http://www.erlang.org/workshop/2015/ErlangWorkshop2015.html

Erlang is a concurrent, distributed functional programming language aimed at systems with requirements of massive concurrency, soft real time response, fault tolerance, and high availability. It has been available as open source for 16 years, creating a community that actively contributes to its already existing rich set of libraries and applications. Originally created for telecom applications, its usage has spread to other domains including e-commerce, banking, databases, and computer telephony and mes
Categories: Incoming News

The GHC Team: GHC Weekly News - 2015/03/24

Planet Haskell - Tue, 03/24/2015 - 8:20pm

Hi *,

It's time for the GHC weekly news. The last one was missed, mostly due to a lot of hustle to try and get 7.10 out the door (more on that throughout this post). But now we're back and everything seems to be taken care of.

This week, in the wake of the GHC 7.10 release (which is occurring EOD, hopefully), GHC HQ met up for a brief chat and caught up:

  • This week GHC HQ met for only a very short time to discuss the pending release - it looks like all the blocking bugs have been fixed, and we've got everything triaged appropriately. You'll hopefully see the 7.10 announcement shortly after reading this.

We've also had small amounts of list activity (the past 6-8 weeks have been very, very quiet it seems):

  • Yitzchak Gale revived a thread he started a while back, which puttered out: bootstrapping GHC 7.8 with GHC 7.10. The long and short of it is, it should just about work - although we still haven't committed to this policy, it looks like Yitz and some others are quite adamant about it. ​https://mail.haskell.org/pipermail/ghc-devs/2015-March/008531.html

Some noteworthy commits that went into ghc.git in the past two weeks include:

Closed tickets this past week include: #9122, #10099, #10081, #9886, #9722, #9619, #9920, #9691, #8976, #9873, #9541, #9619, #9799, #9823, #10156, #1820, #6079, #9056, #9963, #10164, #10138, #10166, #10115, #9921, #9873, #9956, #9609, #7191, #10165, #10011, #8379, #10177, #9261, #10176, #10151, #9839, #8078, #8727, #9849, #10146, #9194, #10158, #7788, #9554, #8550, #10079, #10139, #10180, #10181, #10170, #10186, #10038, #10164, and #8976.

Categories: Offsite Blogs

Haskell-related PhD Studentship at Kent

haskell-cafe - Tue, 03/24/2015 - 5:12pm
Dear Haskellers, I am seeking a PhD student on a Haskell-related project:

http://www.cs.kent.ac.uk/research/studyingforaphd/phd-wang.html

This would be a good opportunity for someone who is interested in applying Haskell ideas to real problems. The funding covers maintenance, EU student fees and research related expenses. But non-EU students are welcome to apply too. For Sep 2015 starting, the application deadline is 17 April 2015. If you are interested, please contact me at m.w.wang< at >kent.ac.uk.

Best wishes,
Meng

Dr Meng Wang
School of Computing
University of Kent
http://www.cs.kent.ac.uk/people/staff/mw516/
Categories: Offsite Discussion