News aggregator

Why does GHC always box polymorphic fields in datastructures?

Haskell on Reddit - Wed, 07/02/2014 - 3:20pm

Hello /haskell/, I was reading Johan Tibbe's talk on GHC performance and one thing baffled me, and I quote:

Polymorphic fields are always stored as pointer-to-thing, which increases memory usage and decreases cache locality. Compare:

data Tree a = Leaf | Bin a !(Tree a) !(Tree a) data IntTree = IntLeaf | IntBin {-# UNPACK #-} !Int !IntTree !IntTree

Specialized data types can be faster, but at the cost of code duplication. Benchmark your code and only use them if really needed.

It turns out that Haskell's polymorphic types are not a zero-cost abstraction, and it would be perfectly normal in a unityped language where generics can only be implemented through runtime checks, but can't Haskell do better than that? After all, GHC has type inference, it knows the types of everything at compile time, it can infer typeclass instances at compile time, making typeclass methods zero-overhead, etc. So seeing that a Tree Int for some reason cannot be represented the same as an IntTree is... baffling. Is there a fundamental reason for that, or is it just a wart peculiar to GHC?

P.S. I'm in no way saying that the performance cost is terrible or that we should all stop using parametric polymorphism, I'm just baffled as to possible reasons for this strange decision.

Thank you.

submitted by aicubierre
[link] [29 comments]
Categories: Incoming News

PhD studentship on interval computation in Haskell

General haskell list - Wed, 07/02/2014 - 11:50am
------------------------------------------------- PhD studentship - Interval Computation in Haskell ------------------------------------------------- * Applicants should have a strong background in real analysis and functional programming. * The closing date for applications is 18th July 2014. * The project will be supervised by Michal Konečný, Aston University. * The student will receive a 3-year studentship of £15,500/year. * UK/EU student's fee is covered, non-EU student's fee is £10,914 in 2014/2015. * The student will act as a teaching assistant for a distance learning course approx. 7h/week. * For more information, use the following links: o Description of the potential project topics[1] o AERN - a Haskell interval computation library[2] o Supervisor's home page[3] o Department research home page[4] o Details of the studentship and links to application forms[5] (The above advert is also available at http:/
Categories: Incoming News

How to get good performance when processing binary data with conduit + vector/bytestring?

Haskell on Reddit - Wed, 07/02/2014 - 10:53am

I've been playing around with some data compression algorithms. Basically, code that loads some data from disk and then transforms, analyzes, shuffles around, compresses etc. I thought conduits and vector / bytestring would be a natural representation for those algorithms in the Haskell world, but I'm having a very hard time producing code that is elegant and fast.

Conduit seems to have a very large overhead that makes yielding individual words between stages unacceptably slow. Here:

runResourceT $ CB.sourceFile "test.dat" =$ CC.concat $$ CL.fold (+) 0

Just adding up all the bytes in a file, basically. That already takes like .5s/MB, which is pretty horrendous. It seems clear that the way to get decent performance is to always await / yield some chunk of data.

The first issue I have with this is that it renders many useful combinators unusable. For instance, I was hoping to use the conduitVector combinator to turn the input file into 1MB chunks of vector data for a blocksorting transform, but that seems impractical knowing how conduit performs with streams of singular Word8s.

Further, I struggle with actually outputting those bytestring chunks. Imagine a conduit performing some transformation, like run length coding or huffman compression. You read data from the input, and sometimes write a byte or more to the output. Just a buffer that fills till we can yield a sizeable chunk.

Haskell, to my knowledge, lacks a structure like an std::vector with amortized O(1) append and the compact storage of an array. We could either use a mutable vector for the chunk and progressively fill it, but then we're faced with the problem of efficiently converting a vector back into a bytestring. While possible, it's a bit of a low-level mess and there is no direct support for it.

There are no mutable / growing bytestrings (sure, I know why), and the best construct we have seems to be a bytestring builder. It seems fairly wasteful to build up mconcat'enated chunks of builders, but I gave that a shot. Neither brief, nor simple, but here is what I came up with:

type OutputBS = (Int, Int, BB.Builder) emptyOutputBS :: Int -> OutputBS emptyOutputBS chunkSize = (chunkSize, 0, mempty) outputByte :: Monad m => Word8 -> OutputBS -> forall i. ConduitM i B.ByteString m OutputBS outputByte w8 obs@(chunkSize, numBytes, _) | numBytes >= chunkSize = flushOutputBS obs >> addW8 (emptyOutputBS chunkSize) | otherwise = addW8 obs where addW8 (chunkSize', numBytes', builder') = return (chunkSize', numBytes' + 1, builder' <> BB.word8 w8) flushOutputBS :: Monad m => OutputBS -> forall i. ConduitM i B.ByteString m () flushOutputBS (_, numBytes, builder) | numBytes > 0 = yield (BL.toStrict $ BB.toLazyByteString builder) | otherwise = return () processAndChunkOutput :: Monad m => Conduit B.ByteString m B.ByteString processAndChunkOutput = flip evalStateT (emptyOutputBS 65536) loop where loop = (lift await) >>= \case Nothing -> get >>= lift . flushOutputBS Just bs -> do forM_ [0..B.length bs - 1] $ \i -> outputByteState (bs `B.index` i) loop outputByteState w8 = get >>= (\obs -> lift $ outputByte w8 obs) >>= put

This works as expected, but the performance is also at .5s/MB. Replacing the builder with a simple list that's reversed before being packed into a bytestring in the end is ~40% or so faster, but still too slow.

Looking further down the road for my compression code, I see more potential issues with this approach. Like if I want to use repa for a wavelet transform or FFT or so at a stage, again having to convert between vector and bytestring.

Can anybody recommend a way to speed up this conduit pipeline? Do you think this is in general a sound way of structuring the input, output, analysis and transformation stages of a compression program?

Thanks!

submitted by SirRockALot1
[link] [30 comments]
Categories: Incoming News

Closed type families, apartness, and occurs check

glasgow-user - Wed, 07/02/2014 - 9:19am
From the user manual, it sounds like a clause of a closed type family should be rejected once no subsitution of the type could make it unify with the clause. If so, it doesn't seem to do an occurs check: type family IsEq a b :: Bool where   IsEq a a = True   IsEq a b = False forall a . IsEq a a :: Bool = forall (a :: k). 'True forall a . IsEq a [a] :: Bool = forall a. IsEq a [a] I came across this while trying to using Generics to find the immediate children of a term - this sort of non-reduction happens while comparing a type like (Term var) with a constructor argument of type var. Brandon _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users< at >haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Categories: Offsite Discussion

Help installing GHC

Haskell on Reddit - Wed, 07/02/2014 - 8:29am

I'm having a problem installing GHC on my device running Apple ARM Darwin (it's a jailbroken device). I already have gcc (compiler) so I was hoping to build a version of ghc for device.

When I do 'distrib/hc-build' I get an error saying can't workout build platform.

Does anybody know how to fix this or install ghc on ARM Darwin?

http://i.imgur.com/obl031r.jpg

submitted by H1DD3NT3CH
[link] [6 comments]
Categories: Incoming News

Request for code review: Anagrammer (first Haskell program)

Haskell on Reddit - Wed, 07/02/2014 - 7:42am

I'm working on a Scrabble AI, and this is my implementation of a Trie: https://github.com/int3/scrabble-solver/blob/53f5774453cc4f218117051f7075923455370754/Trie.hs

I spent quite a while trying to see if it was possible to rewrite findConstraints and findAnagrams as folds on the trie, but I couldn't wrap my head around it. Do you think it is possible / reasonable to attempt this, or does the algorithm not fit the mold of a fold?

I'm also interested in comments on general code quality!

submitted by int3_
[link] [2 comments]
Categories: Incoming News

Dead-simple graphing of custom tree-like data types

Haskell on Reddit - Wed, 07/02/2014 - 3:04am

Hi, there! I just wrote a package (link) that's a wrapper around the boilerplate necessary to graph data structures; if you've written up any data structures that are similar to trees, this should make it extremely straightforward to generate .png files for the current state of your ADT.

Though the purpose of this post is to make it easier for everybody to graph their stuff, unsolicited feedback is more than welcome, should you be so inclined ;)

submitted by gravityman-142857
[link] [23 comments]
Categories: Incoming News

Ketil Malde: Expected site information from SNPs

Planet Haskell - Wed, 07/02/2014 - 2:00am

Lately, I’ve been working on selecting SNPs, the main goal is often to classify individuals as belonging to some specific population. For instance, we might like to genotype a salmon to see if it is from the local population or an escapee from a sea farm, or perhaps a migrant from a neighboring river? And if it’s an escapee, we might want to know which farm it escaped from. In short, we want to find SNPs that are diagnostic.

Typically, this is done by sequening pools of individuals, mapping the reads to the reference genome, identifying variant positions, and ranking them - typically using FST, sometimes also using p-values for the confidence in an actual allele difference, and maybe filtering on sequencing coverage and base- or mapping quality. However, FST really isn’t a suitable tool for this purpose. I’m therefore proposing the following. Let me know if it makes sense or not.

Expected Site Information

For diagnostic SNP, what we really would like to know is the amount of information observing each site contributes. Using Bayes theorem, observing an allele a in some individual N, gives us the following posterior probability for N belonging to some population A, where the allele frequency, P(aA), is known:

P(A|a) = P(a|A)P(A)/P(a)

Here, P(A) is our prior probability of N belonging to A, which after observing a is modified by a factor of

P(a|A)/P(a)

In order to assign N to one of several populations (either (A) or B, say), we are interested in the relative probabilities for the two hypotheses. In other words, we would like the odds for N belonging to one population or the other. Given the probabilities of P(aA) and (P(a|B)), and initial odds (P(A)/P(B)), we get

P(A|a)/P(B|a) = [P(a|A)P(A)/P(a)]/[P(a|B)P(B)/P(a)]

Canceling out P(a), we find that the prior odds are modified by:

P(a|A)/P(a|B)

That is, the ratio of this allele’s frequencies in each of the populations. For practical reasons, it is common to take the logarithm of the odds. This gives us scores that are additive and symmetric (so that switching the two populations gives us the same score with the opposite sign). Specifically, base two logarithms will give us the score in bits.

When observing a site, we may of course also encounter the alternative allele. By the same reasoning as above, we find that this allele modifies the odds by

[1-P(a|A)]/[1-P(a|B)]

Lacking any prior information, we can consider each population equally likely, and the likelihood of observing a particular allele is the average of the likelihood in each population. The information gain from each possible allele is then averaged, weighted by this average likelihood. For a biallelic site with major allele frequencies p and (q) (and consequentially, minor allele frequencies of 1 − p and (1-q)) in the two populations, the expected added information from the site then becomes:

I(p,q) = |(p+q)/2 log_2(p/q)| + |(1-(p+q)/2)log_2((1-p)/(1-q)) |

Note that we are here only interested in the amount of information gained, regardless of which hypothesis it favors, and thus we take the absolute values. For a site with multiple alleles enumerated by i and with frequency vectors p and q in the two populations, this generalizes to the weighted sum of log2(pi / qi).

Unlike measures like FST, measures of I is additive (assuming independence between sites), so the information gained from observing mulitple sites is readily calculated. From observing the information gained from observing each site, we will also be able to compare different sets of sites, and e.g., compare the value of a single site with minor allele frequencies (MAF) of, say, 0.1 and 0.3 to two sites with MAF of 0.2 and 0.3.

It may also be instructive to compare this procedure to sequence alignment and position specific score matrices (PSSMs). In sequence alignment, a sequence of nucleotides or amino acids are scored by comparing its match to a target sequence to its match to some base model using log odds scores. The base model to compare against is often implicit (typically using sequences of random composition), but more elaborate models are also possible Similarly, position specific frequency matrices are often converted to position specific score matrices using log odds. Calculating the information value from a set of observed alleles is then analogous to scoring an “alignment” of the set of observed alleles to two different sets of allele frequencies.

Allele frequency confidence intervals

In order to apply the above method in practice, we need to measure the allele frequencies in the population. This is problematic for two reasons. First, we do not have precise knowledge of the allele frequencies, we can only estimate them from our sequenced sample. This introduces sampling bias. Second, the sequencing process introduces additional artifacts. For instance, sequencing errors often result in substitutions, which are observed as apparent alleles. In addition, sequences can be incorrectly mapped, contain contamination, the reference genome can contain collapsed repeats, and the chemistry of the sequencing process is usually also biased – for instance, coverage is often biased by GC content. These artifacts often give the false appearance of variant positions.

One challenge with calculating site information from sequencing data (as opposed to using allele frequencies directly), is that such errors in the data can vastly overestimate the information content. For instance, an allele that appears to be fixed in one population means that any other observed allele will assign the individual to the alternative population - regardless of any other alleles. It is easy to see that an allele frequency of zero results in the odds going either to zero or infinity, and thus the log odds will go to either positive or negative infinity.

For diagnostic SNP discovery, it is more important to ensure that identified SNPs are informative, than to precisely estimate the information content. Thus, we take a conservative approach and use upper and lower limits for the allele frequencies by calculating confidence intervals using the method by Agresti-Coull. In addition, the limits are also adjusted by a factor ε, corresponding to sequencing error rate.

Software implementation

I’ve implemented this (that is, the conservative measure) as described above in a software tool called varan. It parses sequence alignments in the standard “mpileup” format as output by the samtools mpileup command. It can currently output several different statistics and estimators, including expected site information. This is a work in progress, so please get in touch if you wish to try it out.

Categories: Offsite Blogs

Evaluate and compare two expressions

Haskell on Reddit - Wed, 07/02/2014 - 1:47am

import Test.HUnit (runTestTT,Test(TestLabel,TestList),(~?)) import Data.Function (on)

-- | A very simple data type for expressions.

data Expr = Const Int | Add Expr Expr deriving Show

-- | 'Expression' is an instance of 'Num'. You will get warnings because -- many required methods are not implemented.

instance Num Expr where fromInteger = Const . fromInteger (+) = Add

-- | Equality of 'Expr's modulo associativity.

instance Eq Expr where (==) = error "Not yet implementd: (==)"

-- | A test expression.

testexpression1 :: Expr testexpression1 = 3 + (4 + 5)

-- | A test expression.

testexpression2 :: Expr testexpression2 = (3 + 4) + 5

-- | A test expression.

testexpression3 :: Expr testexpression3 = 2 + 5 + 5

_________________________-

testexpression1 should be equal to testexpression2. And they should ne NOT equal to testexpression3.

Yes, it's homework, so I'm searching for hints, no solutions! :)

Thanks in advance.

submitted by fuuman1
[link] [7 comments]
Categories: Incoming News

Expressing "complexity" of higher kinds/type classes?

Haskell on Reddit - Tue, 07/01/2014 - 9:11pm

Hi there! The StackOverflow answer here talks about monads in terms of "Turing-complete compositions", with a good bit of debate in its comments about whether that's a meaningful statement.

Since it seems so natural to talk about, e.g., Functor-Applicative-Monad in terms of relative "power", I wonder if a formalization of that "power" in terms of computational complexity exists (possibly, e.g., a way towards showing whether a given kind allows for "Turing-completeness" in some way). Are there any references around for this kind of work?

Apologies if this isn't clear; because the information surrounding it is so scarce, I'm having a little trouble trying to pose the question clearly.

submitted by babblingbree
[link] [14 comments]
Categories: Incoming News

wren gayle romano: Haskell Planet

Planet Haskell - Tue, 07/01/2014 - 6:44pm

Hello all,

This is just a short blurb to announce that I'm changing it so that Haskell Planet only picks up on posts tagged with haskell planet. For the next while here, I'll probably be writing about social justice, gender theory, and personal topics more often than Haskell and theoretical mathematics; so I don't want to flood the Planet with too much off-topic material.

Worry not, the pendulum will swing back again— it always does. But lately, this is where my thoughts have been focused.



comments
Categories: Offsite Blogs

use of -Werror

haskell-cafe - Tue, 07/01/2014 - 6:25pm
Is there a consensus within the Haskell community on the use of -Werror in packages uploaded to Cabal? Most places where I've worked, building with -Werror has been considered a laudable ideal, but fixing all the old warnings in the codebase was too much work, so it wasn't done. I suppose -Werror could cause problems in heterogeneous build environments.
Categories: Offsite Discussion

Twan van Laarhoven: Dependent equality with the interval

Planet Haskell - Tue, 07/01/2014 - 4:56pm

Here is a way to represent heterogeneous or dependent equalities, based on an interval type. In Homotopy Type Theory the interval is usually presented as a Higher Inductive Type with two constructors and a path between them. Here I will just give the two constructors, the path is implicit

data I : Set where i₁ : I i₂ : I -- there is usually a path, i-edge : i₁ ≡ i₂

The eliminator is

i-elim : ∀ {a} {A : I → Set a} → (x₁ : A i₁) → (x₂ : A i₂) → (Eq A x₁ x₂) → (i : I) → A i i-elim x₁ x₂ eq i₁ = x₁ i-elim x₁ x₂ eq i₂ = x₂

Here the type Eq is the dependent equality, which has type

Eq : ∀ {a} (A : I → Set a) → (x₁ : A i₁) → (x₂ : A i₂) → Set a

so we take a type parametrized by an interval, and two values of that type at the two endpoints of this interval. We can also define "heterogeneous reflexivity", a generalization of the usual refl function:

refl : ∀ {a} {A : I → Set a} → (x : (i : I) → A i) → Eq A (x i₁) (x i₂)

This function can be used to extract the third part of i-elim, with the reduction

refl (i-elim x₁ x₂ eq) = eq

I believe this can be used as the basis for an observational type theory, where Eq A and refl x reduce. The above is the first case for refl, the rest is "just" tedious structural recursion such as

Eq (\i → A i × B i) x y = Eq A (proj₁ x) (proj₁ y) × Eq B (proj₂ x) (proj₂ y) refl (\i → x i , y i) = refl x , refl y

and

Eq (\i → A i → B i) f g = {x : A i₁} → {y : A i₂} → Eq A x y → Eq B (f x) (g y) refl (\i → \(x : A i) → f i x) = \{x} {y} xy → refl (\i → f i (i-elim x y xy i))

or we can actually use the dependent equality and be more general

Eq (\i → Σ (x₁ : A i) (B i x₁)) x y = Σ (x₁y₁ : Eq A (proj₁ x) (proj₁ y)) (Eq (\i → B i (i-elim (proj₁ x) (proj₁ y) x₁y₁ i)) (proj₂ x) (proj₂ y)) Eq (\i → (x : A i) → B i) f g = {x : A i₁} → {y : A i₂} → (xy : Eq A x y) → Eq (\i → B i (i-elim x y xy i)) (f x) (g y)

Of course there is a lot more to it, but that is not the subject of this post.

As a final remark: if you are not too touchy about typing, then refl could even be implemented with the path i-edge between i₁ and i₂

i-edge : Eq (\_ → I) i₁ i₂ i-elim x₁ x₂ eq i-edge = eq refl foo = foo i-edge

But I'd rather not do that.

Categories: Offsite Blogs

[learn] Help explaining a strange use of foldr

Haskell on Reddit - Tue, 07/01/2014 - 2:25pm

How does this function work?

foo :: (a -> b -> a) -> a -> [b] -> a foo f a bs = foldr (\b g x -> g (f x b)) id bs a

From what I understand, foldr normally takes 3 parameters, and the first parameter is normally a binary function. In the function foo, foldr seems to have 4 parameters instead of 3, and the first parameter seems to be a ternary function.

Suppose I guessed from here that foo is an implementation of foldl. How would I prove it?

submitted by ronguida
[link] [5 comments]
Categories: Incoming News

Chris Smith: CodeWorld Rises Again!

Planet Haskell - Tue, 07/01/2014 - 2:01pm

About three years ago, I started work on an idea about technology-based math education.  The idea was to get middle school students to work passionately on using mathematics to create things, by:

  1. Doing their own original, creative work, instead of following instructions or reaching set answers.
  2. Getting instant feedback 24 hours a day, so they can tinker and learn in a self-directed way.
  3. Building confidence by working on their own ideas, inspiring pride and excitement.
  4. Experiencing how concepts from geometry, algebra, and physics can be springboards for creativity.
  5. Becoming creators, rather than just consumers, of technology.

That’s a lofty set of goals, but it was very successful.  In the 2011-2012 school year, I taught a small class of six students, two to three hours per week.  We had an awesome time.  They built their own computer games throughout the year.  We struggled together, worked our way through, and finished the school year with an awesome expo where the students showed off their work to local technology professionals and participated in a question-and-answer panel about their experiences.  It was fascinating listening to this, because a few patterns arose:

  • Students didn’t really think of what they were doing as math.  This remained true, even when the skills they learned involved describing the behavior of systems using equations, functions, and variables; describing complex shapes in terms of geometry, the coordinate plane, and rotations, translations, and scaling; coming to grips with the meaning of probability and randomness; etc.
  • The students who entered the year being “good at technology” weren’t necessarily the most likely to succeed.  Talking to these students broke all of the stereotypical molds about computers and technology!  Students took to the activity and wildly succeeded were very often girls, and had previously thought they were more the art-and-music type.

At the end of that year, I had plans to teach this program in multiple schools the following school year.  Unfortunately, things then got a little sidetracked.  I started a new job at Google over the summer, moved to California, and dropped the program.  The web site that students had used to build their projects fell into disrepair, and stopped working.  I stopped doing anything about it.

Over the last week and a half, though, that’s changed!  CodeWorld is back!

Getting Started

The CodeWorld web site is (as always) at http://www.codeworld.info.

Any web browser will do, but you really need to use the latest version of whatever browser you choose.  If you’ve been putting off upgrading Internet Explorer, it’s long past time!

You’ll also want a Google account.  You can log in using your Google account, and save your programs to Google Drive.  Because your programs are saved to the cloud, you can use the web site from any computer you like, even computer labs in a school, and your programs will follow where ever you go.

Using the web site is simple.  Type your program on the left.  Click Run to see it work on the right.  You can sign in to open your existing projects and save your projects.  You can also get links to share your projects with others.  There are sample projects along the bottom of the screen, including Yo Grandma!, a game written by Sophia, one of my students from the original class.

Unfortunately, instructions on how to write the programs are still mostly missing.  If you already know the language, a link to the generated documentation might help.  Otherwise, hold on!  Once the programming environment is stable, I plan to put together a comprehensive progression of exercises, tutorials, and examples.

Behind the Scenes

Under the hood, I mostly recreated this from scratch, throwing away most of the original project from a few years ago.  This new version of the environment has a lot of advantages: it runs your programs on your own computer, so your program runs a lot faster.  It’s less restrictive.  And I completely customized the language to make a lot of things simpler and easier to understand.

Changes:

  • The programming language for CodeWorld is called Haskell.  Haskell is an awesomely mathematical language, but parts of it are also notoriously complex.  The new incarnation of CodeWorld still uses Haskell, but goes a lot further to hide the rough edges.  In particular, you’ll rarely see any classes, and there’s an obvious type for most things (e.g., all text has the type Text, and all numbers have the type Number.)
  • Previously, CodeWorld was based on a library called Gloss for the Haskell programming language.  Gloss is great, and I saved as many ideas from it as I could.  But CodeWorld is now its own library.  This let me clean up some terminology, align the meaning of programs more closely with the goals of algebraic thinking and math concepts, and work with the simplified version of the language.
  • The biggest change to how the web site works is that your programs now run on your own computer, instead of on the web server.  This is using an awesome project called GHCJS, which converts the Haskell program into JavaScript, which is understood by web browsers.

I’ll try to keep posting here as I have learning material ready to use with this tool.  Stay tuned!


Categories: Offsite Blogs

Chris Smith: Big changes coming to CodeWorld

Planet Haskell - Tue, 07/01/2014 - 2:01pm

I’m continuing work on CodeWorld, my educational programming environment based on geometry and algebra.  There are big changes coming!  If you’re interested in following the project, please join the new codeworld-discuss mailing list, where I’ll send more regular announcements about significant changes, as well as try to answer questions, and discuss future directions.

Here are some things I intend to change in the near future.  A more complete list is on the project issue tracker, but this is a summary with more details and reasoning about some of the changes.

Aligning With Math Education

An important goal of this project is to align with a standards-based U.S. middle school math education, as much as possible.  To be clear, I still refuse to add complexity or turn the project into a patchwork of specific lessons that promote a specific narrow path of learning.  First and foremost, this should be an environment for tinkering and encountering ideas in self-motivated way.  But given alternative designs that could each be valid on their own, I’ll choose the one that pushes students toward the math standards.

It’s sometimes a tough line to draw.  But I’ve become convinced that there are a few places where I can do better.  Two of those are going to be major breaking changes, coming soon.

1. Death to Currying

Haskell’s convention of currying functions is the wrong default for CodeWorld.  Practically all of mathematics, especially at introductory level, is carried out with the notation f(x,y) = … .  The interpretation is that a function of two parameters is a function whose domain is a product – a set of ordered pairs.  The Haskell language makes a different choice.  Applying a function to two parameters is more like f(x)(y) (the parentheses are optional in Haskell itself), and the interpretation is that f(x) denotes a partially applied function that’s still waiting for its second parameter.

If the goal were to teach about higher-order functions, there would be lots of great arguments for the latter.  If the goal were convenience, you could argue for the latter pretty persuasively, as well.  I think Haskell’s use of currying is great.  But when the goal is to let students encounter and tinker with things they will see in school math, the right choice is to adopt the convention of mathematics.

Luckily, the assumption of curried multi-parameter functions isn’t baked into Haskell too deeply.  By changing the standard library, it’s quite possible to write f(x,y) just as well.  The parentheses on f(x) become optional, but this is actually true of mathematics in general (for example, operators in linear algebra are often written without parentheses, as are trig functions).  I will adopt the convention of using parentheses around even single function parameters.

The only big source of awkwardness comes with binary operators.  So long as we choose not to teach the notations `foo` (for turning a function into an operator) or (+) (for turning an operator into a function), this doesn’t come up much.  Notably, sections still work fine, since they take only one argument.

A couple convenient side effects of this choice are nice, too:

  • Students who routinely write parentheses around function arguments less often find themselves forced to surround negative numbers in parentheses for weird parsing reasons.  As trivial as it might seem, this was a very real and significant learning obstacle the last time I taught the class, and I’ll be happy to see it go.
  • Getting expression structure wrong sometimes gives much better error messages this way.  It’s harder to accidentally mix up precedence between an operator and function application; and passing too few arguments to a function gives a clear error rather than inferring a function type and breaking in an obscure way elsewhere.
2. Resizing the Canvas

The second big change is to resize the canvas from 500×500 to 20×20.

The justification for a 500×500 canvas was generally about confusing pixels – little dots on the screen – with the general idea of a coordinate system.  It’s convenient to blur the distinction at first, but it has in the past become a barrier to understanding the full nature of the coordinate plane with real (or even rational) coordinates.  Many students were confused when later faced with fractional coordinates.  At the same time, developing a full understanding of the rational number system is a big topic in 6th, 7th, and 8th grade mathematics, so it would be great to ask students to do more tinkering with these numbers.

By replacing this with a 20×20 grid (x and y coordinates ranging from -10 to 10), several goals are accomplished:

  • Students early in the class are working with numbers in a range they can comprehend better.
  • Students routinely work in fractions or decimals to fine tune their projects.
  • The abstract coordinate plane, including fractional coordinates, becomes more familiar.

This is a big win overall.

Changes to Usability

On the less controversial side, I’m planning a number of changes to make the site more usable:

  • Pervasive auto-complete, based on a pre-populated list of the standard library symbols as well as parsing the student code for declared names.
  • More complete documentation, tutorials, and better examples.  I admit that the current site is grossly lacking in documentation.  I don’t envy anyone who tries to figure it out on their own!
  • Better tools for playing around with results.  At the very least, students will be given the chance to scroll, pan, and see coordinates of points in pictures, animations, and simulations.
Long-Term Wish List

I also have my wish list for things I’d love to see possible, but am not quite ready to build yet.  This includes:

  • Social features: sharing projects with friends, commenting on or expressing support for other projects.
  • Collaborative projects with shared editing or exporting libraries for others to use.
  • Better debugging tools, such as easy controls to move forward and back in time, fast-forward, pause, etc. for animations, simulations, and even games.
  • Possibly grading features for teachers to grade projects and provide a scoring rubric and comments.

What else would you like to see?  Let me know in the comments here, on codeworld-discuss, or by filing a feature request in the issue tracker.


Categories: Offsite Blogs

Haskell Beginner Stuck on Command-Line IO

Haskell on Reddit - Tue, 07/01/2014 - 10:03am

I am web developer who works primarily in Python and Javascript, and while often the discussion in this subreddit is over my head, I have found the small bits of Haskell that I have learned to be enlightening such that I have recently decided to start building stuff where I can in Haskell in order to advance my understanding of the language.

Anyway, apologies if this does not belong here, but I'm stuck on a problem and I can't seem to figure it out and it finally occurred to me to ask you guys for help.

Here's a gist of a module I built yesterday in order to pull and count IP addresses from Django Tracebacks.

I can use it on the REPL (GHCi) and get results that I expect in the following way:

unlines . map flatten . countUnique . ipOnly . spammers <$> readFile "tracebacks.txt"

However, I can't seem to figure out how to organize the main block. It compiles but does not seem to do anything with input or output (I was hoping to name an infile and outfile on the command-line). Someone here will probably see right away what I'm doing wrong.

In addition, I would be happy to take any other suggestions or critiques you might have of this small script. I crammed a lot of stuff in there in order to experiment with the language a bit, but I'm sure there are mistakes or poor assumptions I have made. In fact, I noticed a discussion today talking about using Data.Text instead of String (which I am relying on pretty heavily here), so I was thinking of rewriting this using Data.Text.

Thanks for any suggestions you might have.

EDIT: I updated the gist with a working main, but I realize now I should probably have kept it broken for people who come to this thread late. The first version can be seen in the revision history: https://gist.github.com/pellagic-puffbomb/324cf7f50f4480e8a8d1/revisions

submitted by erewok
[link] [18 comments]
Categories: Incoming News