To my delight, I still get compliments on and criticisms of my post from three years ago (can it possibly be that long?) on parallelism and concurrency. In that post I offered a “top down” argument to the effect that these are different abstractions with different goals: parallelism is about exploiting computational resources to maximize efficiency, concurrency is about non-deterministic composition of components in a system. Parallelism never introduces bugs (the semantics is identical to the sequential execution), but concurrency could be said to be the mother lode of all bugs (the semantics of a component changes drastically, without careful provision, when composed concurrently with other components). The two concepts just aren’t comparable, yet somehow the confusion between them persists. (Not everyone agrees with me on this distinction, but neither have I seen a rigorous analysis that shows them to be the same concept. Most complaints seem to be about my use of the words “parallelism” and “concurrency” , which is an unavoidable problem, or about my temerity in trying to define two somewhat ill-defined concepts, a criticism that I’ll just have to accept.)
I’ve recently gotten an inkling of why it might be that many people equate the two concepts (or see no point in distinguishing them). This post is an attempt to clear up what I perceive to be a common misunderstanding that seems to explain it. It’s hard for me to say whether it really is all that common of a misunderstanding, but it’s the impression I’ve gotten, so forgive me if I’m over-stressing an obvious point. In any case I’m going to try for a “bottom up” explanation that might make more sense to some people.
The issue is scheduling.
The naive view of parallelism is that it’s just talk for concurrency, because all you do when you’re programming in parallel is fork off some threads, and then do something with their results when they’re done. I’ve previously argued that this is the wrong way to think about parallelism (it’s really about cost), but let’s just let that pass. It’s unarguably true that a parallel computation does consist of a bunch of, well, parallel computations. So, the argument goes, it’s nothing but concurrency. I’ve previously argued that that’s not a good way to think about concurrency either, but we’ll let that pass too. So, the story goes, concurrency and parallelism are synonymous, and bullshitters like me are just trying to confuse people and make trouble.
Being the troublemaker that I am, my response is, predictably, no, just no. Sure, it’s kinda sorta right, as I’ve already acknowledged, but not really, and here’s why: scheduling as you learned about it in OS class (for example) is an altogether different thing than scheduling for parallelism. And this is the heart of the matter, from a “bottom-up” perspective.
There are two aspects of OS-like scheduling that I think are relevant here. First, it is non-deterministic, and second, it is competitive. Non-deterministic, because you have little or no control over what runs when or for how long. A beast like the Linux scheduler is controlled by a zillion “voodoo parameters” (a turn of phrase borrowed from my queueing theory colleague, Mor Harchol-Balter), and who the hell knows what is going to happen to your poor threads once they’re in its clutches. Second, and more importantly, an OS-like scheduler is allocating resources competitively. You’ve got your threads, I’ve got my threads, and we both want ours to get run as soon as possible. We’ll even pay for the privilege (priorities) if necessary. The scheduler, and the queueing theory behind it (he says optimistically) is designed to optimize resource usage on a competitive basis, taking account of quality of service guarantees purchased by the participants. It does not matter whether there is one processor or one thousand processors, the schedule is unpredictable. That’s what makes concurrent programming hard: you have to program against all possible schedules. And that’s why you can’t prove much about the time or space complexity of your program when it’s implemented concurrently.
Parallel scheduling is a whole ‘nother ball of wax. It is (usually, but not necessarily) deterministic, so that you can prove bounds on its efficiency (Brent-type theorems, as I discussed in my previous post and in PFPL). And, more importantly, it is cooperative in the sense that all threads are working together for the same computation towards the same ends. The threads are scheduled so as to get the job (there’s only one) done as quickly and as efficiently as possible. Deterministic schedulers for parallelism are the most common, because they are the easiest to analyze with respect to their time and space bounds. Greedy schedulers, which guarantee to maximize use of available processors, never leaving any idle when there is work to be done, form an important class for which the simple form of Brent’s Theorem is obvious.
Many deterministic greedy scheduling algorithms are known, of which I will mention p-DFS and p-BFS, which do p-at-a-time depth- and breadth-first search of the dependency graph, and various forms of work-stealing schedulers, pioneered by Charles Leiserson at MIT. (Incidentally, if you don’t already know what p-DFS or p-BFS are, I’ll warn you that they are a little trickier than they sound. In particular p-DFS uses a data structure that is sort of like a stack but is not a stack.) These differ significantly in their time bounds (for example, work stealing usually involves expectation over a random variable, whereas the depth- and breadth-first traversals do not), and differ dramatically in their space complexity. For example, p-BFS is absolutely dreadful in its space complexity. For a full discussion of these issues in parallel scheduling, I recommend Dan Spoonhower’s PhD Dissertation. (His semantic profiling diagrams are amazingly beautiful and informative!)
So here’s the thing: when you’re programming in parallel, you don’t just throw some threads at some non-deterministic competitive scheduler. Rather, you generate an implicit dependency graph that a cooperative scheduler uses to maximize efficiency, end-to-end. At the high level you do an asymptotic cost analysis without considering platform parameters such as the number of processors or the nature of the interconnect. At the low level the implementation has to validate that cost analysis by using clever techniques to ensure that, once the platform parameters are known, maximum use is made of the computational resources to get your job done for you as fast as possible. Not only are there no bugs introduced by the mere fact of being scheduled in parallel, but even better, you can prove a theorem that tells you how fast your program is going to run on a real platform. Now how cool is that?
Filed under: Programming, Research Tagged: concurrency, parallelism
If one has an idea or initiative that could help growing Haskell's popularity, should he think twice about doing it, or is that acceptable now (that it seems inevitable, anyway)?submitted by SrPeixinho
[link] [7 comments]
ESo when i found haskell i slingshotted off through dependent and substructural types. Assuming that if a little was good a lot was better. Made it half way through TaPL and found pure type systems, coq, etc.I think the power to weight ratio isn’t there. I find that Haskell gives amazingly expressive types that have amazingpower for the amount of code you tie up in them and that are very resistant to refactoring.If i write agda and refactor I scrap and rewrite everything. If i write haskell, and get my tricky logic bits right?I can refactor it, split things up into classes, play all the squishy software engineering games to get a nice API I want. And in the end if it still compiles I can trust I didn’t screw up the refactoring with a very high degree of assurance.
CAdmittedly I’m not playing at the level E is, but this was my experience. I can make sweeping changes to my API, get all the bugs caught by the type system, and still have minimal code impact.
BThat is what I was getting at with the tweet about not using dynamically typed langs because I need to be able to prototype quickly and get rapid feedback.I think a lot of my friends thought i was just being trollish. Even just being able to see what would have to change if you changed your design slightly and being able to back it out quickly…
I'm a self taught Django user since forever, and I've been dabbling in Haskell for the past year or so, having gotten to the point where I can try to get into a bigger project in Haskell. I've read into Yesod, but I've done no actual work except for the occasional tutorial project.
I've no doubt in my mind that I would chose Haskell over Python for almost any kind of project, except for a web project.
However, what I find lacking in the Yesod environment are plugins. There's a huge Django ecosystem, with open-source plugins that do mostly anything from facebook integration to tagging to payment engines to you-name-it.
What are the Haskell community's thoughts about this state of affairs? Are there any Yesod resources I might have overlooked?submitted by pashadia
[link] [14 comments]
Is there any facility in Haskell for automagically deciding which computations should be eager and which lazy?
I'm not exactly sure how it would work under the hood, but I'm imagining a library that keeps statistics on how often an expression is actually evaluated and eagerly (and in parallel) evaluates the ones with a high enough chance of future use. Kind of like branch prediction in CPUs, except at a higher level. Does any such library or language feature exist? Does the concept even make sense?submitted by amoralnihilist
[link] [17 comments]
What if we were to make a game in Haskell?
Specifically, a game that has very strong modding support, and a pretty fun concept. Minecraft and the Source engine games have had a lot of success with modding, and a ton of people have had to learn the modding language in order to work on it, and have had the incentive to. If we make this game modded with haskell, and if its successful, that would be an insane boost to the haskell community. It could even have a branching effect, where modders who learnt haskell for this game go and use it to make their own games, for that reason making the game open source would be a very good idea (a guideline for these modders to help conceptualize the structure of a bigger project than their mods.)
What do you think? Its undoubtedly a large project, and coming up with an idea is very tough, but if enough community members got on board i think itd be very doable and very fun to work on.
I also have an idea for a game that could be made. It might be scummy of us and im not sure what the legality of this is but the game Cube World is seriously loved and seriously poipular and only failed because its underdeveloped, and hard to mod. If we went the other way with that i doubt wed have a hard time finding success.
This might also be unrealistic... i know games are massive projects and I dont think the open source community has ever banded together to work on something with almost no purpose besides entertainment.submitted by fruitbooploops
[link] [55 comments]
So, today's question on @1HaskellADay was this:
write a function countOccurences :: [Stirng] -> Map Char Int
(typos faithfully reproduced)
lookup 'l' $ countOccurences "Hello" ~> Just 2
lookup 'q' $ countOccurences "Hello" ~> Nothing
Okay, that can be done easily enough, I suppose, by torquing Map into something that it isn't, so one gets wrapped around the axel of creating a mapping from characters to occurrences.
First of all, countOccurences maps a String (not a List of Strings) to a Map, and that map is a very specialized kind of map that has existed in the literature for quite a while, and that map is known as the Bag data type, and is also, nowadays, called the MultiSet by people too embarrassed to say the word 'bag' in a sentence, because of their prior drug convictions.
("I got two months for selling a dime bag.")
So they now twist the word 'Set' (a 'collection of unique objects') to mean something that's not a set at all, the 'Multi'Set, which is a 'collection of unique objects, but you can have multiples of these unique objects, so they're not unique at all, so it isn't a set at all, but we need to say the word 'set' because we can't say the word 'bag' because saying the word 'bag' would make us sound plebeian for some reason.'
Yeah, that. 'MultiSet.'
What. Ev. Er.
But I digress.
So I COULD write the countOccurences as a String -> Map Char Int function, but then: why bother? You can either write tons of algorithmic code that obscures the intent or just simply use the appropriate data type.
I went for the latter.
Now, I wuz gonna do a dependently-typed pair to represent an occurrence...
... notice how countOccurences is so badly misspelled, by the way?
SOMEbody didn't QA-check their problem for the day today, I'm thinking.
... but then I said: 'eh!'
I mean: WHY is lookup 'q' $ countOccurences "Hello" ~> Nothing?
WHY can't it be that count 'q' for a Bag Char representation of "Hello" be 0? 0 is a valid answer and it keeps everything nice and monoidal without having to lift everything unnecessarily into the monadic domain.
So, yeah. Let's do that, instead.
So, here we go, and in Idris, because that's how I'm rolling these days. The advantages of dependent types have been enumerated elsewhere, so we'll just go with that they're better as an assumption and move on, using them, instead of extolling them, in this post.
So, my first attempt at Bag crashed and burned, because I did this:
data Bag : (x : Type) -> Type where
add : Bag x -> x -> Bag x
emptyBag : Bag x
and the compiler was fine with that. Hey, I can declare any type I'd like, so long as the types just stay as types, but as soon as I tried to define these things:
emptyList : List x
emptyList = 
emptyBag = Bag emptyList
add (Bag ) x = Bag [(x, 1)]
add (Bag ((x, y) :: rest)) x = Bag ((x, y + 1) :: rest)
add (Bag ((z, y) :: rest)) x = Bag ((z, y) :: (add rest x))
The compiler looked at me and asked: 'geophf, what in tarnation are you-ah tryin' to do?'
And about the only intelligent answer I could muster was: 'Ummmm... idk.'
I had gotten too clever for myself by half, trying to reshape a data type you learn in Comp.Sci. 101 as a purely functional type.
Back to Basics ... (but not BASIC)
So, let's just declare Bag to be what it is and KISS: 'keep it simple, stupid!'
data Bag x = Air | Stuffed (x, Nat) (Bag x)
Now, I so totally could've gone with the balanced binary-tree representation instead of the simple and standard linked list, but, you know: 'live and learn!'
With this declaration the emptyBag becomes so trivial as to be unnecessary, and then add is simplicity, itself, too, but add is, either way, so that's not saying much.
add : Eq x => Bag x -> x -> Bag x
add Air x = Stuffed (x, 1) Air
add (Stuffed (z, y) rest) x =
case x == z of
True => Stuffed (x, y + 1) rest
False => Stuffed (z, y) (add rest x)
Now, you see me relying on the case-statement, here. Unhappily.
I'd like my dependent types to say, 'unify x with x (reflexive) for the isomorphic case, and don't unify x with z for the other case.' But we're not there yet, or my coding isn't on par with being there yet, so I forced total coverage bifurcating the result-set into isomorphic and not with a hard-case statement.
Ick. I hate explicit case-statements! Where is really, really, really smart pattern-matching when I need it?
But with add, constructing a Bag becomes easy, and then counting elements of that bag is easy, too (again, with another case-statement, sigh!):
count : Eq x => x -> Bag x -> Nat
count _ Air = 0
count x (Stuffed (z, y) rest) =
case x == z of
True => y
False => count x rest
countOccurences (with one-too-few 'r's in the function name) becomes easy, given the Bag data type:
countOccurences : String -> Bag Char
countOccurences str = co' (unpack str) where
co'  = Air
co' (char :: rest) = add (co' rest) char
But look at this:
depth : Bag x -> Nat
depth Air = 0
depth (Stuffed _ rest) = 1 + depth rest
sample : ?bag
sample = countOccurences "The quick, brown fox jumped over the lazy dog."
bag = proof search
When we do a depth sample, we get the not-surprising answer of 29 : Nat
Perhaps this could be made a tad bit more efficient?
Well, then, let's do that!
data Bag x = Air | Stuffed (x, Nat) (Bag x) (Bag x)
We make Bag balanced, with the add-function, doing the work of (very simply) branching off new nodes:
add : Ord x => Bag x -> x -> Bag x
add Air x = Stuffed (x, 1) Air Air
add (Stuffed (z, y) less more) x =
case (compare x z) of
LT => Stuffed (z, y) (add less x) more
GT => Stuffed (z, y) less (add more x)
EQ => Stuffed (z, y + 1) less more
Then all the other functions change ('morph') to work with a tree, not a list and work with Ord elements, not with (simply) Eq ones.
And so, the redefined depth-function gives a very different result:
depth sample ~> 9 : Nat
Not bad! Not bad! The improved data-structure improves efficiency across the board from O(N) to O(log N).
Hm, perhaps I'll have count return a dependently-typed pair, just as the library function filter does on List types, but not tonight.
Good night, Moon!
Some Haskell types are isomorphic, which means we have functions (A and B are the isomorphic types)a_to_b :: A -> B b_to_a :: B -> A
which satisfy the lawsa_to_b . b_to_a = id :: A -> A b_to_a . a_to_b = id :: B -> B
A simple example would be the types (a,(b,c)) and ((a,b),c) with the functions beingassoc (a,p) = ((a, fst p), snd p) assoc' (p, c) = (fst p, (snd p, c)
What this means is that the types are essentially the same with the only difference being their representation. This made me wonder whether it would be a good thing to have the ability to use isomorphic types interchangably on the typelevel. By this I mean that one can declare types as isomorphic by providing the correct functions, the default type to use when one of them is encountered and the typechecker actually treats them as the same type (essentially by inserting the transformation function where needed).
The reason why I thought about this is the following:
Consider a function (<||>) :: (a -> b) -> (c -> d) -> ((a,c) -> (b,d)) which turns two functions into a function on pairs. Since (a,(b,c)) and ((a,b),c) are isomorphic (<||>) behaves like an associative function but one is not able to use it like that, since the types get in the way.
Not only that but since the types ((),a), (a,()) and a are also all isomorphic id| :: () -> () would satisfy the laws of the neutral element of (<||>) but again you can't really use it that way since the types get messed up.
Quick insert: Since I am describing something similar to a monoid and also similar to the category class, I just did a quick google search and it looks like I am describing what makes up a monoidal category. So I guess one thing which would speak for having something like this would be that you have less cumbersome category theoretic things.
So assume Haskell allows you to make declarations like this:isomorphic a (a,()) to a = (a,()) from (a,()) = a default a -- laws: to . from $ p == id p -- from . to $ a == id a
This means, that whenever the type (a,()) is encountered there is an implicit insertion of from changing the type to a allowing them to be used interchangably.
Would this be more of a good thing or more of a bad thing?submitted by Pseudoradius
[link] [24 comments]
On the BuildWrapper side, things are more complex. Adapting to API changes wasn't a big deal, but it seems that GHC bugs involving the GHC API, static linking and other unknowns cause some things to break. The solution I've found was to build BuildWrapper with the -dynamic flag. But I couldn't upload this to hackage because Cabal thinks that -dynamic is a debug flag (it starts with d). I've sent a bug fix to Cabal, so in the next release that'll be fixed. So if you're using GHC 7.8 and BuildWrapper, you may want to rebuild the executable with -dynamic (uncomment the relevant line in the cabal file).
Note: BuildWrapper comes with a comprehensive test suite (90 tests covering all aspects). So you can always build the tests and run them to ensure everyting is OK on your system.
Happy Haskell Hacking!