News aggregator

Request for review of a GADT tutorial draft

glasgow-user - Mon, 01/28/2013 - 8:40am
Dear Haskell community, I have recently written an introductory-level tutorial article about GADTs in GHC (inspired by LASER 2012 summer school and to be submitted to their proceedings). I have already send this draft to the "Haskell Cafe" mailing list, but I was also advised to use these mailing lists, so I would like to ask for your feedback: http://anton-dergunov.ru/publications/gadts_draft_v4.pdf Any opinion about this article and any suggestions are very welcome!
Categories: Offsite Discussion

How to get started with a new backend?

glasgow-user - Mon, 01/28/2013 - 3:15am
I would like to explore making a backend for .NET. I've done a lot of background reading about previous .NET and JVM attempts for Haskell. It seems like several folks have made significant progress in the past and, with the exception of UHC, I can't find any code around the internet from the previous efforts. I realize that in total it's a huge undertaking and codegen is only one of several significant hurdles to success. I would like to get a very, very, very simple translation working inside GHC. If all I can compile and run is fibonacci, then I would be quite happy. For my first attempt, proof of concept is sufficient. I found a lot of good documentation on the ghc trac for how the compilation phases work and what happens in the different parts of the backend. The documentation is excellent, especially compared to other compilers I've looked at. When I started looking at how to write the code, I started to wonder about the "least effort" path to getting something (anything?) working. Here are some ques
Categories: Offsite Discussion

Edward Z. Yang: The GHC scheduler

Planet Haskell - Mon, 01/28/2013 - 2:00am

I’d like to talk about some nitty-gritty details of GHC’s thread scheduling, discovered over the course of working on stride scheduling for GHC. Most of these choices are merely implementation details and are not part of any specification. While these choices shouldn’t be relied upon, they are worth knowing, since many of these details were accreted over the course of many performance bugs, benchmark tests and other battles. In this post, I’ll attempt to give some historical insight into why many choices were made. These insights should generalize to any system that would like to implement green threads, lightweight threads that use less memory than traditional operating system threads. For space reasons, I’m not going to talk about STM or sparks (though they are also quite interesting).

Update: A large portion of this material has been incorporated into the scheduler page in the GHC commentary

Anatomy of a thread

I’d first like to discuss some brief background about the runtime system first and point out some perhaps nonintuitive design choices. A thread is represented by a TSO (thread-state object) by GHC, i.e. the StgTSO struct in includes/rts/storage/TSO.h. [1] In Haskell, TSOs can be passed around as ThreadId objects. The Stg in front of the struct name indicates that TSOs are garbage collected, like other closures in Haskell. The TSO, along with the stack allocated with it (STACK), constitute the primary memory overhead of a thread. Default stack size, in particular, is controlled by the GC flag -ki, and is 1k by default. [2] Threads are run by Capabilities, which can be thought of virtual cores managed by GHC. Capabilities are, in turn, mapped to true operating system threads, or Tasks, though we won’t talk about them much.

Being garbage collected has two major implications for TSOs. First, TSOs are not GC roots, so they will get GC'd if there is nothing holding on to them (e.g. in the case of deadlock), and their space is not automatically reclaimed when they finish executing [3]. Usually, a TSO will be retained by a Capability’s run queue (a GC root), or in the list of waiting threads of some concurrency variable, e.g. an MVar. Second, a TSO must be considered a mutable object, and is thus subject to the conventional GC write barriers necessary for any mutable object in a generational garbage collector. [4] The dirty bit tracks whether or not a TSO has been modified; it is always set when a thread is run and also when any of the pointer fields on a TSO are modified. Two fields, set by setTSOLink and setTSOPrev, are of particular interest to the scheduler.

Run queue

The run queue is at the heart of the scheduler, as any runnable thread will hit the run queue before the scheduler actually pops it off the queue and runs it. There’s one per capability rts/Capability.h (in the bad old days, there was a global run queue, but this performed badly for multithreaded processes), and it is implemented as a doubly-linked list run_queue_hd and run_queue_tl. [6] The head and tail pointers mean that the queue is actually a deque: this is important because the scheduler will often have to handle threads that were interrupted in some way, and should let the threads get back on. The links themselves are on the TSOs and modified with setTSOLink and setTSOPrev, so modifying the queue dirties the TSOs involved. [7] Otherwise, the run queue is exclusively owned by the scheduler. If there are idle capabilities and if we have more than one thread left in our run queue, threads will be pushed to other queues with schedulePushWork.

Threads are put in front (pushOnRunQueue) if:

  • A stack overflow occurs;
  • A heap overflow occurs; [8]
  • A task attempts to run a thread, but it is bound and the current task is the wrong one;
  • A thread is associated with a black hole (a thunk that is being evaluated), and another thread, possibly on another capability, has blocked on its evaluation (see ticket #3838);
  • In the threaded runtime, if a thread was interrupted because another Capability needed to do a stop-the-world GC (see commit 6d18141d8);
  • In the non-threaded runtime, when a thread waiting on IO unblocks.

Threads are put in back (appendToRunQueue) in the case of pre-emption, or if it’s new; particularly, if

  • A thread was pre-empted via the context switch flag (e.g. incoming message from another thread, the timer fired, the thread cooperatively yielded, etc; see also [8] on how this interacts with heap overflows);
  • It is a new thread (so large amounts of thread creation do not starve old threads, see conc004 and commit 05881ecab);
  • A thread becomes unblocked;
  • A thread is migrated to another capability (though, in this case, the queue was empty anyway);
  • A thread finishes, but for some reason we need to keep it around (this is related to in-calls, though I’m not a 100% sure what is going on here; if you know, please tell me!)
Benchmarks

Benchmarks like nofib are very important, even if they are synthetic, as they will often be construed as primary evidence whether or not a change to the scheduler speeds or slows things down. One reason is that it is much easier to tell why a short program that torture tests threads has slowed down than it is to tell why a large, complicated multithreaded program no longer seems very snappy. But really, the main motivation is convenience: nofib programs are easy to measure and easy to compare. Fortunately, the tests often measure something quite specific, so I’d like to describe the tests that compose the smp nofib suite here:

  • callback001 (also known as ffi014) performs a large number of incalls to Haskell from C from a large number of threads. This is a rather specific test related to how we place threads in the run queue even if they’ve finished, if they finished in an in-call.
  • callback002 measures how quickly we can perform incalls to Haskell from C.
  • chan measures how scheduling order effects memory usage: if threads are allowed to run for a bit without getting context switched, they build up data in channels. This is related to when we reset the context switch flag (see [8]).
  • sieve implements the Sieve of Eratosthenes, spawning many threads to evaluate thunks of a lazy list in parallel. It performs a bit of allocation, and sensitive to what happens to threads after a HeapOverflow.
  • threads001 tests how quickly we can create a thread and then context switch to it.
  • threads003 tests how quickly many threads can communicate by reading and writing MVars. It is a bit sensitive to what happens to threads after they wake up from sleeping.
  • threads006 tests how quickly threads can be created and destroyed, as well as throwTo blocking performance. It is very sensitive to the number of major GCs that occur (which can be influenced if TSO size changes).
  • threads007 generates a lot of threads waiting on MVars, and then sees how shutdown behavior is affected. It was due to bad behavior in the MVar queue and fixed in f4692220c7.
Conclusion

The GHC scheduler is pretty complicated! Much of the current behavior was created in response to specific problems: the right choices are not obvious a priori! I hope this post will serve as a valuable reference for any future GHC hackers interested in playing around with the scheduler, as well as for anyone else who needs to implement a scheduler for their runtime system. Much of the historical data was gleaned from comments (though I found some out-of-date ones), liberal use of git blame, and cross-referencing with the bug tracker—these are all useful places to figure out, “Well, why does that code do that?” In this post, I hope I’ve answered that question, to some degree.

[1] Initialization of StgTSO is handled in createThread in rts/Threads.c; this function is in turn invoked by createGenThread, createIOThread and createStrictIOThread in rts/RtsAPI.c. These functions setup the initial stack state, which controls what the thread executes when it actually gets run. These functions are the ones invoked by the fork# and other primops (entry-points for primops are located in rts/PrimOps.cmm).

[2] Actually, your usable stack will be a little smaller than that because this size also includes the size of the StgTSO struct. (This is only really for allocating lots of threads into one block, however, as once a GC occurs the TSOs and stacks will no longer be adjacent.)

[3] Here is a sample program which demonstrates how holding onto ThreadId using stable pointers (which force the object their pointing to to never be GC'd) can leak memory:

import Control.Concurrent import Control.Monad import Foreign.StablePtr n = 400000 main = do ms <- replicateM n (newEmptyMVar >>= \m -> (forkIO (putMVar m ()) >>= newStablePtr) >> return m) mapM_ takeMVar ms

The heap profile of the run shows none of the TSO/STACK objects being deallocated, even when the MVars drain out as threads finish executing.

[4] The write barrier for generational GCs refers not to memory barrier of multithreaded execution, but rather, notification for the garbage collector when a mutable reference in the old generation changes, and may now possibly point to an object in the young generation. Write barriers are necessary because the old generation will not be traversed during a minor collection, and thus if old generations may point to an object in a young generation, we may miss the fact that a young object is still alive even though it has no references from other young objects. In GHC, a write barrier is implemented by adding an object to the mutable list (mut_list) of a Capability if it is not in the youngest generation. (Some objects, like MutArr#, are permanently on the mutable list; in such a case, a write barrier may not be necessary. But see [5] for more details.) Objects will usually track their dirty status, so that they don’t add themselves to the mutable list multiple times. (Accidentally adding an object multiple times is harmless, but means the GC has to do extra work traversing the mutable list.) Additionally, if we can guarantee that the new reference does not point to the young generation (for instance, it is a static closure like END_TSO_QUEUE), then dirtying the object is not necessary. Getting this stuff right is tricky, to say the least!

[5] There is a bit of a sordid story here. Keeping an object permanently on the mutable list is done by scavenge_mutable_list in rts/sm/Scav.c, which will unconditionally re-add such an object to the mutable list if it sees it there. How does the object get on the mutable list in the first place? It’s not placed on the list upon creation; rather, upon the first minor GC on the youngest generation, the scavenging GC notices the object and places it on the mutable list by gct->failed_to_evac = rtsTrue. How do we end up freeing the object? The mutable list is considered a set of root pointers, but it is only scavenged, not evacuated. If an item on the mutable list ends up not being evacuated, it will be blown away regardless. (This does mean, however, that its elements will not be freed until the next GC.) Isn’t it really inefficient to always be scanning these arrays? Yes, and this used to be a problem (ticket #650), nowadays mitigated by card marking. The same story applied to TSOs (ticket #1589), but the fix here was to properly apply a write barrier and not keep the objects permanently on the mutable list; this improved performance quite a bit when there were a lot of threads (even if you don’t scavenge their pointers, traversing a huge mutable list is still a pain.) Creating a lot of small mutable arrays is apt to be painful.

[6] It used to be singly linked, but fixing ticket #3838 required the ability to remove TSOs from the run queue.

[7] Since these fields are always traversed by the GC, it’s important that they do not contain NULL pointers or garbage. Instead, we set them to the static closure END_TSO_QUEUE. Because this is guaranteed not to be in the young generation, this is why you do not need to dirty the TSO after setting this field.

[8] Sometimes, a heap overflow and a context switch occur simultaneously. If the thread requested a large block, we still always push it in front (because we don’t want another thread to steal our large block); however, otherwise, the context switch takes precedence and the thread is booted to the end of the queue—the context switch is checked as late as possible. (See commit 05881ecab)

Categories: Offsite Blogs

suggestions for improving MonadWriter

libraries list - Sun, 01/27/2013 - 10:30pm
Dear maintainers, I have two suggestions for MonadWriter: (1) Remove the "Monoid w" constraint from the definition. The constraints prevent creating new instances of the class that have only an implied monoid. For example, I needed to create a simple writer which always stores the last written element. I had to wrap it into Last, which was a nuisance for users of my library. Without the constraint, my instance would be quite simpler and still satisfying all the laws. There are many other similar use cases, like counting the number of written values (and disregarding their actual content) etc. The constraint is meant to ensure that instances of that class obey the monad laws. But it's not the responsibility of a type class that its instances satisfy the laws. They could violate them even without this constraints. Instead, this constraint should be specified (and it is) in the definition of their instances. It has been discussed in haskell-cafe <http://www.haskell.org/pipermail/ haskell-cafe/2012-Decemb
Categories: Offsite Discussion

GHC.Generics and newtypes

haskell-cafe - Sun, 01/27/2013 - 10:17pm
Hi, Is it possible to generate different instances for newtypes and datatypes using GHC.Generics? Roman
Categories: Offsite Discussion

Six PhD Studentships at St Andrews (tight deadline)

General haskell list - Sun, 01/27/2013 - 8:57pm
We have six funded PhD Studentships available. If you know any good applicants with an interest in functional programming, parallelism or related areas, I'd be pleased to hear from them. The deadline is tight (4th February), but the main thing is to get an application in. http://blogs.cs.st-andrews.ac.uk/csblog/2012/12/20/600th-anniversary-phd-scholarships/ Best Wishes, Kevin -------- Kevin Hammond, Professor of Computer Science, University of St Andrews T: +44-1334 463241 F: +44-1334-463278 W: http://www.cs.st-andrews.ac.uk/~kh In accordance with University policy on electronic mail, this email reflects the opinions of the individual concerned, may contain confidential or copyright information that should not be copied or distributed without permission, may be of a private or personal nature unless explicitly indicated otherwise, and should not under any circumstances be taken as an official statement of University policy or procedure (see http://www.st-and.ac.uk). The University of St Andrew
Categories: Incoming News

CFP WWV 2013

General haskell list - Sun, 01/27/2013 - 8:28pm
***************************************************************************** * * * WWV 2013 * * Automated Specification and Verification of Web Systems * * 9th International Workshop * * Call for Papers * * * ***************************************************************************** IMPORTANT DATES Abstract Submission March 22, 2013 Full Paper Submission March 29, 2013 Acceptance Notification May 3, 2013 Camera Ready (pre-proceedings) May 20, 2013 Workshop June 6, 2013 Camera Ready (post-proceedings) June 26, 2013 SCOPE The Workshop on Automated Specification and Verification of Web Systems (WWV, http://users.dsic.upv.es/grupos/elp/wwv/) is a yearly workshop that aims at providing an interdisciplinary forum to facilitate the cross-fertilization and the advancement of hybrid met
Categories: Incoming News

Extended periods of "waking up thread %d on cap %d"

glasgow-user - Sat, 01/26/2013 - 2:28am
Recently I've been benchmarking my concurrent Gibbs sampler[1] on a largish multicore machine. I've been using GHC HEAD due to various GC-related fixes that are present. One thing that I've noticed in looking over the event logs is that there are large durations (tens of milliseconds) when HECs appear to be constantly bombarded with wake-ups from other capabilities. In these periods, the eventlog will be filled with blocks of messages such as this snippet from one benchmark run[2] (long repeated blocks marked with ellipses, a few irrelevant messages have been omitted yet not marked, see eventlog for full log), 28.320958s HEC 7: running thread 293 ... 28.391802s HEC 6: running thread 209 28.392070s HEC 6: waking up thread 284 on cap 7 28.392302s HEC 6: waking up thread 284 on cap 7 28.392579s HEC 6: waking up thread 284 on cap 7 28.392808s HEC 6: waking up thread 284 on cap 7 . . . 28.405971s HEC 16: waking up thread 284 on cap 7
Categories: Offsite Discussion

data kinds

glasgow-user - Fri, 01/25/2013 - 5:19pm
GHC implements data kinds by promoting data declarations of a certain restricted form, but I wonder if it would be better to have a special syntax for kind definitions, say data kind Nat = Zero | Succ Nat At the moment, things get promoted whether you need them or not, and if you've made some mistake that makes your definition non-promotable you don't find out until you try to use it. More importantly, a separate form for kinds would allow one to use existing kinds, i.e. *, in definitions of new kinds.
Categories: Offsite Discussion

Fastest way to reload module with GHC API

glasgow-user - Fri, 01/25/2013 - 4:30pm
Hello, I just want to be sure of what's the fastest way to reload a module with the GHC API. I have a file whose path is fp I load the module with: addTarget Target { targetId = TargetFile fp Nothing, targetAllowObjCode = True, targetContents = Nothing } Then I load the module load LoadAllTargets And when I want to reload the module (the contents of fp have changed) I do: removeTarget (TargetFile fp Nothing) load LoadAllTargets and then I rerun my initial code (addTarget, load) This works well, I don't get any errors about duplicates, the information I extract from the module is updated properly. But is it the most efficient, or can I achieve the same thing faster (I'm looking at improving the performance of my BuildWrapper code and by consequence of EclipseFP)? Thanks!
Categories: Offsite Discussion

Size of crosscompiled exectuable

glasgow-user - Fri, 01/25/2013 - 3:58pm
Hey, A simple hello world application has 1Mb in by 64 bit ubunut machine. When I stript it, is has about 750kb. When I build a cross compiler for android (arm), the executable has a asize of about 10MB, stripped about 5MB. That is huge, five times the size on my linux sysem. Why is this? Can I do something about it? Thanks! Nathan
Categories: Offsite Discussion

GHCJS, JavaScript,cross-platform typechecking with the GHC API and primops

glasgow-user - Fri, 01/25/2013 - 2:37pm
hi all, I've been working on GHCJS [1] for a while and am trying to get it to work on 64 bit GHC. A quick background: GHCJS uses the GHC API to generate STG, which it then translates to JavaScript. It uses slightly patched versions of the integer-gmp, base and ghc-prim libraries and supports many GHC features, including arrays and threading. All JavaScript numbers are essentially double precision floating point. With some tricks (using bitwise operators) it's possible to get reasonable 32 bit integer emulation out of it, but emulating 64 bit integers would be very slow. Therefore, if we compile Haskell to JavaScript, it should really be treated as a 32 bit target. Now this is a problem, with GHC 7.6 and lower, generated STG depended on the word size of the host, so it was simply impossible to do this with a 64 bit host. In HEAD, wORD_SIZE_IN_BITS is now in DynFlags, so I decided to have another go. I'm aiming for the following procedure to install GHCJS and build a package: # cabal install ghcjs -- ins
Categories: Offsite Discussion

ghc passing -undef to preprocessor and thereby eliminating OS specificdefines

glasgow-user - Thu, 01/24/2013 - 1:21pm
Hey, I am trying to adapt some code in the libraries when compiling for android (i.E. because some things are different on android to other posix systems). So in C code I would just do "#ifdef __ANDROID__". While in the *.h and *.c files it seems to work, it does not work in *.hs files. I noted that the preprocessor is run like this: arm-linux-androideabi-gcc -E -undef -traditional -fno-stack-protector -DTABLES_NEXT_TO_CODE The "-undef" parameter is causing the __ANDROID__ define to be removed. Does it make sense to pass "-undef" to the preprocessor? Any other Ideas how I could adapt the code for android? Thanks! Nathan
Categories: Offsite Discussion

Validate of GHC HEAD freezes on FreeBSD

glasgow-user - Thu, 01/24/2013 - 10:33am
Hello, The "validate" script against GHC HEAD freezes on FreeBSD 9.1. After sync-all, I did as follow: ---------------------------------------------------------------- % config_args="--with-iconv-includes=/usr/local/include --with-iconv-libraries=/usr/local/lib --with-gmp-includes=/usr/local/include --with-gmp-libraries=/usr/local/lib --with-gcc=/usr/local/bin/gcc47" CPUS=10 sh validate ---------------------------------------------------------------- This stopped quickly due to this problem: http://hackage.haskell.org/trac/ghc/ticket/7592 Then I executed "validate" with "--no-clean" again. ---------------------------------------------------------------- % config_args="--with-iconv-includes=/usr/local/include --with-iconv-libraries=/usr/local/lib --with-gmp-includes=/usr/local/include --with-gmp-libraries=/usr/local/lib --with-gcc=/usr/local/bin/gcc47" CPUS=10 sh validate --no-clean ---------------------------------------------------------------- GHC could be compiled and tests started. But this result
Categories: Offsite Discussion

Global constant propagation

glasgow-user - Sun, 01/20/2013 - 11:18pm
I'm curious about global constant propagation in GHC.  It's a fairly basic optimization in the CFG-based compiler domain, and it's similar to constructor specialization, but it doesn't seem to be in GHC's repertoire.  Perhaps it's usually subsumed by other optimizations or it's more complicated than I am thinking.  Is this optimization worth implementing? This optimization can help when a case expression returns a product, some fields of which are the same in all branches.  The following program is a minimal example of an optimizable situation that GHC doesn't exploit. {-# OPTIONS_GHC -O3 -funbox-strict-fields #-} data D = D !Int !Int foo n = if n > 0         then D 0 0         else D 0 n main =   case foo $ read "7"   of D x y -> if x == 0 then return () else print y >> putStrLn "A" After inlining and case-of-case transformation, GHC produces main = let n = read "7"            k x y = case x of {0 -> return (); _ -> print y >> putStrLn "A"}        in if n > 0
Categories: Offsite Discussion

New gtk2hs 0.12.4 release

gtk2hs - Wed, 11/21/2012 - 12:56pm

Thanks to John Lato and Duncan Coutts for the latest bugfix release! The latest packages should be buildable on GHC 7.6, and the cairo package should behave a bit nicer in ghci on Windows. Thanks to all!

~d

Categories: Incoming News