I’ve spent the last few months working on GHC. No rocket science so far, just a bit of hacking here and there. The biggest thing I am working on is ticket #6135, which is about changing some of the existing PrimOps to return an unboxed Int# instead of Bool. This means that the result of comparing two unboxed values will be either an unboxed 0# or an unboxed 1#, instead of a tagged pointer to a statically allocated object representing True or False. This modification will make it possible to write branchless algorithms in Haskell. I promise to write about this one day, but today I want to blog about a different topic.
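As a small preview of what Int#-returning comparisons make possible, here is a minimal sketch. It assumes a GHC in which this change has already landed (7.8 or later), where `(<#)` and `(>#)` return `Int#` and `orI#` combines such results bitwise. The function name `outside` is made up for illustration, and whether the generated machine code is really free of branches depends on the GHC version and back-end.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), orI#, (<#), (>#))

-- | Returns 1 if x lies outside the closed interval [lo, hi] and 0 otherwise.
-- Assumption: comparison primops return Int# (0# or 1#), so the two
-- results can be combined with a bitwise OR instead of a case expression.
outside :: Int -> Int -> Int -> Int
outside (I# lo) (I# hi) (I# x) = I# ((x <# lo) `orI#` (x ># hi))

main :: IO ()
main = print (map (outside 0 10) [-1, 0, 5, 10, 11])
-- prints [1,0,0,0,1]
```

With Bool-returning comparisons the same function would need `||`, which pattern matches on its first argument and so introduces a branch at the source level.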
It so happens that the things I’ve been doing in GHC require me to make changes in the code generator. This is a bit challenging for me, because the code generator is something that didn’t interest me much when I started to learn about compilers. Probably the main reason for this is that code generation means dealing with assembly. I’ve been programming for about 16 years and only two languages caused me problems when I tried to learn them. Assembly is one of them [1]. I studied it for a year during my studies and, although I had no problems understanding the idea behind assembly and writing short snippets of code, writing a larger piece of code always ended in a headache.
It looks like the time has come to overcome my fear. During the last few months I’ve been reading a lot of assembly generated by GHC and I even made some attempts at writing assembly code myself (well, using intrinsics, but I guess that counts). But between Haskell source code and the generated executable there are many intermediate steps. From my observations it seems that many Haskellers have a basic knowledge of Core – GHC’s intermediate language. Most have also heard about the other two intermediate representations used by GHC – STG and Cmm – but it seems that few people know them, unless they hack on the compiler. And since I’m hacking on the compiler I should probably know more about these two representations, right?
There’s a classic paper by Simon Peyton Jones, “Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine”. It is quite long – 87 pages total – and, being published in 1992, it is mostly out of date. These two things kept me from reading it, although I think that being out of date was only a pretext for me to avoid reading almost 90 pages of text. But, since I need to learn about STG, I finally decided to give it a shot. Reading the paper took me four days. The paper is very well written and in general is an easy read. I was afraid that I might not understand the formal description of the operational semantics of STG, but it turned out to be well explained, so I had no problem with that. The major problem turned out to be the amount of material I had to absorb while reading. This resulted in problems with fully understanding the last sections of the paper. Not because they are more difficult than the initial ones, but because I didn’t fully remember all the details that were discussed earlier.
An important question is which information is not up to date. I’m not yet familiar with the existing implementation, but it seems that many things have changed: the Spineless Tagless G-machine is not tagless any more since the introduction of pointer tagging; curried functions are now evaluated using the eval/apply convention, while the paper describes push/enter; the paper discusses only compilation to C, while currently the C back-end is becoming deprecated in favour of the native code generator and LLVM; and finally the layout of closures is now slightly different from the one presented in the paper. I am almost certain that garbage collection is also performed differently. These are the differences that I noticed, which means that really a lot has changed since the publication over 20 years ago.
Surprisingly, this doesn’t seem like a big problem, because the most important thing is that the paper presents the idea of how STG works, while the changes mentioned above are relatively unimportant details.
So, now that I have a basic idea of how STG works, what comes next? There are a few follow-up papers:
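To make one of those changes a bit more concrete, here is a toy model of the eval/apply convention. It is only a sketch of the idea and not how GHC implements it: the `Value` type and the names `apply`, `render`, `add2` and `pick` are all made up for illustration. The point is that the caller compares the number of arguments it has with the arity of the function and picks one of three cases; under push/enter that decision would be made by the called function instead.

```haskell
module Main where

-- A toy universe of values: integers, functions of known arity, and
-- partial applications (a function plus the arguments collected so far).
data Value
  = IntV Int
  | FunV Int ([Value] -> Value)  -- arity (assumed >= 1) and code expecting exactly that many arguments
  | PapV Value [Value]

-- eval/apply: the caller inspects the arity and decides what to do.
apply :: Value -> [Value] -> Value
apply f [] = f
apply (PapV f held) args = apply f (held ++ args)
apply f@(FunV arity code) args =
  case compare (length args) arity of
    EQ -> code args                                         -- exact call
    LT -> PapV f args                                       -- too few: build a partial application
    GT -> apply (code (take arity args)) (drop arity args)  -- too many: call, then apply the result
apply (IntV _) _ = error "apply: an integer is not a function"

render :: Value -> String
render (IntV n)      = show n
render (FunV n _)    = "<function of arity " ++ show n ++ ">"
render (PapV _ args) = "<partial application holding " ++ show (length args) ++ " argument(s)>"

-- A two-argument addition function.
add2 :: Value
add2 = FunV 2 body
  where
    body [IntV a, IntV b] = IntV (a + b)
    body _                = error "add2: expected two integers"

-- A one-argument function that ignores its argument and returns add2.
pick :: Value
pick = FunV 1 (const add2)

main :: IO ()
main = do
  putStrLn (render (apply add2 [IntV 1, IntV 2]))           -- 3
  putStrLn (render (apply add2 [IntV 1]))                   -- a partial application
  putStrLn (render (apply (apply add2 [IntV 1]) [IntV 2]))  -- 3
  putStrLn (render (apply pick [IntV 0, IntV 3, IntV 4]))   -- 7
```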
- “The STG runtime system (revised)” – an updated description of STG written in 1999 by Simon Peyton Jones and Simon Marlow. I guess it’s also outdated, but still probably worth reading. It has only 65 pages :)
- “Making a Fast Curry. Push-Enter vs. Eval-Apply for Higher-order Languages” – this describes the eval/apply and push/enter strategies mentioned above. Already read this one.
- “Faster Laziness Using Dynamic Pointer Tagging” – this will tell you why STG is not tagless. Read this one also.
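The basic trick behind pointer tagging can be sketched with plain bit operations. This is an illustration under stated assumptions, not GHC’s actual runtime code: it assumes 8-byte-aligned addresses (so the low 3 bits of a pointer are always zero and can carry a small tag), models addresses as `Word64`, and uses made-up names (`tagMask`, `addTag`, `getTag`, `untag`). As I understand the paper, a zero tag means “nothing known, enter the closure”, while a non-zero tag carries information about an already evaluated value.

```haskell
module Main where

import Data.Bits (complement, (.&.), (.|.))
import Data.Word (Word64)

-- Assumption: addresses are 8-byte aligned, so the low 3 bits are free.
tagMask :: Word64
tagMask = 7

-- Store a small tag in the unused low bits of an aligned address.
addTag :: Word64 -> Word64 -> Word64
addTag addr tag = addr .|. (tag .&. tagMask)

-- Read the tag back without touching memory.
getTag :: Word64 -> Word64
getTag ptr = ptr .&. tagMask

-- Clear the tag bits to recover the real address.
untag :: Word64 -> Word64
untag ptr = ptr .&. complement tagMask

main :: IO ()
main = do
  let closure = 0x1000           -- a made-up, 8-byte-aligned address
      tagged  = addTag closure 2
  print (getTag tagged)            -- 2
  print (untag tagged == closure)  -- True
  print (getTag closure)           -- 0
```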
And once I’ve dealt with STG I’ll have to learn about Cmm.
[1] In case you’re interested, the other one is Io.
In this post I would like to talk about my experience with
bootstrapping GHCJS using the facilities provided by ghcjs-build. I
had never used tools like Vagrant or Puppet before, so all of this was
kinda new to me.
GHCJS can’t actually work with vanilla GHC 7.*, as it requires some
patches to be applied (in order to get JS ffi to work, it adds
ghcjs-build uses Vagrant (a tool for automatically building and
running work environments) to manage the work environment, so prior to
running GHCJS you need to install Vagrant and VirtualBox. It’s actually
a sensible way to tackle a project like that: everyone has similar
work environments, and you don’t have to mess with your local GHC
installation. It also makes use of the Puppet deployment system and the
puppetlabs-vcsrepo module for cloning Git repositories.
Currently, there are two ways to start up GHCJS using ghcjs-build.

2.1 Using the prebuilt version

    git clone https://github.com/ghcjs/ghcjs-build.git
    cd ghcjs-build
    git checkout prebuilt
    vagrant up

Using this configuration the following procedures are performed:
- Vagrant sets up a 32-bit Ubuntu Precise system (Note: if this is
your first time running Vagrant, it downloads the 280 MB
precise32.box file from the Vagrant site)
- Vagrant does some provisioning using Puppet (downloads and
installs necessary packages)
- A 1.4 GB archive with ghcjs and other prebuilt tools is downloaded
Apart from setting up the box, compiling from source will
- Get the GHC sources from Git HEAD and apply the GHCJS patch.
- Get all the necessary packages for ghcjs
- Get the latest Cabal from Git HEAD, apply the GHCJS patch and
- Compile the necessary libraries using ghcjs
- Compile ghcjs-examples and its dependencies (it appears that it
can take a lot of time to compile gtk2hs and gtk2hs’s tools)
Please note that, depending on your computer, you might want to go for
a long walk, enjoy a small book or get a night’s sleep (assuming you are
not scared by the sound of computer fans).
Apart from being slow, the process of compiling everything from
source is error-prone. To give you a taste, last night I was not able
to reproduce a working environment myself, because of some recent
changes in GHC HEAD. The prebuilt version, on the other hand, is
guaranteed to install correctly.
Hopefully, the GHCJS patches will be merged upstream before GHC
7.8 is out. That way you won’t need to partake in building GHC from
source in order to use GHCJS.
After you’ve finished with the initial setup you should be able to just
log in to your new VM and start messing around.
The ghcjs command is available to you, and Vagrant kindly forwards
port 3000 on the VM to local port 3030, allowing you to run web
servers like warp on the VM and access them locally.
You can access your local project directory under /vagrant in the VM:

    $ ls /vagrant
    keys manifests modules outputs README.rst Vagrantfile

However, copying files back and forth is not a perfect solution. I
recommend setting up an sshfs filesystem (Note: if you are on OSX,
don’t forget to install the fuse4x kernel extension):
When you are done you can just

    umount ../vm

3 Compiling other packages

Since the diagrams package on Hackage depends on an older version
of base, we are going to use the latest version from Git:
Other packages I had to install already had their Hackage versions
Now you can try to build a test diagram to see that everything works

    module Main where

    import Diagrams.Prelude
    import Diagrams.Backend.SVG.CmdLine

    d :: Diagram SVG R2
    d = square 20 # lw 0.5 # fc black # lc green # dashing [0.2,0.2] 0

    main = defaultMain (pad 1.1 d)

then you can compile and run it

    ghc --make Test.hs
    ./Test -w 400 -o /vagrant/test.svg

And that’s it!

4 Outro

I would also like to note that we are currently polishing the GHCJS
build process. Luite, especially, is working on making ghcjs work (and
run tests) with Travis CI (it takes quite a bit of time to build ghcjs
and Travis sometimes times out) and I am working on tidying up
the build config.
Stay tuned for more updates.
It seems like most of the time when using the Cont monad, one would eventually end up with the "final result" in the intermediate result, leading to this pattern:

    runContId k = runCont k id

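A minimal sketch of the pattern in use, assuming the transformers package is available. The `safeDiv` function is made up for illustration. Recent versions of transformers (0.4 and later, as far as I know) export `evalCont` from Control.Monad.Trans.Cont with the same type as `runContId`, which is shown on the last line for comparison.

```haskell
module Main where

import Control.Monad.Trans.Cont (Cont, callCC, evalCont, runCont)

-- The pattern from the post: run a computation whose intermediate
-- result type is already the final result type.
runContId :: Cont r r -> r
runContId k = runCont k id

-- Hypothetical example: division that escapes early on a zero divisor.
safeDiv :: Int -> Int -> Cont r (Either String Int)
safeDiv x y = callCC $ \exit ->
  if y == 0
    then exit (Left "divide by zero")
    else return (Right (x `div` y))

main :: IO ()
main = do
  print (runContId (safeDiv 10 2))  -- Right 5
  print (runContId (safeDiv 1 0))   -- Left "divide by zero"
  print (evalCont (safeDiv 10 2))   -- Right 5
```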
Have others observed this as well? Might it be useful for something like this to live in Control.Monad.Trans.Cont?

submitted by singpolyma