Lately I've been familiarizing myself with low-level performance hackery. I also perused the Data.HashMap sources, and in particular it started to bug me that doing snoc/append/insert on primitive arrays is not very efficient in the current state of affairs.
The core of the problem is that we can't have uninitialized arrays lying around, because the GC would try to follow the junk data, believing them to be valid pointers. So we can allocate unboxed ByteArrays uninitialized, but not boxed Arrays, and thus we have to initialize fresh arrays only to overwrite them a moment later.
I did a little benchmark on snoccing. It emulates a primitive snoc in a horribly unsafe way: it uses thawArray to copy one more element than there actually is in the source array, and then writes the new element to the back. Of course, this is a bounds error and possibly a GC mess-up, but it works at least for demonstration purposes.
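For context, here is a minimal sketch (my own, using GHC primops; the Arr wrapper and helper names are invented for illustration) of how a boxed-array snoc must be written today: newArray# is obliged to initialize every slot before copyArray# overwrites the first n of them, which is exactly the redundant initialization being discussed.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.ST (ST(..), runST)

-- Lifted wrapper so an Array# can be returned from runST.
data Arr a = Arr (Array# a)

-- Status quo: newArray# must initialize all n+1 slots (here with the
-- new element) before the old contents are copied over the first n.
-- The primop proposed above would skip that initialization.
snoc :: Arr a -> a -> Arr a
snoc (Arr src) x = runST $ ST $ \s0 ->
  case sizeofArray# src of
    n -> case newArray# (n +# 1#) x s0 of
      (# s1, marr #) -> case copyArray# src 0# marr 0# n s1 of
        s2 -> case unsafeFreezeArray# marr s2 of
          (# s3, arr #) -> (# s3, Arr arr #)

replicateArr :: Int -> a -> Arr a
replicateArr (I# n) x = runST $ ST $ \s0 ->
  case newArray# n x s0 of
    (# s1, marr #) -> case unsafeFreezeArray# marr s1 of
      (# s2, arr #) -> (# s2, Arr arr #)

indexArr :: Arr a -> Int -> a
indexArr (Arr a) (I# i) = case indexArray# a i of (# x #) -> x

sizeArr :: Arr a -> Int
sizeArr (Arr a) = I# (sizeofArray# a)

main :: IO ()
main = do
  let a' = snoc (replicateArr 3 'x') 'y'
  print (sizeArr a', indexArr a' 0, indexArr a' 3)
```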
Here it is. The speedup seems to be significant (1.5x-2.5x). Primitive insertion could be similarly faster (unlike snoc, I couldn't test that with a hack). HashMap does lots and lots of array insertions, so it might benefit considerably from this.
Is this the kind of thing that would be sensible to implement? I'm posting here instead of to the GHC Trac because at the moment I am entirely unfamiliar with the GHC codebase and development process.
submitted by Puttamalac
More hakyll blog fixes:
Ugly things showing on planets
My posts were showing unwanted things on planet haskell - double heading, redundant date, tag links, and ugly disqus html. By comparing with Jasper Van der Jeugt's blog, I found the problem: I was snapshotting content for the feed at the wrong time, after applying the post template:

    >>= return . fmap demoteHeaders
    >>= loadAndApplyTemplate "templates/post.html" (postCtx tags)
    >>= saveSnapshot "content"
    >>= loadAndApplyTemplate "templates/default.html" defaultContext
Better:

    >>= saveSnapshot "content"
    -- >>= return . fmap demoteHeaders
    >>= loadAndApplyTemplate "templates/post.html" (postCtx tags)
    >>= loadAndApplyTemplate "templates/default.html" defaultContext
Manual feed publishing
The main blog feed is now generated with a _ prefix, and I must manually rename it (with make feed) to make it live on Planet Haskell. This will hopefully reduce snafus (and not create new ones).

    ./site.hs line 95:
    - create ["blog.xml"] $ do
    + create ["_blog.xml"] $ do

    ./Makefile line 14:
    +feed: _site/blog.xml
    +
    +_site/blog.xml: _site/_blog.xml
    +	cp _site/_blog.xml _site/blog.xml
Better HTML titles
Changed the “Joyful Systems” prefix to a suffix in the HTML page titles, making search results and browser tab names more useful.
Today, a coalition of thousands of Internet users, companies and organizations launched a campaign for a day of action to “Reset The Net” on June 5th, 2014, the anniversary of the first NSA surveillance story revealed by whistleblower Edward Snowden. Tens of thousands of internet activists, companies and organizations committed to preserving free speech and basic rights on the Internet by taking steps to shut off the government’s mass surveillance capabilities.
Watch the campaign video and see a full list of participants here: http://ResetTheNet.org. More than 20 organizations and companies support the launch of the campaign including Fight For The Future (who initiated the campaign) along with reddit, CREDO Mobile, Imgur, Greenpeace, Libertarian Party, FireDogLake, Thunderclap, DuckDuckGo, Disconnect.Me, Demand Progress, Access, Free Press, Restore the Fourth, AIDS Policy Project, PolitiHacks, OpenMedia, Free Software Foundation, Bill of Rights Defense Committee, Code Pink, Popular Resistance, Participatory Politics Foundation, BoingBoing, Public Knowledge, Amicus, New America Foundation’s Open Technology Institute, Progressive Change Campaign Committee, Student Net Alliance, and the Center for Democracy and Technology. ... Internet users are invited to join in on the day of Reset The Net to install privacy and encryption tools and secure their personal digital footprint against intrusive surveillance. Technical information here and press release here.
I am programming an online code editor, one of whose main features is a huge, integrated library of functions. The main purpose of this editor is to create simple practical applications (such as games) using FP. The major problem, at this point, is choosing which language to use.
Obviously the language must be, at least, referentially transparent - it would make no sense to have a catalog of impure functions. This eliminates almost all practical languages other than Haskell. Unfortunately, Haskell is kinda heavyweight, in the sense that it is nowhere near trivial to run Haskell in the browser, create 3D games for Android and so on. But, more than that, this will have a focus on games, so I'd like people to do things like 3D graphics in that language without having to call external libraries directly (that is, no OpenGL programming). In other words, I want the compiler to be smart enough to figure out when you are mapping a function over a 1024x1024 list and do it using GPU buffers instead. So, even if I had a good enough Haskell->JS compiler, this would still make Haskell unsuitable. Yes, there are neat ways to do GPU in HS (LambdaCube/REPA/etc), but this involves a lot of particular knowledge of those libraries. What I need is that things like (map black_and_white colors) :: [Color] are compiled to run in a buffer inside WebGL, which is obviously far from what I can expect from any Haskell compiler, at least for now, and probably forever, considering this would be a semantically wrong translation.
So I'm left with creating my own language. The fact that the language is very specific and isn't supposed to do any of the complicated things Haskell has to deal with makes this easier, so I've studied a lot, implemented many compilers in the last few months and now I have an approach that, I believe, looks like it will be good enough for my case. It is a modification of the STLC with primitive recursion. It is, if I understand what I am doing, very suitable for supercompilation, and my compiler is already good enough to do some nice things such as fusing two recursive functions \x -> map double (map triple x) into one without stream fusion.
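The map/map fusion mentioned can be illustrated directly in Haskell (a sketch of my own, not the poster's compiler): the two recursive passes the supercompiler merges are equivalent to a single map of the composed function.

```haskell
double, triple :: Int -> Int
double x = x * 2
triple x = x * 3

-- Two recursive passes over the list...
twoPasses :: [Int] -> [Int]
twoPasses xs = map double (map triple xs)

-- ...fused into one pass, the form a supercompiler can derive.
onePass :: [Int] -> [Int]
onePass = map (double . triple)

main :: IO ()
main = print (twoPasses [1..5] == onePass [1..5])  -- True
```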
Yet, I want to hear some of your opinions before I spend too much time on this path. After all, it is very time-consuming and error-prone to do all this by myself, considering I'm new to language design. I would obviously much prefer a ready-to-use solution, and to spend my time mostly on the design of the encyclopedia itself, which is the product, after all.
So, Haskellers, do you think I'm on the right track, or would you do this differently? What advice do you have for me?
submitted by SrPeixinho
I’ve recently (and much belatedly) started to get involved in the functional programming community in Munich. This encompasses:
- The Munich Haskell Meeting, where a group of us get together once a month to socialize and chat about more or less FP related topics.
- The Haskell Hackathons, a smaller get-together to learn and explore more advanced Haskell. What little I know about free monads, I owe to these guys.
- Munich Lambda, a more diverse FP-oriented bunch, where I went to hear Andres Löh’s very nice talk on parallel programming.
Monday before Easter, I went to the MHM, which included an interesting talk by Rene Brunner about Big Data. Which made me think a bit. There are many definitions of “big data” (and Rene provided several of them), Wikipedia suggests:
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
I will define “big data” to be the point where a quantitative increase in problem size forces a qualitative change in how we approach it. In other words, problem sizes grow continuously, but at some point we can no longer keep up, and will have to rethink our methods and tools. Think of it as a paradigm shift in the way we do analysis.
A brief history of…me
I started out working in bioinformatics some years ago, at the Data Centre of a research institute. This department was using traditional data centre technology, meaning SQL databases, sometimes hooked up to automatic data collection equipment, sometimes with web-based wrapping written in PHP or Java, often both.
When I arrived, bioinformatics was something new, and there was a strong inclination to shoehorn these new data into the same kind of infrastructure. My sentiment was that this was, perhaps, not the best way to go about it. In contrast to most other data, the bioinformatics data had:
- larger data sizes
- heterogeneous, non-tabular data
- many specialized tools
In other words, bioinformatics represented a “big data” paradigm shift for the data centre. SQL databases were no longer very useful as the primary storage (though sometimes used as an application optimization).
The next paradigm shift
Now, we’ve all seen the graphs showing how the capacity of sequencing technology is growing exponentially, and at a much faster rate than Moore’s law. In other words, the computational power to data size ratio is diminishing, and even for algorithms that scale linearly with input size (and that’s pretty optimistic for an algorithm), compute power is becoming a scarce resource. Again, we can’t keep up.
And although people have known and shown this for years, somehow there’s always money for equipping labs with more sequencers, and grants for new sequencing projects. All too often, the analysis is just added as an afterthought.
And if you allow me to digress, this makes me wonder: what if the tables were turned? What if I got to tell my biologist colleagues that, hey, I just got money to buy yet another new supercomputer with the capacity to analyze the expression data for every human gene - now I need you to go to the lab and do twenty five thousand qPCR runs for each of these twenty samples. Could you have it done by Monday? And by the way, we’re hiring some more computer scientists.
Scaling
I’ve previously argued that our methods in bioinformatics are not quite where we would want them to be, and in many cases, they tend to produce outright incorrect results. But in addition, I think they no longer scale.
Take the de novo genome project, for instance. For something bigger (genome-wise, of course) than a bacterium, assembly of a de novo genome is a major undertaking. Sequencing is only a small fraction of the cost, and you really need a team of experienced scientists and engineers to produce something useful and reliable. And lots and lots of computation.
And although producing curated “resources” (like genome assemblies) can be useful and get you published, often there are more specific goals we want to achieve. Perhaps we are only interested in some specific aspect of the species; maybe we want to design a vaccine, or study the population structure, or find a genetic sex marker.
Lazy vs strict analysis
Depending on curated resources for analysis is an example of an eager approach. This is fine if you work on one of the handful of species where a large and expensive consortium has already created the curated resource for you. And, of course, if there is someone nearby who can afford a supercomputing facility.
But for the remaining species, this isn’t going to work. Even if somebody decided to start prioritizing the computational and analytical aspects of biology, it would still consume a lot of valuable resources. Instead of just pouring money over compute infrastructure, we need to be more efficient. That means new algorithms and, I think, new approaches to how we answer scientific questions.
I don’t have all the answers to this, but rethinking how we do analysis could be a start. As programmers often forget, the best optimization is to avoid doing the unnecessary bits. A starting point would be to exchange the eager approach for a lazy (or at least non-strict) one. Instead of starting with the data and asking what we can do, we need to start with the question, and then ask how we can answer it from the data.
Why it’s not going to happen
Writing this, I realize I have said it all before. It is easy to blame the non-progress on backwards grant agencies, biologist-trained management, stubborn old professors trapped in paradigms of the past, or any other forms of force majeure. But there is a fundamental problem with this line of thought.
Although compute resources are quickly becoming scarce, another resource – the bioinformaticians – is even scarcer. And instead of churning through a slow, expensive, and error-prone stack of tools to generate a sequence of intermediate products (genome assembly, gene prediction, functional annotation, etc), and then, finally, looking up the desired result in a table, I believe a lazy approach will need to tailor the methods to the question. In other words, scientists will have to write programs.
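To put the lazy-vs-eager distinction in Haskell terms (a toy example of my own, not drawn from any real pipeline): with a non-strict pipeline, asking a specific question forces only the work needed to answer it, even when the conceptual "resource" is unboundedly large.

```haskell
-- Toy illustration: "annotate" stands in for an expensive analysis
-- step. Asking a specific question (the first interesting gene)
-- forces only a prefix of the work; an eager approach would have to
-- annotate everything up front before the question could be asked.
annotate :: Int -> (Int, Bool)
annotate g = (g, g `mod` 7 == 0)   -- pretend: "is this gene interesting?"

firstInteresting :: [Int] -> Maybe Int
firstInteresting gs =
  case filter snd (map annotate gs) of
    []          -> Nothing
    ((g, _) : _) -> Just g

main :: IO ()
main = print (firstInteresting [1..])  -- terminates despite the infinite input
```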
Summary: Ninja is a build system focused on being fast. Some limited benchmarking suggests the Shake build system might be faster.
The Shake build system aims to be both powerful and fast. The Ninja build system describes itself as small with a focus on speed. Now that Shake can run Ninja build files, I benchmarked Shake against Ninja.
I have been benchmarking Shake vs Ninja as part of the Shake continuous integration tests. My benchmark builds the Ninja source code using their Ninja build file, once from scratch with 3 threads (a full build), then builds again to ensure it is up to date (a zero build). The test is run with both Ninja and Shake, always immediately after a Ninja build (to ensure all files are cached). The average times over 71 runs are:
- Full build: Ninja takes 6.552s, Shake takes 5.985s. Shake is 0.567s faster.
- Zero build: Ninja takes 0.007s, Shake takes 0.012s. Ninja is 0.005s faster.
These tests are run on Linux, on a Travis machine. Both the hardware and the loading of the machine are likely to vary over time. I deliberately picked a lower level of parallelism to try and ensure the build was not limited by running too many processes (it does not seem to be). It is now a test failure if Shake is slower for the full build, or if Shake is more than 0.1s slower for the zero build.
A more interesting test would be building something more substantial than Ninja - but choosing a benchmark is hard, and I am limited by the amount of Travis time I can use. It is not clear if Shake will be consistently N seconds faster than Ninja, or N% faster than Ninja, or if this result is an aberration due to the particular choice of benchmark. Shake does not implement the Ninja feature of rebuilding when the command line changes - adding that feature would be unlikely to have any impact on the full build but may slightly slow down the Shake zero build.
Improvements to Shake
When I first started benchmarking Shake vs Ninja, I had reports that Shake was significantly slower - taking around 40% longer to build large projects. As a result I made a number of improvements to Shake:
Improvement 1: --skip-commands
I added the --skip-commands flag and shakeRunCommands option to Shake, which skips running any command operations which have no return results. Provided your build system does not delete temporary files, it allows you to build normally, then build with --always-make --skip-commands to "run the build", skipping running commands, measuring the rest of the build system.
Improvement 2: Makefile parsing
Using --always-make --skip-commands on LLVM via Ninja files, I found the non-command build time was 58s. Profiling showed that most of the time was spent parsing Makefiles, so I wrote optimised routines, available from Development.Shake.Util. These changes reduced the LLVM non-command time to 15s.
Improvement 3: Filepath normalisation
Further profiling showed that filepath normalisation was now a bottleneck. I responded by both optimising the filepath normalisation (writing a large test suite and correcting several bugs in the process), and removing some redundant normalisations. These changes reduced the LLVM time to 4s, most of which went on file modification time checking.
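As a rough illustration of the kind of work involved (this is my own toy sketch, not Shake's actual implementation), normalisation typically collapses "." components and repeated separators - cheap per call, but it adds up over millions of paths:

```haskell
import Data.List (intercalate)

-- Toy normaliser: split on '/', drop "." and empty components, and
-- rejoin. A real implementation (like Shake's) also handles "..",
-- trailing separators, Windows drive letters, and more.
normalise :: FilePath -> FilePath
normalise p = prefix ++ intercalate "/" comps
  where
    prefix = if take 1 p == "/" then "/" else ""
    comps  = filter (`notElem` ["", "."]) (splitOn '/' p)

splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitOn c rest

main :: IO ()
main = putStrLn (normalise "./src//Development/./Shake.hs")
```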
Improvement 4: Turning off idle garbage collection
By default, programs compiled with GHC run the garbage collector if the Haskell runtime is idle for 0.3s. For Shake, which regularly becomes idle when running commands, the garbage collector ends up competing with the commands it has spawned. I now recommend people turn off the idle garbage collector by compiling with -with-rtsopts=-I0, and I do so for the shake executable.
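For a Cabal-built executable, this recommendation might look something like the following (a sketch; the executable name is a placeholder, and -I0 is the RTS flag that sets the idle GC interval to zero, i.e. off):

```
executable my-build-tool
  main-is:      Main.hs
  ghc-options:  -threaded -rtsopts "-with-rtsopts=-I0"
```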
Improvement 5: --timings
In order to accurately measure where time was going, I added the --timings flag and shakeTimings option. When run with --timings Shake prints out something like:

Start 0.006s 1%
shakeArgsWith 0.000s 0%
Function shake 0.002s 0%
Database read 0.049s 10% ===
With database 0.002s 0%
Running rules 0.353s 72% =========================
Pool finished (5 threads, 2 max) 0.000s 0%
Lint checking 0.033s 6% ==
Profile report 0.041s 8% ==
Total 0.486s 100%
Build completed in 0:01m
Here we can see which stages are taking most time. For example, reading in the database takes 0.049s at 10% of the time. The = symbols to the right serve as a crude bar plot representing the timings.
Improvement 6: Smaller database
For zero builds I found much of the time was spent reading the database. I changed some of the representation, using smaller Int types and more compact encodings. These changes reduced the database by ~25% and had a small effect on the time to read the database.
For the full build, I beat Ninja, despite originally only aiming for a draw. The build overhead introduced by Shake is 0.029s, of which 0.010s is running the rules. Provided that scales linearly, the cost seems negligible compared to actually performing the build.
For the zero build, I am slower than Ninja. To investigate I measured just running --version with Ninja and Shake. Ninja takes 0.003s and Shake takes 0.004s, so a large portion of the zero build time is the cost of starting the executable, not project specific. Running with --timings I see:

Start 0.000s 3% ==
shakeArgsWith 0.000s 7% =====
Ninja parse 0.001s 16% ===========
Function shake 0.000s 10% ======
Database read 0.002s 36% =========================
With database 0.000s 3% ==
Running rules 0.001s 20% =============
Pool finished (2 threads, 2 max) 0.000s 2% =
Total 0.004s 100%
The largest bottleneck is database reading. Duncan Coutts' recent work on moving the binary library to CBOR is likely to result in a smaller and faster database. I await that work eagerly, and will look at further optimisations after it is available.
Is Shake faster than Ninja?
For building Ninja from scratch, Shake is faster than Ninja (perhaps the Ninja developers should switch to Shake for their development work :P). Another Ninja user benchmarked building the VXL project with both Shake and Ninja and discovered Ninja took 44 mins, while Shake took 41 mins (Shake 3 minutes faster) - but this benchmark was only run a handful of times. I would be interested to hear additional results.
ok so i have to "translate" arithmetic expressions to readable Strings:

    data Expression = Constant Integer
                    | Variable String
                    | Sum Expression Expression
                    | Difference Expression Expression
                    | Product Expression Expression
                    deriving (Show)

    expressionToString :: Expression -> String
can you help me build this parser? (later i have to evaluate the expression too)
submitted by 13utters
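One possible sketch of such a pretty-printer (my own; it parenthesizes every compound expression for simplicity - a precedence-aware version would drop redundant parentheses):

```haskell
data Expression = Constant Integer
                | Variable String
                | Sum Expression Expression
                | Difference Expression Expression
                | Product Expression Expression
                deriving (Show)

-- Render each constructor with its operator, fully parenthesized.
expressionToString :: Expression -> String
expressionToString (Constant n)     = show n
expressionToString (Variable v)     = v
expressionToString (Sum a b)        = binop " + " a b
expressionToString (Difference a b) = binop " - " a b
expressionToString (Product a b)    = binop " * " a b

binop :: String -> Expression -> Expression -> String
binop op a b = "(" ++ expressionToString a ++ op ++ expressionToString b ++ ")"

main :: IO ()
main = putStrLn (expressionToString
                  (Sum (Constant 1) (Product (Variable "x") (Constant 2))))
-- prints (1 + (x * 2))
```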
What's a good article or talk for getting someone curious about Haskell or functional programming in general?
Suppose you just want to inform a curious programmer, who is not familiar with Haskell or functional programming, about some cool or powerful things Haskell enables. You're not trying to be preachy; you're not sending them a step-by-step tutorial for learning Haskell, nor are you even just sending them a "why Haskell is great" article. Rather, you just want to expose them to some new ideas which they can research further if they're interested.
Is there a good article/video/whatever which accomplishes this? I'd like something I can send around to fellow programmers which does not presume that learning Haskell is necessarily in the listener's best interest. Unfortunately it's pretty difficult to find search terms for such a thing.
Thanks!
submitted by dont_memoize_me_bro
Last week I gave a talk at Skills Matter about my work on a GHC extension to support reusing record field names, about which I have written previously. The video has been available since last week, thanks to the Skills Matter team, and the reddit discussion is already well underway. Now I’m (finally) making the slides available.