News aggregator

Sky Blue Trades

del.icio.us/haskell - Mon, 01/27/2014 - 5:32am
Categories: Offsite Blogs

Yesod Web Framework: Initializing data in the foundation datatype

Planet Haskell - Mon, 01/27/2014 - 2:40am

I've added a very short chapter to the Yesod book giving an example of performing initialization before your application starts. The technique isn't complicated (which explains why the chapter is so short), but new users often get stuck trying to find the canonical way to do this. Hopefully this chapter will clarify the issue. If there's anything unclear in the example, please let me know so I can update it.

Below is the current version of the content, though I recommend reading it at the link above to ensure you get the newest version.

This example is meant to demonstrate a relatively simple concept: performing some initialization of data to be kept in the foundation datatype. There are various reasons to do this, though the two most important are:

  • Efficiency: by initializing data once, at process startup, you can avoid having to recompute the same value in each request.

  • Persistence: we want to store some information in a mutable location that will persist between individual requests. Often, this is done via an external database, but it can also be done via an in-memory mutable variable.

<aside class="note">

While mutable variables can be a convenient storage mechanism, remember that they have some downsides. If your process dies, you lose your data. Also, if you scale horizontally to more than one process, you’ll need some way to synchronize the data between processes. We’ll punt on both of those issues for this example, but the problems are real. This is one of the reasons Yesod puts such a strong emphasis on using an external database for persistence.

</aside>

To demonstrate, we’ll implement a very simple website. It will contain a single route, and will serve content stored in a Markdown file. In addition to serving that content, we’ll also display an old-school visitor counter indicating how many visitors have been to the site.

<section id="initializing-foundation-data_step_1_define_your_foundation">

Step 1: define your foundation

We’ve identified two pieces of information to be initialized: the Markdown content to be displayed on the homepage, and a mutable variable holding the visitor count. Remember that our goal is to perform as much of the work as possible in the initialization phase, thereby avoiding repeating that work in the handlers themselves. Therefore, we want to preprocess the Markdown content into HTML. As for the visitor count, a simple IORef should be sufficient. So our foundation data type is:

data App = App
    { homepageContent :: Html
    , visitorCount :: IORef Int
    }

</section>

<section id="initializing-foundation-data_step_2_use_the_foundation">

Step 2: use the foundation

For this trivial example, we only have one route: the homepage. All we need to do is:

  1. Increment the visitor count.

  2. Get the new visitor count.

  3. Display the Markdown content together with the visitor count.

One trick we’ll use to make the code a bit shorter is to utilize record wildcard syntax: App {..}. This is convenient when we want to deal with a number of different fields in a datatype.
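Record wildcards work anywhere, not just in Yesod handlers. As a standalone illustration (the Config type and field names below are invented for this sketch, not part of the Yesod example):

```haskell
{-# LANGUAGE RecordWildCards #-}

-- Invented for illustration: not the App type from the example.
data Config = Config
    { port    :: Int
    , appName :: String
    }

-- Config {..} brings every field of the matched record into scope
-- as an ordinary binding, so we can write appName and port directly.
describe :: Config -> String
describe Config {..} = appName ++ " on port " ++ show port

main :: IO ()
main = putStrLn (describe (Config { port = 3000, appName = "demo" }))
```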

getHomeR :: Handler Html
getHomeR = do
    App {..} <- getYesod
    currentCount <- liftIO $ atomicModifyIORef visitorCount
        $ \i -> (i + 1, i + 1)
    defaultLayout $ do
        setTitle "Homepage"
        [whamlet|
            <article>#{homepageContent}
            <p>You are visitor number: #{currentCount}.
        |]

</section>

<section id="initializing-foundation-data_step_3_create_the_foundation_value">

Step 3: create the foundation value

When we initialize our application, we’ll now need to provide values for the two fields we described above. This is normal IO code, and can perform any arbitrary actions needed. In our case, we need to:

  1. Read the Markdown from the file.

  2. Convert that Markdown to HTML.

  3. Create the visitor counter variable.

The code ends up being just as simple as those steps imply:

main :: IO ()
main = do
    rawMarkdown <- TLIO.readFile "homepage.md"
    countRef <- newIORef 0
    warp 3000 App
        { homepageContent = markdown def rawMarkdown
        , visitorCount = countRef
        }

</section>

<section id="initializing-foundation-data_conclusion">

Conclusion

There’s no rocket science involved in this example, just very straightforward programming. The purpose of this chapter is to demonstrate the commonly used best practice for achieving these often needed objectives. In your own applications, the initialization steps will likely be much more complicated: setting up database connection pools, starting background jobs to batch process large data, or anything else. After reading this chapter, you should now have a good idea of where to place your application-specific initialization code.

Below is the full source code for the example described above:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes       #-}
{-# LANGUAGE RecordWildCards   #-}
{-# LANGUAGE TemplateHaskell   #-}
{-# LANGUAGE TypeFamilies      #-}
import           Data.IORef
import qualified Data.Text.Lazy.IO as TLIO
import           Text.Markdown
import           Yesod

data App = App
    { homepageContent :: Html
    , visitorCount :: IORef Int
    }

mkYesod "App" [parseRoutes|
/ HomeR GET
|]

instance Yesod App

getHomeR :: Handler Html
getHomeR = do
    App {..} <- getYesod
    currentCount <- liftIO $ atomicModifyIORef visitorCount
        $ \i -> (i + 1, i + 1)
    defaultLayout $ do
        setTitle "Homepage"
        [whamlet|
            <article>#{homepageContent}
            <p>You are visitor number: #{currentCount}.
        |]

main :: IO ()
main = do
    rawMarkdown <- TLIO.readFile "homepage.md"
    countRef <- newIORef 0
    warp 3000 App
        { homepageContent = markdown def rawMarkdown
        , visitorCount = countRef
        }

</section>
Categories: Offsite Blogs

tasty

del.icio.us/haskell - Sun, 01/26/2014 - 4:04pm
Categories: Offsite Blogs

Banned from the #haskell IRC channel

Haskell on Reddit - Sun, 01/26/2014 - 2:37pm

I was banned one day from the #haskell IRC channel because I started joking around when there was no one talking (a mistake on my part, since it was off-topic).

How could I get this ban reviewed?

submitted by lovewithacaveat
Categories: Incoming News

GHC iOS ARMv7/ARMv7s fat support completed

glasgow-user - Sun, 01/26/2014 - 12:23pm
Hi folks,

Happy to report that I've finished an approach to armv7/armv7s fat compilation, just in time for 7.8's imminent release. You'll find the necessary scripts here: https://github.com/ghc-ios/ghc-ios-scripts and the latest instructions for building GHC for iOS usage here: https://ghc.haskell.org/trac/ghc/wiki/Building/CrossCompiling/iOS

I've also added support for a perf-cross BuildFlavour, which will give a higher-performance and profiling-ready build that matches what we'll be putting together as the official 7.8 GHC iOS binaries: https://ghc.haskell.org/trac/ghc/ticket/8700

Cheers
Luke
Categories: Offsite Discussion

Jan Stolarek: Code testing in Haskell revisited (with Tasty)

Planet Haskell - Sun, 01/26/2014 - 4:17am

About a year and a half ago I wrote a post about code testing in Haskell. The post was accompanied by haskell-testing-stub: a small project showing how to organize tests and benchmarks in Haskell. I used the test-framework package to gather tests written in different testing frameworks (QuickCheck and HUnit in my case) into a coherent suite. Test-framework might have been a good choice in 2012, but that might not be the case today. The major issue is that it was abandoned by its author and went unmaintained for several months. Recently the project has been taken up by the community and received some updates, but for now it looks like there won’t be any major development.

As a response to the test-framework package being unmaintained, Roman Cheplyaka has released tasty (original announcement here). Since its release in August 2013, tasty has gained packages supporting integration with QuickCheck, HUnit, SmallCheck and hspec, as well as support for golden testing and a few others. I decided to give tasty a try and use it in my haskell-testing-stub project. Tasty turned out to be almost a drop-in replacement for test-framework. I had to update the cabal file (quite obviously), change imports to point to tasty rather than test-framework, and replace usage of the [Test] type with TestTree. The only problem I encountered was adapting tests from HUnit. It turns out that the tasty-hunit package does not have a function that allows you to use an existing suite of HUnit tests. That feature was present in test-framework-hunit as the hUnitTestToTests function. I mailed Roman about this and his reply was that this was intentional, as he does not “believe it adds anything useful to the API (i.e. the way to *write* code).” That’s not a big issue though, as it was easy to adapt the missing function (although I think I’ll just put it in a separate package and release it so others don’t have to reinvent the wheel).

I admit that at this point I am not sure whether switching from test-framework to tasty is a good move. The fact that tasty is actively developed is a huge plus although test-framework has reached a mature state so perhaps active development is no longer of key importance. Also, test-framework still has more supporting libraries than tasty. Migrating them should be easy but up till now no one has done it. So I’m not arguing heavily for tasty. This is more like an experiment to see how it works.

Categories: Offsite Blogs

UTP Symposium: extended submission deadline

General haskell list - Sun, 01/26/2014 - 3:56am
*** revised submission deadline ***

5th International Symposium on Unifying Theories of Programming
co-located with FM2014
May 12-13, 2014, Singapore
http://www.comp.nus.edu.sg/~pat/UTP2014/index.html

CALL FOR PAPERS

Interest in the fundamental problem of the combination of formal notations and theories of programming has grown consistently in recent years. The theories define, in various different ways, many common notions, such as abstraction, refinement, choice, termination, feasibility, locality, concurrency and communication. Despite these differences, such theories may be unified in a way which greatly facilitates their study and comparison. Moreover, such a unification offers a means of combining different languages describing various facets and artifacts o
Categories: Incoming News

Ian Ross: Haskell FFT 14: Wrap-up

Planet Haskell - Sun, 01/26/2014 - 3:26am
Haskell FFT 14: Wrap-up (January 26, 2014)

After thirteen blog articles and about two-and-a-half weeks’ equivalent work time spread out over three months, I think it’s time to stop with the FFT stuff for a while. I told myself when I started all this that if I could get within a factor of 20 of the performance of FFTW, I would be “quite obscenely pleased with myself”. For most N, the performance of our FFT code is within a factor of 5 or so of the performance of the vector-fftw package. For a pure Haskell implementation with some but not lots of optimisation work, that’s not too shabby.

But now, my notes are up to nearly 90 pages and, while there are definitely things left to do, I’d like to move on for a bit and do some other data analysis tasks. This post is just to give a quick round-up of what we’ve done, and to lay out the things remaining to be done (some of which I plan to do at some point and some others that I’d be very happy for other people to do!). I’ve released the latest version of the code to Hackage. The package is called arb-fft. There is also an index of all the posts in this series.

Lessons learned

The lessons I’ve taken from this exercise are all things that more experienced Haskellers have known for some time, but it’s interesting to see how things play out in a realistic example:

  • Haskell is a great language for writing clear code. The earlier versions of the FFT code were, in many cases, more or less direct transcriptions of my mental models of the way these algorithms should work. This isn’t such a big deal for a relatively simple example like the FFT, but when you’re dealing with more complicated problems, that clarity of expression is really valuable.

  • QuickCheck is a wonderful testing tool and Criterion is a wonderful benchmarking tool. But everyone knows that!

  • Clear code isn’t necessarily fast code, but when it comes time to optimise, Haskell allows you to isolate any “nastiness” you introduce. By this, I mean for example, that where I’ve fallen back on using stateful calculations and mutable vectors, the ST monad allows this stateful computation to be “locked away” so that no state leaks out into the surrounding pure code. This isolation makes reasoning about code much easier than in a comparable piece of procedural code — you know, because the type system guarantees it, that no references can leak from your stateful code: compare the situation in C++, where you need to be very careful to manage aliasing yourself because the language definition provides only limited scope to the compiler to help you.

  • Having a language that allows you to switch seamlessly between symbolic and numeric computation is nice. Quite a lot of the planning code is classic combinatorial code (multiset permutations and compositions of integers), and this can be expressed very clearly and easily in a functional language. The same goes for the “toy algebra system” code in the earlier articles where we used a limited set of algebraic structures to help understand the Cooley-Tukey decomposition of the Fourier matrix. And although these things are quick and easy to write, they can then be carried forward into more efficient representations.
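That ST isolation point can be seen in miniature with a pure function whose implementation is mutable. The example below is illustrative and unrelated to the arb-fft internals:

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- A pure function with a mutable implementation: the STRef is
-- created, mutated and read entirely inside runST, so the type
-- system guarantees no mutable state leaks out to the caller.
sumST :: [Int] -> Int
sumST xs = runST $ do
    acc <- newSTRef 0
    mapM_ (\x -> modifySTRef' acc (+ x)) xs
    readSTRef acc

main :: IO ()
main = print (sumST [1 .. 10])  -- prints 55
```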

Another lesson, not confined to Haskell or functional programming: good performance comes first from good algorithms, not from tweaking your code. There’s no point in optimising an O(N²) algorithm if it’s known that there’s an O(N log N) algorithm out there. If you do that, you’re just generating waste heat in your CPU.

A related point is that good performance quite often requires you to look beyond “textbook” algorithms. I’ve yet to find a textbook reference to Rader’s algorithm or any of the other prime-length FFT algorithms. I don’t know what the more recent editions of Numerical Recipes say, but my copy of the second edition, in reference to using the fast Fourier transform for anything other than power-of-two lengths, basically just says “Don’t do that.” If you want fast, you need to look around a bit, read some literature, and implement good algorithms. Code tweaking comes later!

Remaining tasks

These are rated from 1-5 for Difficulty, Effort, Value and Fun:

Haskell genfft for “bottomlets” (D4/E4/V5/F4)

Writing a Haskell version of FFTW’s genfft utility for producing the optimised straight-line “codelets” used for the base transforms would be interesting, and it would remove all the embarrassing things in Numeric.FFT.Special, which are copied more or less wholesale from FFTW [1]. And once there’s a Haskell genfft, it should be possible to generate truly optimal plans at compile-time by running the genfft code as Template Haskell.

Haskell genfft for “twiddlets” (D4/E3/V5/F3)

I’ve only written specialised straight-line code for what I’ve been calling “bottomlets”, i.e. the base transforms used at the bottom of the recursive decomposition of the input vector. The intermediate Danielson-Lanczos steps all use generic unspecialised code. It ought to be possible to produce specialised versions of the Danielson-Lanczos steps for specific sub-vector lengths, which might give good speed-ups for some input lengths. Once there’s a genfft for “bottomlets”, it ought to be relatively straightforward to extend that to “twiddlets” as well.

Truly optimal planning (D3/E3/V3/F3)

Having a Haskell genfft for both “bottomlets” and “twiddlets” opens up the possibility of doing what I’d call truly optimal planning, in the sense that, as well as testing different plans as is done now, it would also be possible to construct specialised straight-line codelets for custom input sizes. Need a custom FFT of length 437? That’s 19×23, so we could construct (at compile-time) bottomlets and twiddlets of sizes 19 and 23 and benchmark to see which ordering is quicker, avoiding the use of Rader’s algorithm for most prime factors. Similarly, for more composite input lengths, we could try various combinations of custom bottomlet and twiddlet sizes, instead of relying only on a pre-selected list of sizes for specialisation.
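The starting point for this kind of planning is the prime factorisation of the input length. A naive trial-division sketch (not the actual arb-fft planner code) shows the decomposition for the N=437 example:

```haskell
-- Naive trial division, purely to illustrate the decomposition the
-- planner works from; arb-fft's real planning is far more involved.
factorise :: Int -> [Int]
factorise = go 2
  where
    go d n
      | n <= 1         = []
      | d * d > n      = [n]
      | n `mod` d == 0 = d : go d (n `div` d)
      | otherwise      = go (d + 1) n

main :: IO ()
main = print (factorise 437)  -- prints [19,23]
```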

Low-level optimisation (D4/E2/V5/F2)

I’ve really done relatively little optimisation on the arb-fft code: using unboxed mutable vectors is about as far as it goes. The empirical plan generation also helps quite a bit, but there is definitely more that could be done in terms of lower-level optimisation. I’ve not looked at strictness, and haven’t bothered with more profiling after optimisation, so there are probably a few things that could be done to squeeze out some more performance. Low-level optimisation of Haskell isn’t really something I know a whole lot about, so I’ve left it to one side for now.

(There’s another optimisation task that really ought to be done too: for large problem sizes, planning can take a long time, particularly for problem sizes with large prime factors. I’ve added some code to filter out ridiculous plans where possible, i.e. ones that are guaranteed to be slow because of huge Danielson-Lanczos steps, but it’s not always possible to avoid looking at all of these, and they’re unavoidably slow. Better heuristics for planning and switching to a simplified timing approach instead of using Criterion would speed this up a lot.)

Real-to-real, etc. FFTs (D3/E4/V4/F0)

So far, I’ve only implemented complex-to-complex FFTs. For a production library, I’d also need to implement real-to-complex transforms along with the various flavours of real-to-real transforms (cosine, sine and Hartley). None of these things are terribly difficult to do, but they are really pretty tedious exercises in data management. No fun at all!

Parallelisation (D3/E2/V3/F2)

There are a lot of cheap opportunities for parallelisation in the divide-and-conquer approach of the Cooley-Tukey algorithm, and this is a place where it might be quite easy to get performance that exceeds that of FFTW (by cheating, of course — getting single processor performance that’s competitive with FFTW would take some doing!).


Of these tasks, I’ll probably have a go at the Haskell genfft stuff at some point and I might have a bit of a poke at some more optimisation efforts. I don’t think I’ll do the real-to-real FFTs unless someone finds themselves in desperate need of a pure Haskell FFT library with real-to-real transforms and offers to pay me for it! Parallelisation probably isn’t worth the effort unless single-processor performance can be made more competitive with FFTW.

Now though, it’s time for a break from FFT endeavours. Although today, I just learnt about this very interesting looking paper…

  1. Extra embarrassing because I made a mistake in translating the biggest and most complicated of these things to Haskell and it took ages to track down the (one-character) typo. This was in a single function with around 800 local variables, all called things like t1Y or t2t or t2T. A Haskell genfft would be nice.

data-analysis haskell
Categories: Offsite Blogs

Ian Ross: Haskell FFT 13: Optimisation Part 3

Planet Haskell - Sun, 01/26/2014 - 3:11am
Haskell FFT 13: Optimisation Part 3 (January 26, 2014)

In this article, we’ll finish up with optimisation, tidy up a couple of little things, and look at some “final” benchmarking results. The code described here is the pre-release-4 version in the GitHub repository. I’m not going to show any code snippets in this section, for two reasons: first, if you’ve been following along, you should be familiar enough with how things work to find the sections of interest; and second, some of the optimisation means that the code is getting a little more unwieldy, and it’s getting harder to display what’s going on in small snippets. Look in the GitHub repository if you want to see the details.

Rader’s algorithm improvements

There are two things we can do to make our implementation of Rader’s algorithm go a bit more quickly.

First, so far we’ve always calculated the convolution involved in Rader’s algorithm using a transform for the next power-of-two size greater than the actual convolution size. This is based on the observation that powers of two usually have a pretty efficient FFT plan. It also allows us to avoid recursive calls to Rader’s algorithm if we end up with a convolution transform size that has large prime factors. However, obviously there’s no reason to expect that the padded convolution will be faster in all cases. Sometimes the unpadded convolution may be quicker because of specialised base transforms (e.g. an N=22 FFT takes about 3 μs while the power of two needed for the corresponding padded convolution, N=64, takes about 33 μs). As in other cases like this, the best way to deal with the question is to test empirically which is quicker.
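As a concrete sketch of the padding computation (illustrative only, not arb-fft's actual code), the smallest power of two at least as large as a required convolution length can be found like this:

```haskell
import Data.Bits (shiftL)

-- Smallest power of two >= n (assumes n >= 1); a sketch of the
-- padding-size calculation, not arb-fft's actual code.
nextPow2 :: Int -> Int
nextPow2 n = head [p | i <- [0 ..], let p = 1 `shiftL` i, p >= n]

main :: IO ()
main = print (map nextPow2 [5, 12, 64])  -- prints [8,16,64]
```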

Unfortunately, in order to do this, we’re going to have to change the types of some of the planning functions to run in the IO monad. This is because we’re going to have to recursively benchmark different plans for the convolution transforms while we’re deciding just how to do the Rader’s algorithm base transform. This isn’t all that messy when you get down to it, but it made me decide that the default planning API should also be monadic, with signature plan :: Int -> IO Plan, since there may be reads and writes of wisdom files going on during the planning process. To make it easy to decide which convolution approach to take, we also record the execution times of the selected transform plans in the wisdom files, so once a plan has been generated for a given input length, we can just read the timing information from the wisdom file.

So, in principle, what we do is quite simple. When we’re constructing a Rader base transform, we just determine whether the unpadded or padded transform for the convolution is quicker. We save the relevant FFT-transformed “b” sequence for the convolution and record the transform size that we’re using for the convolution. We can then set the code up so that everything works the same for the padded and unpadded case without too much trouble.
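In terms of data, the base transform just needs to remember the chosen convolution size and the pre-transformed sequences. A hypothetical record shape (the field names, and the use of lists instead of the real code’s vectors, are mine for illustration):

```haskell
import Data.Complex (Complex)

-- Hypothetical sketch of what a Rader base transform carries after planning.
-- Field names (and the list types) are illustrative only.
data RaderBase = RaderBase
  { convSize :: Int               -- convolution transform size actually used,
                                  -- padded or unpadded, whichever timed faster
  , bFwd     :: [Complex Double]  -- FFT-transformed "b" sequence for forward
                                  -- transforms
  , bInv     :: [Complex Double]  -- the same for inverse transforms
  }
```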

The second thing we can do is to try to avoid conversions between mutable and immutable vectors in the RaderBase branch of applyBase. Doing this effectively ends up being slightly messy, because what we really need is a specialised version of execute that works on mutable vectors. I call this executeM and it has signature

executeM :: Plan -> Direction -> MVCD s -> MVCD s -> ST s ()

taking a plan, a transform direction, input and output mutable vectors, and running in the ST monad. This allows us to avoid a couple of vector allocations in the convolution step. The original convolution code in applyBase looked like this:

let conv = execute cplan Inverse $
             zipWith (*) (execute cplan Forward apadfr)
                         (if sign == 1 then bfwd else binv)

which did at least two vector allocations, one for each call to execute, and possibly a third associated with the call to zipWith (though that one may get fused away). Instead, we can use mutable vectors and the new executeM to avoid some of these allocations (in particular, the multiplication involved in the convolution can be done in place), giving a slight speed-up in some cases.
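The in-place multiplication step can be pictured with a small self-contained sketch. I’m using the boot `array` package’s `STArray` as a stand-in for the `MVCD` mutable complex vectors of the real code, and `mulInPlace` is an illustrative name:

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STArray, getBounds, getElems, newListArray,
                      readArray, writeArray)
import Data.Complex (Complex (..))

-- Multiply a mutable array elementwise by a list of coefficients, in place:
-- a[i] <- a[i] * b[i].  No fresh array is allocated, which is the saving
-- that executeM buys us in the convolution step.
mulInPlace :: STArray s Int (Complex Double) -> [Complex Double] -> ST s ()
mulInPlace a bs = do
  (lo, _) <- getBounds a
  forM_ (zip [lo ..] bs) $ \(i, b) -> do
    x <- readArray a i
    writeArray a i (x * b)

-- Small demonstration of the in-place update.
demo :: [Complex Double]
demo = runST $ do
  a <- newListArray (0, 1) [1 :+ 0, 0 :+ 1]
  mulInPlace a [2 :+ 0, 0 :+ 2]
  getElems a
```

Here `demo` evaluates to `[2 :+ 0, (-2) :+ 0]`: the second element is (0 :+ 1) * (0 :+ 2), i.e. 2i², with no intermediate vector allocated.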

There are a couple of other possibilities that we might explore, but we’re getting to a point of diminishing returns with this so we’ll leave it for now. (One particular possibility is to experiment with padding to other sizes instead of a power of two: any highly composite size that takes advantage of specialised base transforms would be good, and the best option could be determined empirically.)

Compiler flags

All the way through, we’ve been using GHC’s -O2 flag to enable the full suite of “non-dangerous” optimisations. Without getting into lots of messing around with specific GHC optimisation flags (of which there are many), the best option for getting some more “free” performance improvement is to try a different code generator. Specifically, so far we’ve been using GHC’s native code generator, but there’s an alternative LLVM-based backend that tends to give good performance for numerically intensive code that makes heavy use of Vectors. Sounds perfect for us.

We can enable the LLVM code generator simply by adding the -fllvm option to the ghc-options field in our Cabal file. Benchmarking with this version of the code reveals that we get, on average, a 17% speed-up compared to the native code generator (for code version pre-release-4). Here’s a histogram of the speed-ups of the LLVM-generated code relative to that from the native code generator for input vector lengths from 8 to 1024:
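Concretely, the change amounts to one extra option in the Cabal file. The exact stanza depends on the package description, but it’s along the lines of:

```
  ghc-options: -O2 -fllvm
```

(You’ll need a compatible LLVM toolchain on the path for GHC to use this backend.)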

<plot-data cols="ratio" format="csv" name="llvm" src="/blog/posts/2014/01/26/data-analysis-fft-13/llvm-ratio.dat"> </plot-data>

<plot aspect="1.6" axis-x-label="LLVM/Native execution time ratio" axis-y-label="Frequency" bar-width="-1px" fill="blue" fill-opacity="0.3" font-size="16" stroke="none" width="800"> <bars hist="[[histogram(llvm.ratio, 50)]]" x="[[hist.centres]]" y="[[hist.counts]]"></bars> </plot>

Final benchmarking

We can benchmark our final pre-release code version (pre-release-4) in the same way as we’ve done before. Here are the results shown in the usual way:

<plot-data format="csv" name="dat" separator=" " src="/blog/posts/2014/01/26/data-analysis-fft-13/benchmark-5.dat"> </plot-data>

<plot aspect="1.6" axis-x-end-tick-size="0" axis-x-label="Input length" axis-x-ticks="[[[[8,'8'],[16,'16'],[32,'32'],[64,'64'],[128,'128'],[256,'256'],[512,'512'],[1024,'1024']]]]" axis-x-transform="log" axis-y-label="Time" axis-y-ticks="[[[[1,'1 μs'],[10,'10 μs'],[100,'100 μs']]]]" axis-y-transform="log" font-size="16" oxs="[[seqStep(8,1024,4)]]" range-x="8,1100" ui-axis-x-transform="ui-axis-x-transform" ui-axis-y-transform="ui-axis-y-transform" width="800"> <plot-options stroke="black" stroke-width="1.5"> <lines label="O(N log N)" x="[[oxs]]" y="[[0.02*x*log(x)]]"></lines> <lines x="[[oxs]]" y="[[0.01*x*log(x)]]"></lines> </plot-options> <plot-options stroke-width="2"> <lines label="FFT" stroke="green" x="[[dat.Size]]" y="[[dat.FFT]]"></lines> <lines label="FFTW" stroke="blue" x="[[dat.Size]]" y="[[dat.FFTW]]"></lines> </plot-options> </plot>

and here are the ratio plots showing the relative performance of the original unoptimised version of the code (pre-release-1), the current version (pre-release-4) and FFTW:

<plot-data format="csv" name="ratios" src="/blog/posts/2014/01/26/data-analysis-fft-13/ratios.dat"> </plot-data>

<plot-stack aspect="1.6" width="800"> <plot axis-x-label="FFT/FFTW execution time ratio" axis-y-label="Frequency" bar-width="-1px" fill="blue" fill-opacity="0.3" font-size="16" stroke="none" title="pre-release-1"> <bars hist="[[histogram(ratios.rel2, 50)]]" x="[[hist.centres]]" y="[[hist.counts]]"></bars> </plot> <plot axis-x-label="FFT/FFTW execution time ratio" axis-y-label="Frequency" bar-width="-1px" fill="blue" fill-opacity="0.3" font-size="16" stroke="none" title="pre-release-4"> <bars hist="[[histogram(ratios.rel4, 50)]]" x="[[hist.centres]]" y="[[hist.counts]]"></bars> </plot> <plot axis-x-label="Execution time speed-up" axis-y-label="Frequency" bar-width="-1px" fill="blue" fill-opacity="0.3" font-size="16" stroke="none" title="Speed-up"> <bars hist="[[histogram(ratios.speedup, 50)]]" x="[[hist.centres]]" y="[[hist.counts]]"></bars> </plot> </plot-stack>

The pre-release-4 tab is the one to look at: for most input sizes we’re within a factor of 4-8 of the performance of the vector-fftw package. To be exact, the execution times for our code are between 1.40 and 13.54 times slower than vector-fftw, with a median of 6.12 and quartiles of 4.98 (25%) and 7.50 (75%). To my mind, that’s pretty amazing: FFTW is an immensely clever piece of software and I really didn’t expect to get that close to it. To be sure, we’ve done quite a bit of work to get to this point, but it’s still significantly less than the investment of time that’s gone into the development of FFTW. I have a feeling that we’ve eaten most of the low-hanging fruit, and getting that last factor of five or so speed-up to become competitive with FFTW would be pretty hard.

I’ve also done benchmarking measurements with this optimised code for larger input sizes: a selection of about 100 input lengths between 2<sup>10</sup> and 2<sup>18</sup>. The results are shown below: they are similar to those for smaller input sizes, except that there is a longer tail to larger ratios. The maximum here is 15.22, while the median is 5.99 and the quartiles are very similar to those for the smaller input sizes.

<plot-data format="csv" name="big" separator=" " src="/blog/posts/2014/01/26/data-analysis-fft-13/benchmark-5-big.dat"> </plot-data>

<plot aspect="1.6" axis-x-end-tick-size="0" axis-x-label="Input length" axis-x-ticks="[[[[1024,'1K'],[4096,'4K'],[16384,'16K'],[65536,'64K'],[262144,'256K']]]]" axis-x-transform="log" axis-y-label="Time" axis-y-ticks="[[[[1,'1 μs'],[10,'10 μs'],[100,'100 μs']]]]" axis-y-transform="log" font-size="16" oxs="[[seqStep(1024,262144,256)]]" range-x="1024,262144" ui-axis-x-transform="ui-axis-x-transform" ui-axis-y-transform="ui-axis-y-transform" width="800"> <plot-options stroke="black" stroke-width="1.5"> <lines label="O(N log N)" x="[[oxs]]" y="[[0.02*x*log(x)]]"></lines> <lines x="[[oxs]]" y="[[0.01*x*log(x)]]"></lines> </plot-options> <plot-options stroke-width="2"> <lines label="FFT" stroke="green" x="[[big.Size]]" y="[[big.FFT]]"></lines> <lines label="FFTW" stroke="blue" x="[[big.Size]]" y="[[big.FFTW]]"></lines> </plot-options> </plot>

<plot-data format="csv" name="bigratios" src="/blog/posts/2014/01/26/data-analysis-fft-13/ratios-big.dat"> </plot-data>

<plot aspect="1.6" axis-x-label="FFT/FFTW execution time ratio" axis-y-label="Frequency" bar-width="-1px" fill="blue" fill-opacity="0.3" font-size="16" stroke="none" width="800"> <bars hist="[[histogram(bigratios.rel4, 50)]]" x="[[hist.centres]]" y="[[hist.counts]]"></bars> </plot>

This shows that we’re pretty successful at maintaining good O(N log N) scaling for larger input sizes, which is nice.

In the next and last article, we’ll wrap up with some comments on what we’ve done, the lessons learned, and things left to do.

data-analysis haskell
Categories: Offsite Blogs

2nd Call for Papers: ICECCS 2014, Tianjin, China,4-7 Aug 2014

General haskell list - Sat, 01/25/2014 - 6:48pm
[We apologize for multiple copies.]

Call for Papers

ICECCS 2014 (The 19th IEEE International Conference on Engineering of Complex Computer Systems)
4-7 August 2014
School of Computer Science and Technology, Tianjin University, China
http://cs.tju.edu.cn/iceccs/2014

IMPORTANT DATES
* Abstract submission: February 14, 2014
* Paper submission deadline: February 28, 2014
* Workshop proposal submission: February 28, 2014
* Notification of acceptance: April 22, 2014
* Camera-ready material for publication: May 4, 2014
* Checking for Production: May 14, 2014
* Conference date: August 4-7, 2014

Complex computer systems are c
Categories: Incoming News

Call for Papers: ICFEM 2014, Luxembourg, 3-7 November 2014

General haskell list - Sat, 01/25/2014 - 6:36pm
[We apologize for multiple copies.]

Call for Papers

ICFEM 2014
16th International Conference on Formal Engineering Methods
Luxembourg, 3-7 November 2014
http://icfem2014.uni.lu

The 16th International Conference on Formal Engineering Methods (ICFEM 2014) will be held at the Melia Hotel in Luxembourg, Luxembourg from 3rd November to 7 November 2014. Since 1997, ICFEM has been serving as an international forum for researchers and practitioners who have been seriously applying formal methods to practical applications. Researchers and practitioners, from industry, academia, and government, are encouraged to attend, present their research, and help advance the state of the art. We are interested in work that has been incorpo
Categories: Incoming News

Demystifying DList

Haskell on Reddit - Sat, 01/25/2014 - 4:54pm
Categories: Incoming News

Question about program flow and FreeT with State

Haskell on Reddit - Sat, 01/25/2014 - 1:45pm

I'm developing a text adventure game and I noticed that using a Free transformer would allow me to do two things with each supplied user command:

  1. Print out the command
  2. Modify the game's state based on the user's command

...with a single free monad.

I found this to be very interesting so I'm really trying to get this working, since it seems like a very elegant solution. I'm hitting a wall, though, so I thought I'd ask for help.

Basically, all of the computation takes place in a function called game, which I will actually copy/paste here. It's readable enough as is since it's the highest level function in the program:

game :: StateT GameState (FreeT TextlunkyCommand IO) ()
game = forever $ do
  gs <- get                      --| get gamestate
  lift . lift $ showRoom gs      --| print a description of room
  modify (interactGame prompt)   --| process player command
  modify stepGame                --| step each entity in the game
  lift prompt                    --| get a command for printing

and then in main:

main = do
  -- | GameState will eventually be generated by a random seed
  showTIO $ evalStateT game (gs :: GameState)

This all typechecks and works. However, I don't know how to process an event where the player tries to do something they can't do.

For example, the player will have a limited amount of ropes to use. If they run out of ropes, I don't want to print "you throw a rope", I want to print "you don't have any ropes!". But, showTIO just runs a free command through a printing process without any access to state. How can I process everything in a way that makes it so the printing, not only the state update, has access to the global game state?

Thanks!

PS: Here is the full code is on github. It's still in very early stages of development, so most of it isn't done. The code mentioned is in Types/Game.hs and textlunky.hs. Any help would be appreciated!

submitted by 5outh
Categories: Incoming News