News aggregator

GaloisInc/HaLVM

del.icio.us/haskell - Sat, 01/11/2014 - 2:12pm
Categories: Offsite Blogs

Getting Started with Haskell

del.icio.us/haskell - Sat, 01/11/2014 - 1:54am
Categories: Offsite Blogs

5th Answer Set Programming Competition 2014 - FIRST CALL FOR PARTICIPANTS

General haskell list - Fri, 01/10/2014 - 7:12pm
[apologies for any cross-posting]

5th Answer Set Programming Competition 2014: Call for Participant Systems
Aalto University, University of Calabria, University of Genova
Spring/Summer 2014
http://aspcomp2014.mat.unical.it/ (aspcomp2014< at >mat.unical.it)

Special edition of the ASP competition series (system track), part of the Olympic Games of the Vienna Summer of Logic 2014.

Important Dates
* March 1st, 2014: Participant registration opens
* March 31st, 2014: The competition starts
* July 22nd, 2014: Awards are presented at FLoC

Answer Set Programming (ASP) is a well-established paradigm of declarat
Categories: Incoming News

Fun in the Afternoon at Facebook London: 12 March 2014

General haskell list - Fri, 01/10/2014 - 4:52pm
[ please forward this to anyone who might be interested ] Fun in the Afternoon is a regular(ish) event consisting of talks about Functional Programming and related topics. The next one will be held at the Facebook offices in London on 12 March 2014 As with other Fun in the Afternoon events in the past (see http://sneezy.cs.nott.ac.uk/fun/) the plan is to have an afternoon of talks and meet up with fellow FPers, probably followed by some drinking and food. Facebook is located in Covent Garden, and is easily reachable by public transport (however, it is not so easily reachable by private transport!). I'll send out full details including the program and directions nearer the time. Anyone is welcome to attend - if you'd like to come along, just drop me an email. If you'd like to give a talk, please drop me an email containing a title and abstract. Talks will probably be about 30 minutes, depending on the number of talk proposals. The atmosphere is intended to be Fun and informal, so tal
Categories: Incoming News

Mark Jason Dominus: Moonpig: a billing system that doesn't suck

Planet Haskell - Fri, 01/10/2014 - 10:44am
I'm in Amsterdam now, because Booking.com brought me out to tell them about Moonpig, the billing and accounting system that Rik Signes and I wrote. The talk was mostly a rehash of one I gave at the Pittsburgh Perl Workshop a couple of months ago, but I think it's of general interest.

The assumption behind the talk is that nobody wants to hear about how the billing system actually works, because most people either have their own billing system already or else don't need one at all. I think I could do a good three-hour talk about the internals of Moonpig, and it would be very interesting to the right group of people, but it would be a small group. So instead I have this talk, which lasts less than an hour. The takeaway from this talk is a list of several basic design decisions that Rik and I made while building Moonpig which weren't obviously good ideas at the time, but which turned out well in hindsight. That part I think everyone can learn from. You may not ever need to write a billing system, but chances are at some point you'll consider using an ORM, and it might be useful to have a voice in your head that says “Dominus says it might be better to do something completely different instead. I wonder if this is one of those times?”

So because I think the talk was pretty good, and it's fresh in my mind right now, I'm going to try to write it down. The talk slides are here if you want to see them. The talk is mostly structured around a long list of things that suck, and how we tried to design Moonpig to eliminate, avoid, or at least mitigate these things.

Moonpig, however, does not suck.

Sometimes I see other people fuck up a project over and over, and I say “I could do that better”, and then I get a chance to try, and I discover it was a lot harder than I thought, and I realize that those people who tried before were not as stupid as I believed.

That did not happen this time. Moonpig is a really good billing system. It is not that hard to get right. Those other guys really were as stupid as I thought they were.

Brief explanation of IC Group

When I tell people I was working for IC Group, they frown; they haven't heard of it. But quite often when I say that IC Group runs pobox.com, those same people smile and say “Oh, pobox!”.

ICG is a first wave dot-com. In the late nineties, people would often have email through their employer or their school, and then they would switch jobs or graduate and their email address would go away. The basic idea of pobox was that for a small fee, something like $15 per year, you could get a pobox.com address that would forward all your mail to your real email address. Then when you changed jobs or schools you could just tell pobox to change the forwarding record, and your friends would continue to send email to the same pobox.com address as before. Later, ICG offered mail storage, web mail, and, through listbox.com, mailing list management and bulk email delivery.

Moonpig was named years and years before the project to write it was started. ICG had a billing and accounting system already, a terrible one. ICG employees would sometimes talk about the hypothetical future accounting system that would solve all the problems of the current one. This accounting system was called Moonpig because it seemed clear that it would never actually be written, until pigs could fly.

And in fact Moonpig wouldn't have been written, except that the existing system severely constrained the sort of pricing structures and deals that could actually be executed, and so had to go. Even then the first choice was to outsource the billing and accounting functions to some company that specialized in such things. The Moonpig project was only started as a last resort after ICG's president had tried for 18 months to find someone to take over the billing and collecting. She was unsuccessful. A billing provider would seem perfect and then turn out to have some bizarre shortcoming that rendered it unsuitable for ICG's needs. The one I remember was the one that did everything we wanted, except it would not handle checks. “Don't worry,” they said. “It's 2010. Nobody pays by check any more.”

Well, as it happened, many of our customers, including some of the largest institutional ones, had not gotten this memo, and did in fact pay by check.

So with some reluctance, she gave up and asked Rik and me to write a replacement billing and accounting system.

As I mentioned, I had always wanted to do this. I had very clear ideas, dating back many years, about mistakes I would not make, were I ever called upon to write a billing system.

For example, I have many times received a threatening notice of this sort:

Your account is currently past due! Pay the outstanding balance of $      0 . 00   or we will be forced to refer your account for collection.

What I believe happened here is: some idiot programmer knows that money amounts are formatted with decimal points, so decides to denominate the money with floats. The amount I paid rounds off a little differently than the amount I actually owed, and the result after subtraction is all roundoff error, and leaves me with a nominal debt on the order of 10^-2 dollars.

So I have said to myself many times “If I'm ever asked to write a billing system, it's not going to use any fucking floats.” And at the meeting at which the CEO told me and Rik that we would write it, those were nearly the first words out of my mouth: No fucking floats.

Moonpig conceptual architecture

I will try to keep this as short as possible, including only as much as is absolutely required to understand the more interesting and generally applicable material later.

Pobox and Listbox accounts

ICG has two basic use cases. One is Pobox addresses and mailboxes, where the customer pays us a certain amount of money to forward (or store) their mail for a certain amount of time, typically a year. The other is Listbox mailing lists, where the customer pays us a certain amount to attempt a certain number of bulk email deliveries on their behalf.

The basic model is simple…

The life cycle for a typical service looks like this: The customer pays us some money: a flat fee for a Pobox account, or a larger or smaller pile for Listbox bulk mailing services, depending on how much mail they need us to send. We deliver service for a while. At some point the funds in the customer's account start to run low. That's when we send them an invoice for an extension of the service. If they pay, we go back and continue to provide service and the process repeats; if not, we stop providing the service.

…just like all basic models

But on top of this basic model there are about 10,019 special cases:

  • Customers might cancel their service early.

  • Pobox has a long-standing deal where you get a sixth year free if you pay for five years of service up front.

  • Sometimes a customer with only email forwarding ($20 per year) wants to upgrade their account to one that does storage and provides webmail access ($50 per year), or vice-versa, in the middle of a year. What to do in this case? Business rules dictate that they can apply their current balance to the new service, and it should be properly pro-rated. So if I have 64 days of $50-per-year service remaining, and I downgrade to the $20-per-year service, I now have 160 days of service left. (A small worked sketch of this proration appears after this list.)

    Well, that wasn't too bad, except that we should let the customer know the new expiration date. And also, if their service will now expire sooner than it would have, we should give them a chance to pay to extend the service back to the old date, and deal properly with their payment or nonpayment.

    Also something has to be done about any 6th free year that I might have had. We don't want someone to sign up for 5 years of $50-per-year service, get the sixth year free, then downgrade their account and either get a full free year of $50-per-year service or get a full free year of $20-per-year service after paying for only part of five full years.

  • Sometimes customers do get refunds.

  • Sometimes we screw up and give people a credit for free service, as an apology. Unlike regular credits, these are not refundable!

  • Some customers get gratis accounts. The other cofounder of ICG used to hand these out at parties.

  • There are a number of cases for coupons and discounts. For example, if you refer a friend who signs up, you get some sort of credit. Non-profit institutions get some sort of discount off the regular rates. Customers who pay for many accounts get some sort of bulk discount. I forget the details.

  • Most customers get their service cut off if they don't pay. Certain large and longstanding customers should not be treated so peremptorily, and are allowed to run a deficit.

  • And so to infinity and beyond.
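
Here is the proration arithmetic from the upgrade/downgrade item above, worked as a tiny sketch; the helper name is invented for illustration, and it keeps the arithmetic integral so there is no float roundoff to worry about.

    # Convert $days_left days of service at $old_rate dollars/year into whole
    # days of service at $new_rate dollars/year. 64 days at $50/yr -> 160 at $20/yr.
    sub prorated_days {
        my ($days_left, $old_rate, $new_rate) = @_;
        return int( $days_left * $old_rate / $new_rate );
    }

    print prorated_days(64, 50, 20), "\n";   # 160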

Ledgers and Consumers

The Moonpig data store is mostly organized as a huge pile of ledgers. Each represents a single customer or account. It contains some contact information, a record of all the transactions associated with that customer, a history of all the invoices ever sent to that customer, and so forth.

It also contains some consumer objects. Each consumer represents some service that we have promised to perform in exchange for money. The consumer has methods in it that you can call to say “I just performed a certain amount of service; please charge accordingly”. It has methods for calculating how much money has been allotted to it, how much it has left, how fast it is consuming its funds, how long it expects to last, and when it expects to run out of money. And it has methods for constructing its own replacement and for handing over control to that replacement when necessary.
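
To make the shape of that interface concrete, here is a hedged sketch of a consumer-like Moose role; the method names are paraphrases of the description above, not Moonpig's actual names.

    package Sketch::Role::Consumer;    # illustrative only, not a Moonpig class
    use Moose::Role;

    # "I just performed a certain amount of service; please charge accordingly."
    requires 'charge_for_service';

    # Introspection: funds allotted, funds remaining, burn rate, expected lifetime.
    requires qw(allotted_funds unapplied_funds spend_rate expected_exhaustion_date);

    # Succession: build a replacement and hand control over to it when necessary.
    requires qw(build_replacement hand_off_to_replacement);

    1;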

Heartbeats

Every day, a cron job sends a heartbeat event to each ledger. The ledger doesn't do anything with the heartbeat itself; its job is to propagate the event to all of its sub-components. Most of those, in turn, ignore the heartbeat event entirely.

But consumers do handle heartbeats. The consumer will wake up and calculate how much longer it expects to live. (For Pobox consumers, this is simple arithmetic; for mailing-list consumers, it guesses based on how much mail has been sent recently.) If it notices that it is going to run out of money soon, it creates a successor that can take over when it is gone. The successor immediately sends the customer an invoice: “Hey, your service is running out, do you want to renew?”

Eventually the consumer does run out of money. At that time it hands over responsibility to its replacement. If it has no replacement, it will expire, and the last thing it does before it expires is terminate the service.

Things that suck: manual repairs

Somewhere is a machine that runs a daily cron job to heartbeat each ledger. What if one day, that machine is down, as they sometimes are, and the cron job never runs?

Or what if the machine crashes while the cron job is running, and the cron job only has time to heartbeat 3,672 of the 10,981 ledgers in the system?

In a perfect world, every component would be able to depend on exactly one heartbeat arriving every day. We don't live in that world. So it was an ironclad rule in Moonpig development that anything that handles heartbeat events must be prepared to deal with missing heartbeats, duplicate heartbeats, or anything else that could screw up.

When a consumer gets a heartbeat, it must not cheerfully say "Oh, it's the dawn of a new day! I'll charge for a day's worth of service!". It must look at the current date and at its own charge record and decide on that basis whether it's time to charge for a day's worth of service.

Now the answers to those questions of a few paragraphs earlier are quite simple. What if the machine is down and the cron job never runs? What to do?

A perfectly acceptable response here is: Do nothing. The job will run the next day, and at that time everything will be up to date. Some customers whose service should have been terminated today will have it terminated tomorrow instead; they will have received a free day of service. This is an acceptable loss. Some customers who should have received invoices today will receive them tomorrow. The invoices, although generated and sent a day late, will nevertheless show the right dates and amounts. This is also an acceptable outcome.

What if the cron job crashes after heartbeating 3,672 of 10,981 ledgers? Again, an acceptable response is to do nothing. The next day's heartbeat will bring the remaining 7,309 ledgers up to date, after which everything will be as it should. And an even better response is available: simply rerun the job. 3,672 of the ledgers will receive the same event twice, and will ignore it the second time.

Contrast this with the world in which heartbeats were (mistakenly) assumed to be reliable. In this world, the programming staff must determine precisely which ledgers received the event before the crash, either by trawling through the log files or by grovelling over the ledger data. Then someone has to hack up a program to send the heartbeats to just the 7,309 ledgers that still need it. And there is a stiff deadline: they have to get it done before tomorrow's heartbeat issues!

Making everything robust in the face of heartbeat failure is a little more work up front, but that cost is recouped the first time something goes wrong with the heartbeat process, when instead of panicking you smile and open another beer. Let N be the number of failures and manual repairs that are required before someone has had enough and makes the heartbeat handling code robust. I hypothesize that you can tell a lot about an organization from the value of N.

Here's an example of the sort of code that is required. The non-robust version of the code would look something like this:

    sub charge {
        my ($self, $event) = @_;
        $self->charge_one_day();
    }

The code, implemented by a role called Moonpig::Role::Consumer::ChargesPeriodically, actually looks something like this:

    has last_charge_date => ( … );

    sub charge {
        my ($self, $event) = @_;
        my $now = Moonpig->env->now;

        CHARGE: until ($self->next_charge_date->follows($now)) {
            my $next = $self->next_charge_date;
            $self->charge_one_day();
            $self->last_charge_date($next);

            if ($self->is_expired) {
                $self->replacement->handle_event($event) if $self->replacement;
                last CHARGE;
            }
        }
    }

The last_charge_date member records the last time the consumer actually issued a charge. The next_charge_date method consults this value and returns the next day on which the consumer should issue a charge—not necessarily the following day, since the consumer might issue weekly or monthly charges. The consumer runs the until loop, using charge_one_day to issue another charge each time through and updating last_charge_date each time, until the next_charge_date is in the future, at which point it stops.

The one tricky part here is the if block. This is because the consumer might run out of money before the loop completes. In that case it passes the heartbeat event on to its successor (replacement) and quits the loop. The replacement will run its own loop for the remaining period.

Things that suck: real-time testing

A customer pays us $20. This will cover their service for 365 days. The business rules say that they should receive their first invoice 30 days before the current service expires; that is, after 335 days. How are we going to test that the invoice is in fact sent precisely 335 days later?

Well, put like that, the answer is obvious: Your testing system must somehow mock the time. But obvious as this is, I have seen many many tests that made some method call and then did sleep 60, waiting and hoping that the event they were looking for would have occurred by then, reporting a false positive if the system was slow, and making everyone that much less likely to actually run the tests.

I've also seen a lot of tests that crossed their fingers and hoped that a certain block of code would execute between two ticks of the clock, and that failed nondeterministically when that didn't happen.

So another ironclad law of Moonpig design was that no object is ever allowed to call the time() function to find out what time it actually is. Instead, to get the current time, the object must call Moonpig->env->now.

The tests run in a test environment. In the test environment, Moonpig->env returns a Moonpig::Env::Test object, which contains a fake clock. It has a stop_clock method that stops the clock, and an elapse_time method that forces the clock forward a certain amount. If you need to check that something happens after 40 days, you can call Moonpig->env->elapse_time(86_400 * 40), or, more likely:

    for (1..40) {
        Moonpig->env->elapse_time(86_400);
        $test_ledger->heartbeat;
    }

In the production environment, the environment object still has a now method, but one that returns the true current time from the system clock. Trying to stop the clock in the production environment is a fatal error.
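
For concreteness, here is a minimal sketch of how such a fake-clock environment object might work. The class name is hypothetical, and it returns plain epoch seconds rather than the DateTime-style values Moonpig's real environment presumably returns.

    package Sketch::Env::Test;    # hypothetical, not Moonpig::Env::Test
    use Moose;

    # When the clock is stopped, "now" is frozen at this epoch value;
    # otherwise we fall through to the real system clock.
    has frozen_time => ( is => 'rw', isa => 'Maybe[Int]', default => undef );

    sub now {
        my ($self) = @_;
        return defined $self->frozen_time ? $self->frozen_time : time;
    }

    sub stop_clock {
        my ($self) = @_;
        $self->frozen_time(time);              # freeze at the current moment
    }

    sub elapse_time {
        my ($self, $seconds) = @_;
        die "clock is not stopped\n" unless defined $self->frozen_time;
        $self->frozen_time( $self->frozen_time + $seconds );
    }

    1;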

Similarly, no Moonpig object ever interacts directly with the database; instead it must always go through the mediator returned by Moonpig->env->storage. In tests, this can be a fake storage object or whatever is needed. It's shocking how many tests I've seen that begin by allocating a new MySQL instance and executing a huge pile of DDL. Folks, this is not how you write a test.

Again, no Moonpig object ever posts email. It asks Moonpig->env->email_sender to post the email on its behalf. In tests, this uses the CPAN Email::Sender::Transport suite, and the test code can interrogate the email_sender to see exactly what emails would have been sent.

We never did anything that required filesystem access, but if we had, there would have been a Moonpig->env->fs for opening and writing files.

The Moonpig->env object makes this easy to get right, and hard to screw up. Any code that acts on the outside world becomes a red flag: Why isn't this going through the environment object? How are we going to test it?

Things that suck: floating-point numbers

I've already complained about how I loathe floating-point numbers. I just want to add that although there are probably use cases for floating-point arithmetic, I don't actually know what they are. I've had a pretty long and varied programming career so far, and I have seen very few legitimate uses for floating-point numbers. They are really complicated, and fraught with traps; I say this as a mathematical expert with a much stronger background in the subject than most programmers.

The law we adopted for Moonpig was that all money amounts are integers. Each money amount is an integral number of “millicents”, abbreviated “m¢”, each worth 1/1000 of a cent, which in turn is 1/100 of a U.S. dollar. Fractional millicents are not allowed. Division must be rounded to the appropriate number of millicents, usually in the customer's favor, although in practice it doesn't matter much, because the amounts are so small.

For example, a $20-per-year Pobox account actually bills 5479 m¢ each day. (5464 m¢ each day in leap years.)

Since you don't want to clutter up the test code with a bunch of numbers like 1000000 ($10), there are two utterly trivial utility subroutines:

    sub cents   { $_[0] * 1000 }
    sub dollars { $_[0] * 1000 * 100 }

Now $10 can be written dollars(10).

Had we dealt with floating-point numbers, it would have been tempting to write test code that looked like this:

    cmp_ok(abs($actual_amount - $expected_amount), "<", $EPSILON, …);

That's because with floats, it's so hard to be sure that you won't end up with a leftover or something, so you write all the tests to ignore small discrepancies. This can lead to overlooking certain real errors that happen to result in small discrepancies. With integer amounts, these discrepancies have nowhere to hide. It sometimes happened that we would write some test and the money amount at the end would be wrong by 2m¢. Had we been using floats, we might have shrugged and attributed this to incomprehensible roundoff error. But with integers, that is a difference of 2, and you cannot shrug it off. There is no incomprehensible roundoff error. All the calculations are exact, and if some integer is off by 2 it is for a reason. These tiny discrepancies usually pointed to serious design or implementation errors. (In contrast, when a test would show a gigantic discrepancy of a million or more m¢, the bug was always quite easy to find and fix.)

There are still roundoff errors; they are unavoidable. For example, a consumer for a $20-per-year Pobox account bills only 365·5479m¢ = 1999835m¢ per year, an error in the customer's favor of 165m¢ per account; after 12,121 years the customer will have accumulated enough error to pay for an extra year of service. For a business of ICG's size, this loss was deemed acceptable. For a larger business, it could be significant. (Imagine 6,000,000 customers times 165m¢ each; that's $9,900.) In such a case I would keep the same approach but denominate everything in micro-cents instead.
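
As a small illustration of the integer-only rule, here is a hedged sketch of computing that daily charge in millicents, rounding down so that the roundoff favors the customer; the helper name is invented for this example.

    use POSIX qw(floor);

    # Hypothetical helper: yearly price in dollars -> daily charge in millicents,
    # rounded down so the roundoff is in the customer's favor.
    sub daily_charge_in_millicents {
        my ($yearly_dollars, $days_in_year) = @_;
        my $yearly_millicents = $yearly_dollars * 100 * 1000;   # dollars -> cents -> m¢
        return floor( $yearly_millicents / $days_in_year );
    }

    print daily_charge_in_millicents(20, 365), "\n";   # 5479
    print daily_charge_in_millicents(20, 366), "\n";   # 5464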

Happily, Moonpig did not have to deal with multiple currencies. That would have added tremendous complexity to the financial calculations, and I am not confident that Rik and I could have gotten it right in the time available.

Things that suck: dates and times

Dates and times are terribly complicated, partly because the astronomical motions they model are complicated, and mostly because the world's bureaucrats keep putting their fingers in. It's been suggested recently that you can identify whether someone is a programmer by asking if they have an opinion on time zones. A programmer will get very red in the face and pound their fist on the table.

After I wrote that sentence, I then wrote 1,056 words about the right way to think about date and time calculations, which I'll spare you, for now. I'm going to try to keep this from turning into an article about all the ways people screw up date and time calculations, by skipping the arguments and just stating the main points:

  1. Date-time values are a kind of number, and should be considered as such. In particular:
    1. Date-time values inside a program should be immutable
    2. There should be a single canonical representation of date-time values in the program, and it should be chosen for ease of calculation.
  2. If the program does have to deal with date-time values in some other representation, it should convert them to the canonical representation as soon as possible, or from the canonical representation as late as possible, and in any event should avoid letting non-canonical values percolate around the program.
The canonical representation we chose was DateTime objects in UTC time. Requiring that the program deal only with UTC eliminates many stupid questions about time zones and DST corrections, and simplifies all the rest as much as they can be simplified. It also avoids DateTime's unnecessarily convoluted handling of time zones.

We held our noses when we chose to use DateTime. It has my grudging approval, with a large side helping of qualifications. The internal parts of it are okay, but the methods it provides are almost never what you actually want to use. For example, it provides a set of mutators. But, as per item 1 above, date-time values are numbers and ought to be immutable. Rik has a good story about a horrible bug that was caused when he accidentally called the ->subtract method on some widely-shared DateTime value and so mutated it, causing an unexpected change in the behavior of widely-separated parts of the program that consulted it afterward.

So instead of using raw DateTime, we wrapped it in a derived class called Moonpig::DateTime. This removed the mutators and also made a couple of other convenient changes that I will shortly describe.
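
Here is a minimal sketch of what such a wrapper might look like, inferred only from the behavior described here and below (subtracting two values yields plain seconds; adding a number yields a new value); it is an illustration, not the real Moonpig::DateTime, and it omits the removal of the mutators.

    package Sketch::DateTime;    # hypothetical stand-in for Moonpig::DateTime
    use parent 'DateTime';

    use overload
        '-'      => \&_subtract,
        '+'      => \&_add,
        fallback => 1;

    # $a - $b: if $b is another datetime, return the elapsed seconds as a plain
    # number; if $b is a plain number, return a new datetime $b seconds earlier.
    sub _subtract {
        my ($self, $other) = @_;
        return $self->epoch - $other->epoch if ref $other && $other->isa('DateTime');
        return ref($self)->from_epoch( epoch => $self->epoch - $other, time_zone => 'UTC' );
    }

    # $a + $i: a new datetime $i seconds later.
    sub _add {
        my ($self, $seconds) = @_;
        return ref($self)->from_epoch( epoch => $self->epoch + $seconds, time_zone => 'UTC' );
    }

    1;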

Things that really really suck: DateTime::Duration

If you have a pair of DateTime objects and you want to know how much time separates the two instants that they represent, you have several choices, most of which will return a DateTime::Duration object. All those choices are wrong, because DateTime::Duration objects are useless. They are a kind of Roach Motel for date and time information: Data checks into them, but doesn't check out. I am not going to discuss that here, because if I did it would take over the article, but I will show the simple example I showed in the talk:

    my $then = DateTime->new(
        month  => 4, day    => 2, year   => 1969,
        hour   => 0, minute => 0, second => 0,
    );
    my $now     = DateTime->now();
    my $elapsed = $now - $then;
    print $elapsed->in_units('seconds'), "\n";

You might think, from looking at this code, that it might print the number of seconds that elapsed between 1969-04-02 00:00:00 (in some unspecified time zone!) and the current moment. You would be mistaken; you have failed to reckon with the $elapsed object, which is a DateTime::Duration. Computing this object seems reasonable, but as far as I know once you have it there is nothing to do but throw it away and start over, because there is no way to extract from it the elapsed amount of time, or indeed anything else of value. In any event, the print here does not print the correct number of seconds. Instead it prints ME CAGO EN LA LECHE, which I have discovered is Spanish for “I shit in the milk”.

So much for DateTime::Duration. When a and b are Moonpig::DateTime objects, a-b returns the number of seconds that have elapsed between the two times; it is that simple. You can divide it by 86,400 to get the number of days.

Other arithmetic is similarly overloaded: If i is a number, then a+i and a-i are the times obtained by adding or subtracting i seconds to a, respectively.

(C programmers should note the analogy with pointer arithmetic; C's pointers, and date-time values—also temperatures—are examples of a mathematical structure called an affine space, and study of the theory of affine spaces tells you just what rules these objects should obey. I hope to discuss this at length another time.)

Going along with this arithmetic are a family of trivial convenience functions, such as:

    sub hours { $_[0] * 3600  }
    sub days  { $_[0] * 86400 }

so that you can use $a + days(7) to find the time 7 days after $a. Programmers at the Amsterdam talk were worried about this: what about leap seconds? And they are correct: the name days is not quite honest, because it promises, but does not deliver, exactly 7 days. It can't, because the definition of the day varies widely from place to place and time to time, and not only can't you know how long 7 days is unless you know where it is, but it doesn't even make sense to ask. That is all right. You just have to be aware that, when you add days(7), the resulting time might not be the same time of day 7 days later. (Indeed, if the local date and time laws are sufficiently bizarre, it could in principle be completely wrong. But since Moonpig::DateTime objects are always reckoned in UTC, it is never more than one second wrong.)

Anyway, I was afraid that Moonpig::DateTime would turn out to be a leaky abstraction, producing pleasantly easy and correct results thirty times out of thirty-one, and annoyingly wrong or bizarre results the other time. But I was surprised: it never caused a problem, or at least none has come to light. I am working on releasing this module to CPAN, under the name DateTime::Moonpig. (A draft version is already available, but I don't recommend that you use it.)

Things that suck: mutable data

I left this out of the talk, by mistake, but this is a good place to mention it: mutable data is often a bad idea. In the billing system we wanted to avoid it for accountability reasons: We never wanted the customer service agent to be in the position of being unable to explain to the customer why we thought they owed us $28.39 instead of the $28.37 they claimed they owed; we never wanted ourselves to be in the position of trying to track down a billing system bug only to find that the trail had been erased.

One of the maxims Rik and I repeated frequently was that the moving finger writes, and, having writ, moves on. Moonpig is full of methods with names like is_expired, is_superseded, is_canceled, is_closed, is_obsolete, is_abandoned and so forth, representing entities that have been replaced by other entities but which are retained as part of the historical record.

For example, a consumer has a successor, to which it will hand off responsibility when its own funds are exhausted; if the customer changes their mind about their future service, this successor might be replaced with a different one, or replaced with none. This doesn't delete or destroy the old successor. Instead it marks the old successor as "superseded", simultaneously recording the supersession time, and pushes the new successor (or undef, if none) onto the end of the target consumer's replacement_history array. When you ask for the current successor, you are getting the final element of this array. This pattern appeared in several places. In a particularly simple example, a ledger was required to contain a Contact object with contact information for the customer to which it pertained. But the Contact wasn't simply this:

    has contact => (
        is       => 'rw',
        isa      => role_type( 'Moonpig::Role::Contact' ),
        required => 1,
    );

Instead, it was an array; "replacing" the contact actually pushed the new contact onto the end of the array, from which the contact accessor returned the final element:

    has contact_history => (
        is       => 'ro',
        isa      => ArrayRef[ role_type( 'Moonpig::Role::Contact' ) ],
        required => 1,
        traits   => [ 'Array' ],
        handles  => {
            contact         => [ get => -1 ],
            replace_contact => 'push',
        },
    );
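
With that attribute in place, usage looks like the following hedged sketch ($ledger and $new_contact are placeholders); the delegated handlers behave like a push and a last-element lookup on contact_history.

    $ledger->replace_contact($new_contact);   # pushes onto contact_history
    my $current = $ledger->contact;           # returns the final element of the array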

Things that suck: relational databases

Why do we use relational databases, anyway? Is it because they cleanly and clearly model the data we want to store? No, it's because they are lightning fast.

When your data truly is relational, a nice flat rectangle of records, each with all the same fields, RDBs are terrific. But Moonpig doesn't have much relational data. Its basic datum is the Ledger, which has a bunch of disparate subcomponents, principally a heterogeneous collection of Consumer objects. And I would guess that most programs don't deal in relational data; like Moonpig, they deal in some sort of object network.

Nevertheless we try to represent this data relationally, because we have a relational database, and when you have a hammer, you go around hammering everything with it, whether or not that thing needs hammering.

When the object model is mature and locked down, modeling the objects relationally can be made to work. But when the object model is evolving, it is a disaster. Your relational database schema changes every time the object model changes, and then you have to find some way to migrate the existing data forward from the old schema. Or worse, and more likely, you become reluctant to let the object model evolve, because reflecting that evolution in the RDB is so painful. The RDB becomes a ball and chain locked to your program's ankle, preventing it from going where it needs to go. Every change is difficult and painful, so you avoid change. This is the opposite of the way to design a good program. A program should be light and airy, its object model like a string of pearls.

In theory the mapping between the RDB and the objects is transparent, and is taken care of seamlessly by an ORM layer. That would be an awesome world to live in, but we don't live in it and we may never.

Things that really really suck: ORM software

Right now the principal value of ORM software seems to be if your program is too fast and you need it to be slower; the ORM is really good at that. Since speed was the only benefit the RDB was providing in the first place, you have just attached two large, complex, inflexible systems to your program and gotten nothing in return.

Watching the ORM try to model the objects is somewhere between hilariously pathetic and crushingly miserable. Perl's DBIx::Class, to the extent it succeeds, succeeds because it doesn't even try to model the objects in the database. Instead it presents you with objects that represent database rows. This isn't because a row needs to be modeled as an object—database rows have no interesting behavior to speak of—but because the object is an access point for methods that generate SQL. DBIx::Class is not for modeling objects, but for generating SQL. I only realized this recently, and angrily shouted it at the DBIx::Class experts, expecting my denunciation to be met with rage and denial. But they just smiled with amusement. “Yes,” said the DBIx::Class experts on more than one occasion, “that is exactly correct.” Well then.

So Rik and I believe that for most (or maybe all) projects, trying to store the objects in an RDB, with an ORM layer mediating between the program and the RDB, is a bad, bad move. We determined to do something else. We eventually brewed our own object store, and this is the part of the project of which I'm least proud, because I believe we probably made every possible mistake that could be made, even the ones that everyone writing an object store should already know not to make.

For example, the object store has a method, retrieve_ledger, which takes a ledger's ID number, reads the saved ledger data from the disk, and returns a live Ledger object. But it must make sure that every such call returns not just a Ledger object with the right data, but the same object. Otherwise two parts of the program will have different objects to represent the same data, one part will modify its object, and the other part, looking at a different object, will not see the change it should see. It took us a while to figure out problems like this; we really did not know what we were doing.
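
The requirement described here is essentially an identity map: a cache from GUID to the single live object. A minimal sketch of the idea, using weak references so the cache does not keep otherwise-dead ledgers alive; the loader method is hypothetical and this is not the Moonpig object store's actual code.

    use Scalar::Util qw(weaken);

    my %live_ledger_for;    # GUID -> the one live Ledger object for that GUID

    sub retrieve_ledger {
        my ($self, $guid) = @_;

        # If we already handed out a Ledger for this GUID, return that same object.
        return $live_ledger_for{$guid} if $live_ledger_for{$guid};

        my $ledger = $self->_thaw_ledger_from_disk($guid);   # hypothetical loader
        $live_ledger_for{$guid} = $ledger;
        weaken $live_ledger_for{$guid};    # don't prevent garbage collection
        return $ledger;
    }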

What we should have done, instead of building our own object store, was use someone else's object store. KiokuDB is frequently mentioned in this context. After I first gave this talk people asked “But why didn't you use KiokuDB?” or, on hearing what we did do, said “That sounds a lot like KiokuDB”. I had to get Rik to remind me why we didn't use KiokuDB. We had considered it, and decided to do our own not for technical but for political reasons. The CEO, having made the unpleasant decision to have me and Rik write a new billing system, wanted to see some progress. If she had asked us after the first week what we had accomplished, and we had said “Well, we spent a week figuring out KiokuDB,” her head might have exploded. Instead, we were able to say “We got the object store about three-quarters finished”. In the long run it was probably more expensive to do it ourselves, and the result was certainly not as good. But in the short run it kept the customer happy, and that is the most important thing; I say this entirely in earnest, without either sarcasm or bitterness.

(On the other hand, when I ran this article by Rik, he pointed out that KiokuDB had later become essentially unmaintained, and that had we used it he would have had to become the principal maintainer of a large, complex system which he did not help design or implement. The Moonpig object store may be technically inferior, but Rik was with it from the beginning and understands it thoroughly.)

Our object store

All that said, here is how our object store worked. The bottom layer was an ordinary relational database with a single table. During the test phase this database was SQLite, and in production it was IC Group's pre-existing MySQL instance. The table had two fields: a GUID (globally-unique identifier) on one side, and on the other side a copy of the corresponding Ledger object, serialized with Perl's Storable module. To retrieve a ledger, you look it up in the table by GUID. To retrieve a list of all the ledgers, you just query the GUID field. That covers the two main use-cases, which are customer service looking up a customer's account history, and running the daily heartbeat job.

A subsidiary table mapped IC Group's customer account numbers to ledger GUIDs, so that the storage engine could look up a particular customer's ledger starting from their account number. (Account numbers are actually associated with Consumers, but once you had the right ledger a simple method call to the ledger would retrieve the consumer object. But finding the right ledger required a table.) There were a couple of other tables of that sort, but overall it was a small thing.
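
A minimal sketch of that storage layer, using SQLite via DBI and Perl's Storable as described; the table and column names are illustrative, not the real Moonpig schema.

    use DBI;
    use Storable qw(nfreeze thaw);

    my $dbh = DBI->connect("dbi:SQLite:dbname=moonpig-sketch.db", "", "",
                           { RaiseError => 1, AutoCommit => 1 });

    # One table: a GUID on one side, the serialized Ledger on the other.
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS ledgers (
            guid            TEXT PRIMARY KEY,
            serialized_data BLOB NOT NULL
        )
    });

    sub store_ledger {
        my ($guid, $ledger) = @_;
        $dbh->do("INSERT OR REPLACE INTO ledgers (guid, serialized_data) VALUES (?, ?)",
                 undef, $guid, nfreeze($ledger));
    }

    sub retrieve_ledger_data {
        my ($guid) = @_;
        my ($blob) = $dbh->selectrow_array(
            "SELECT serialized_data FROM ledgers WHERE guid = ?", undef, $guid);
        return defined $blob ? thaw($blob) : undef;
    }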

There are some fine points to consider. For example, you can choose whether to store just the object data, or the code as well. The choice is clear: you must store only the data, not the code. Otherwise, you would have to update all the objects every time you make a code change such as a bug fix. It should be clear that this would discourage bug fixes, and that had we gone this way the project would have ended as a pile of smoking rubble. Since the code is not stored in the database, the object store must be responsible, whenever it loads an object, for making sure that the correct class for that object actually exists. The solution for this was that along with every object is stored a list of all the roles that it must perform. At object load time, if the object's class doesn't exist yet, the object store retrieves this list of roles (stored in a third column, parallel to the object data) and uses the MooseX::ClassCompositor module to create a new class that does those roles. MooseX::ClassCompositor was something Rik wrote for the purpose, but it seems generally useful for such applications.

Every once in a while you may make an upward-incompatible change to the object format. Renaming an object field is such a change, since the field must be renamed in all existing objects, but adding a new field isn't, unless the field is mandatory. When this happened—much less often than you might expect—we wrote a little job to update all the stored objects. This occurred only seven times over the life of the project; the update programs are all very short.

We did also make some changes to the way the objects themselves were stored: Booking.Com's Sereal module was released while the project was going on, and we switched to use it in place of Storable. Also one customer's Ledger object grew too big to store in the database field, which could have been a serious problem, but we were able to defer dealing with the problem by using gzip to compress the serialized data before storing it.

The relational database provides transactions

The use of the RDB engine for the underlying storage got us MySQL's implementation of transactions and atomicity guarantees, which we trusted. This gave us a firm foundation on which to build the higher functions; without those guarantees you have nothing, and it is impossible to build a reliable system. But since they are there, we could build a higher-level transactional system on top of them.

For example, we used an optimistic locking scheme to prevent race conditions while updating a single ledger. For performance reasons you typically don't want to force all updates to be done through a single process (although it can be made to work; see Rochkind's Advanced Unix Programming). In an optimistic locking scheme, you store a version number with each record. Suppose you are the low-level storage manager and you get a request to update a ledger with a certain ID. Instead of doing this:

    update ledger set serialized_data = … where ledger_id = 789

You do this:

    update ledger
       set serialized_data = … , version = 4
     where ledger_id = 789 and version = 3

and you check the return value from the SQL to see how many records were actually updated. The answer must be 0 or 1. If it is 1, all is well and you report the successful update back to your caller. But if it is 0, that means that some other process got there first and updated the same ledger, changing its version number from the 3 you were expecting to something bigger. Your changes are now in limbo; they were applied to a version of the object that is no longer current, so you throw an exception.
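
In DBI terms, that check might look something like this sketch; the table layout matches the SQL above, and the rest (names, error message) is assumed for illustration.

    sub save_ledger_optimistically {
        my ($dbh, $ledger_id, $old_version, $new_blob) = @_;

        # $dbh->do returns the number of rows affected ("0E0" when none were).
        my $rows = $dbh->do(
            "UPDATE ledger SET serialized_data = ?, version = ?
              WHERE ledger_id = ? AND version = ?",
            undef, $new_blob, $old_version + 1, $ledger_id, $old_version,
        );

        die "conflict: ledger $ledger_id was updated by another process\n"
            unless $rows && $rows == 1;

        return 1;
    }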

But is the exception safe? What if the caller had previously made changes to the database that should have been rolled back when the ledger failed to save? No problem! We had exposed the RDB transactions to the caller, so when the caller requested that a transaction be begun, we propagated that request into the RDB layer. When the exception aborted the caller's transaction, all the previous work we had done on its behalf was aborted back to the start of the RDB transaction, just as one wanted. The caller even had the option to catch the exception without allowing it to abort the RDB transaction, and to retry the failed operation.

Drawbacks of the object store

The major drawback of the object store was that it was very difficult to aggregate data across ledgers: to do it you have to thaw each ledger, one at a time, and traverse its object structure looking for the data you want to aggregate. We planned that when this became important, we could have a method on the Ledger or its sub-objects which, when called, would store relevant numeric data into the right place in a conventional RDB table, where it would then be available for the usual SELECT and GROUP BY operations. The storage engine would call this whenever it wrote a modified Ledger back to the object store. The RDB tables would then be a read-only view of the parts of the data that were needed for building reports.

A related problem is that some kinds of data really are relational, and storing them in object form is extremely inefficient. The RDB has a terrible impedance mismatch for most kinds of object-oriented programming, but not for all kinds. The main example that comes to mind is that every ledger contains a transaction log of every transaction it has ever performed: when a consumer deducts its 5479 m¢, that's a transaction, and every day each consumer adds one to the ledger. The transaction log for a large ledger with many consumers can grow rapidly.

We planned from the first that this transaction data would someday move out of the ledger entirely into a single table in the RDB, access to which would be mediated by a separate object, called an Accountant. At present, the Accountant is there, but it stores the transaction data inside itself instead of in an external table.

The design of the object store was greatly simplified by the fact that all the data was divided into disjoint ledgers, and that only ledgers could be stored or retrieved. A minor limitation of this design was that there was no way for an object to contain a pointer to a Ledger object, either its own or some other one. Such a pointer would have spoiled Perl's lousy garbage collection, so we weren't going to do it anyway. In practice, the few places in the code that needed to refer to another ledger just store the ledger's GUID instead and looked it up when it was needed. In fact every significant object was given its own GUID, which was then used as needed. I was surprised to find how often it was useful to have a simple, reliable identifier for every object, and how much time I had formerly spent on programming problems that would have been trivially solved if objects had had GUIDs.

The object store was a success

In all, I think the object store technique worked well and was a smart choice that went strongly against prevailing practice. I would recommend the technique for similar projects, except for the part where we wrote the object store ourselves instead of using one that had been written already. Had we tried to use an ORM backed by a relational database, I think the project would have taken at least a third longer; had we tried to use an RDB without any ORM, I think we would not have finished at all.

Things that suck: multiple inheritance

After I had been using Moose for a couple of years, including the Moonpig project, Rik asked me what I thought of it. I was lukewarm. It introduces a lot of convenience for common operations, but also hides a lot of complexity under the hood, and the complexity does not always stay well-hidden. It is very big and very slow to start up. On the whole, I said, I could take it or leave it.

“Oh,” I added. “Except for Roles. Roles are awesome.” I had a long section in the talk about what is good about Roles, but I moved it out to a separate talk, so I am going to take that as a hint about what I should do here. As with my theory of dates and times, I will present only the thesis, and save the arguments for another post:

  1. Object-oriented programming is centered around objects, which are encapsulated groups of related data, and around methods, which are opaque functions for operating on particular kinds of objects.
  2. OOP does not mandate any particular theory of inheritance, either single or multiple, class-based or prototype based, etc., and indeed, while all OOP systems have objects and methods that are pretty much the same, each has an inheritance system all its own.
  3. Over the past 30 years of OOP, many theories of inheritance have been tried, and all of them have had serious problems.
  4. If there were no alternative to inheritance, we would have to struggle on with inheritance. However, Roles are a good alternative to inheritance:
    • Every problem solved by inheritance is solved at least as well by Roles.
    • Many problems not solved at all by inheritance are solved by Roles.
    • Many problems introduced by inheritance do not arise when using Roles.
    • Roles introduce some of their own problems, but none of them are as bad as the problems introduced by inheritance.
  5. It's time to give up on inheritance. It was worth a try; we tried it as hard as we could for thirty years or more. It didn't work.
  6. I'm going to repeat that: Inheritance doesn't work. It's time to give up on it.
Moonpig doesn't use any inheritance (except that Moonpig::DateTime inherits from DateTime, which we didn't control). Every class in Moonpig is composed from Roles. This wasn't because it was our policy to avoid inheritance. It's because Roles did everything we needed, usually in simple and straightforward ways.

I plan to write more extensively on this later on.

This section is the end of the things I want to excoriate. Note the transition from multiple inheritance, which was a tremendous waste of everyone's time, to Roles, which in my opinion are a tremendous success, the Right Thing, and gosh if only Smalltalk-80 had gotten this right in the first place look how much trouble we all would have saved.

Things that are GOOD: web RPC APIs

Moonpig has a web API. Moonpig applications, such as the customer service dashboard, or the heartbeat job, invoke Moonpig functions through the API. The API is built using a system, developed in parallel with Moonpig, called Stick. (It was so-called because IC Group had tried before to develop a simple web API system, but none had been good enough to stick. This one, we hoped, would stick.)

The basic principle of Stick is distributed routing, which allows an object to have a URI, and to delegate control of the URIs underneath it to other objects.

To participate in the web API, an object must compose the Stick::Role::Routable role, which requires that it provide a _subroute method. The method is called with an array containing the path components of a URI. The _subroute method examines the array, or at least the first few elements, and decides whether it will handle the route. To refuse, it can throw an exception, or just return an undefined value, which will turn into a 404 error in the web protocol. If it does handle the path, it removes the part it handled from the array, and returns another object that will handle the rest, or, if there is nothing left, a public resource of some sort. In the former case the routing process continues, with the remaining route components passed to the _subroute method of the next object.

If the route is used up, the last object in the chain is checked to make sure it composes the Stick::Role::PublicResource role. This is to prevent accidentally exposing an object in the web API when it should be private. Stick then invokes one final method on the public resource, either resource_get, resource_post, or similar. Stick collects the return value from this method, serializes it and sends it over the network as the response.
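
To make the walk down the path concrete, here is a hedged reconstruction of what the dispatch loop might look like; it is inferred from the description above, not Stick's actual code, and resolve_route is an invented name.

    # Hypothetical dispatcher: hand the remaining path components to each object
    # in turn until the path is used up, then insist on a public resource.
    sub resolve_route {
        my ($root, @path) = @_;
        my $obj = $root;

        while (@path) {
            my $next = $obj->_subroute(\@path);    # handler removes what it consumed
            return undef unless defined $next;     # turns into a 404
            $obj = $next;
        }

        die "not a public resource\n"
            unless $obj->does('Stick::Role::PublicResource');

        return $obj;
    }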

So for example, suppose a ledger wants to provide access to its consumers. It might implement _subroute like this:

    sub _subroute {
        my ($self, $route) = @_;
        if ($route->[0] eq "consumer") {
            shift @$route;
            my $consumer_id = shift @$route;
            return $self->find_consumer( id => $consumer_id );
        } else {
            return;   # 404
        }
    }

Then if /path/to/ledger is any URI that leads to a certain ledger, /path/to/ledger/consumer/12345 will be a valid URI for the specified ledger's consumer with ID 12345. A request to /path/to/ledger/FOOP/de/DOOP will yield a 404 error, as will a request to /path/to/ledger/consumer/98765 whenever find_consumer(id => 98765) returns undefined.

A common pattern is to have a path that invokes a method on the target object. For example, suppose the ledger objects are already addressable at certain URIs, and one would like to expose in the API the ability to tell a ledger to handle a heartbeat event. In Stick, this is incredibly easy to implement:

    publish heartbeat => { -http_method => 'post' } => sub {
        my ($self) = @_;
        $self->handle_event( event('heartbeat') );
    };

This creates an ordinary method, called heartbeat, which can be called in the usual way, but which is also invoked whenever an HTTP POST request arrives at the appropriate URI, the appropriate URI being anything of the form /path/to/ledger/heartbeat.

The default case for publish is that the HTTP method is expected to be GET; in this case one can omit mentioning it:

    publish amount_due => sub {
        my ($self) = @_;
        …
        return abs($due - $avail);
    };

More complicated published methods may receive arguments; Stick takes care of deserializing them, and checking that their types are correct, before invoking the published method. This is the ledger's method for updating its contact information:

    publish _replace_contact => {
        -path        => 'contact',
        -http_method => 'put',
        attributes   => HashRef,
    } => sub {
        my ($self, $arg) = @_;
        my $contact = class('Contact')->new($arg->{attributes});
        $self->replace_contact($contact);
        return $contact;
    };

Although the method is named _replace_contact, it is available in the web API via a PUT request to /path/to/ledger/contact, rather than one to /path/to/ledger/_replace_contact. If the contact information supplied in the HTTP request data is accepted by class('Contact')->new, the ledger's contact is updated. (class('Contact') is a utility method that returns the name of the class that represents a contact. This is probably just the string Moonpig::Class::Contact.)

In some cases the ledger has an entire family of sub-objects. For example, a ledger may have many consumers. In this case it's also equipped with a "collection" object that manages the consumers. The ledger can use the collection object as a convenient way to look up its consumers when it needs them, but the collection object also provides routing: If the ledger gets a request for a route that begins /consumers, it strips off /consumers and returns its consumer collection object, which handles further paths such as /guid/XXXX and /xid/1234 by locating and returning the appropriate consumer.

The collection object is a repository for all sorts of convenient behavior. For example, if one composes the Stick::Role::Collection::Mutable role onto it, it gains support for POST requests to …/consumers/add, handled appropriately.

Adding a new API method to any object is trivial, just a matter of adding a new published method. Unpublished methods are not accessible through the web API.

After I wrote this talk I wished I had written a talk about Stick instead. I'm still hoping to write one and present it at YAPC in Orlando this summer.

Things that are GOOD: Object-oriented testing

Unit tests often have a lot of repeated code, to set up test instances or run the same set of checks under several different conditions. Rik's Test::Routine makes a test program into a class. The class is instantiated, and the tests are methods that are run on the test object instance. Test methods can invoke one another. The test object's attributes are available to the test methods, so they're a good place to put test data. The object's initializer can set up the required test data. Tests can easily load and run other tests, all in the usual ways. If you like OO-style programming, you'll like all the same things about building tests with Test::Routine.
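
A minimal sketch of a Test::Routine-style test file, showing the shape (tests as methods, fixtures as attributes); the attribute and its value are invented for illustration, and this is not Moonpig's actual test suite.

    use Test::Routine;
    use Test::Routine::Util;
    use Test::More;

    # Fixture data lives in an attribute, available to every test method.
    has starting_balance => (
        is      => 'ro',
        default => sub { 2_000_000 },    # $20, expressed in integer millicents
    );

    test "balances are stored as integers" => sub {
        my ($self) = @_;
        is( $self->starting_balance, 2_000_000, "no floats anywhere in sight" );
    };

    run_me;
    done_testing;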

Things that are GOOD: Free software

All this stuff is available for free under open licenses:

(This has been a really long article. Thanks for sticking with me. Headers in the article all have named anchors, in case you want to refer someone to a particular section.)

(I suppose there is a fair chance that this will wind up on Hacker News, and I know how much the kids at Hacker News love to dress up and play CEO and Scary Corporate Lawyer, and will enjoy posting dire tut-tuttings about whether my disclosure of ICG's secrets is actionable, and how reluctant they would be to hire anyone who tells such stories about his previous employers. So I may as well spoil their fun by mentioning that I received the approval of ICG's CEO before I posted this.)

[ Addendum: A detailed description of DateTime::Moonpig is now available. ]

Categories: Offsite Blogs

Mark Jason Dominus: DateTime::Moonpig, a saner interface to DateTime

Planet Haskell - Fri, 01/10/2014 - 10:30am

(This article was previously published at the Perl Advent Calendar on 2013-12-23.)

The DateTime suite is an impressive tour de force, but I hate its interface. The methods it provides are usually not the ones you want, and the things it makes easy are often things that are not useful.

Mutators

The most obvious example is that it has too many mutators. I believe that date-time values are a kind of number, and should be treated like numbers. In particular they should be immutable. Rik Signes has a hair-raising story about an accidental mutation that caused a hard to diagnose bug, because the add_duration method modifies the object on which it is called, instead of returning a new object.

DateTime::Duration

But the most severe example, the one that drives me into a rage, is that the subtract_datetime method returns a DateTime::Duration object, and this object is never what you want, because it is impossible to use it usefully.

For example, suppose you would like to know how much time elapses between 1969-04-02 02:38:17 EST and 2013-12-25 21:00:00 EST. You can set up the two DateTime objects for the time, and subtract them using the overloaded minus operator:

#!perl
my ($a) = DateTime->new( year => 1969, month => 04, day => 02,
                         hour => 2, minute => 38, second => 17,
                         time_zone => "America/New_York" );

my ($b) = DateTime->new( year => 2013, month => 12, day => 25,
                         hour => 21, minute => 0, second => 0,
                         time_zone => "America/New_York" );

my $diff = $b - $a;

Internally this invokes subtract_datetime to yield a DateTime::Duration object for the difference. The DateTime::Duration object $diff will contain the information that this is a difference of 536 months, 23 days, 1101 minutes, and 43 seconds, a fact which seems to me to be of very limited usefulness.

You might want to know how long this interval is, so you can compare it to similar intervals. So you might want to know how many seconds this is. It happens that the two times are exactly 1,411,669,328 seconds apart, but there's no way to get the $diff object to tell you this.

It seems like there are methods that will get you the actual elapsed time in seconds, but none of them will do it. For example, $diff->in_units('seconds') looks promising, but will return 43, which is the 43 seconds left over after you've thrown away the 536 months, 23 days, and 1101 minutes. I don't know what the use case for this is supposed to be.

And indeed, no method can tell you how long the duration really is, because the subtraction has thrown away all the information about how long the days and months and years were—days, months and years vary in length—so it simply doesn't know how much time this object actually represents.

Similarly if you want to know how many days there are between the two dates, the DateTime::Duration object won't tell you because it can't tell you. If you had the elapsed seconds difference, you could convert it to the correct number of days simply by dividing by 86400 and rounding off. This works because, even though days vary in length, they don't vary by much, and the variations cancel out over the course of a year. If you do this you find that the elapsed number of days is approximately 16338.7653, which rounds off to 16338 or 16339 depending on how you want to treat the 18-hour time-of-day difference. This result is not quite exact, but the error is on the order of 0.000002%. So the elapsed seconds are useful, and you can compute other useful values with them, and get useful answers. In contrast, DateTime::Duration's answer of "536 months and 23 days" is completely useless because months vary in length by nearly 10% and DateTime has thrown away the information about how long the months were. The best you can do to guess the number of days from this is to multiply the 536 months by 30.4375, which is the average number of days in a month, and add 23. This is clumsy, and gets you 16337.5 days—which is close, but wrong.

To get what I consider a useful answer out of the DateTime objects you must not use the overloaded subtraction operator; instead you must do this:

#!perl
$b->subtract_datetime_absolute($a)->in_units('seconds')

What's DateTime::Moonpig for?

DateTime::Moonpig attempts to get rid of the part of DateTime I don't like and keep the part I do like, by changing the interface and leaving the internals alone. I developed it for the Moonpig billing system that Rik Signes and I did; hence the name.

DateTime::Moonpig introduces five main changes to the interface of DateTime:

  1. Most of the mutators are gone. They throw fatal exceptions if you try to call them.

  2. The overridden addition and subtraction operators have been changed to eliminate DateTime::Duration entirely. Subtracting two DateTime::Moonpig objects yields the difference in seconds, as an ordinary Perl number. This means that instead of

    #!perl $x = $b->subtract_datetime_absolute($a)->in_units('seconds')

    one can write

    #!perl $x = $b - $a

    From here it's easy to get the approximate number of days difference: just divide by 86400. Similarly, dividing this by 3600 gets the number of hours difference.

    An integer number of seconds can be added to or subtracted from a DateTime::Moonpig object; this yields a new object representing a time that is that many seconds later or earlier. Writing $date + 2 is much more convenient than writing $date->clone->add( seconds => 2 ).

    If you are not concerned with perfect exactness, you can write

    #!perl
    sub days { $_[0] * 86400 }
    my $tomorrow = $now + days(1);

    This might be off by an hour if there is an intervening DST change, or by a second if there is an intervening leap second, but in many cases one simply doesn't care.

    There is nothing wrong with the way DateTime overloads < and >, so DateTime::Moonpig leaves those alone.

  3. The constructor is extended to accept an epoch time such as is returned by Perl's built-in time() or stat() functions. This means that one can abbreviate this:

    #!perl DateTime->from_epoch( epoch => $epoch )

    to this:

    #!perl DateTime::Moonpig->new( $epoch )
  4. The default time zone has been changed from DateTime's "floating" time zone to UTC. I think the "floating" time zone is a mistake, and best avoided. It has bad interactions with set_time_zone, which DateTime::Moonpig does not disable, because it is not actually a mutator—unless you use the "floating" time zone. An earlier blog article discusses this.

  5. I added a few additional methods I found convenient. For example there is a $date->st that returns the date and time in the format YYYY-MM-DD HH:MM:SS, which is sometimes handy for quick debugging. (The st is for "string".)

Under the covers, it is all just DateTime objects, which seem to do what one needs. Other than the mutators, all the many DateTime methods work just the same; you are even free to use ->subtract_datetime to obtain a DateTime::Duration object if you enjoy being trapped in an absurdist theatre production.

When I first started this module, I thought it was likely to be a failed experiment. I expected that the Moonpig::DateTime objects would break once in a while, or that some operation on them would return a DateTime instead of a Moonpig::DateTime, which would cause some later method call to fail. But to my surprise, it worked well. It has been in regular use in Moonpig for several years.

I recently split it out of Moonpig, and released it to CPAN. I will be interested to find out if it works well in other contexts. I am worried that disabling the mutators has left a gap in functionality that needs to be filled by something else. I will be interested to hear reports from people who try.

DateTime::Moonpig on CPAN.

Categories: Offsite Blogs

ANN: xxHash, fast, high quality,non-cryptographic checksums.

haskell-cafe - Fri, 01/10/2014 - 9:22am
I've just finished implementing a Haskell version of the xxHash algorithm, which is documented briefly here (mostly in code): https://code.google.com/p/xxhash/

The Haskell version is *almost* as fast as C and allows the generation of a better quality checksum than adler32, in less time.

Benchmarks are here: http://ponies.io/posts/2013-01-10-xxhash.html
Code is here: https://github.com/christian-marie/xxhash

Critique welcome!
Categories: Offsite Discussion

Ian Ross: Haskell FFT 11: Optimisation Part 1

Planet Haskell - Fri, 01/10/2014 - 8:30am
Haskell FFT 11: Optimisation Part 1
January 10, 2014

Based on what we saw concerning the performance of our FFT code in the last article, we have a number of avenues of optimisation to explore. Now that we’ve got reasonable looking O(N log N) scaling for all input sizes, we’re going to try to make the basic Danielson-Lanczos step part of the algorithm faster, since this will provide benefits for all input sizes. We can do this by looking at the performance for a single input length (we’ll use N=256). We’ll follow the usual approach of profiling to find parts of the code to concentrate on, modifying the code, then looking at benchmarks to see if we’ve made a positive difference.

Once we’ve got some way with this “normal” kind of optimisation, there are some algorithm-specific things we can do: we can include more hard-coded base transforms for one thing, but we can also try to determine empirically what the best decomposition of our input vector length is – for example, for N=256, we could decompose as 2×2×2×2×2×2×2×2, using length-2 base transforms and seven Danielson-Lanczos steps to form the final transform, or as 16×16, using a length-16 base transform and a single Danielson-Lanczos step to form the final result.

Basic optimisation

The first thing we need to do is set up some basic benchmarking code to run our N=256 test case, which we’re going to use to look at optimisations of the basic Danielson-Lanczos steps in our algorithm. The code to do this benchmarking is more or less identical to the earlier benchmarking code, but we’re also going to use this program:

module Main where

import Criterion.Main
import Data.Complex
import Data.Vector
import qualified Numeric.FFT as FFT

tstvec :: Int -> Vector (Complex Double)
tstvec sz = generate sz (\i -> let ii = fromIntegral i
                               in sin (2*pi*ii/1024) + sin (2*pi*ii/511))

main :: IO ()
main = run (nf (FFT.fftWith $ FFT.plan 256) $ tstvec 256) 1000

This does nothing but run the N=256 FFT calculation 1000 times – we use the run and nf functions from the Criterion package to make sure our test function really does get invoked 1000 times. This allows us to get memory usage information without any of the overhead associated with benchmarking. We’ll also use this code for profiling.
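For reference, a standalone Criterion benchmark for the same transform would look something like the sketch below. This isn't taken from the arb-fft repository; it just uses Criterion's standard defaultMain, bench and nf entry points with the same test vector:

module Main where

import Criterion.Main
import Data.Complex
import Data.Vector
import qualified Numeric.FFT as FFT

-- Same test vector as in the profiling program above.
tstvec :: Int -> Vector (Complex Double)
tstvec sz = generate sz (\i -> let ii = fromIntegral i
                               in sin (2*pi*ii/1024) + sin (2*pi*ii/511))

main :: IO ()
main = defaultMain
  [ bench "fft-256" $ nf (FFT.fftWith $ FFT.plan 256) (tstvec 256) ]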

The first issue we want to look at is allocation. By running our benchmarking program as ./profile-256 +RTS -s, we can get a report on total memory allocation and garbage collection statistics:

   5,945,672,488 bytes allocated in the heap
   7,590,713,336 bytes copied during GC
       2,011,440 bytes maximum residency (2002 sample(s))
          97,272 bytes maximum slop
               6 MB total memory in use (0 MB lost due to fragmentation)

                                   Tot time (elapsed)  Avg pause  Max pause
  Gen  0     9232 colls,     0 par    2.94s    2.95s     0.0003s    0.0010s
  Gen  1     2002 colls,     0 par    1.98s    1.98s     0.0010s    0.0026s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.14s  (  2.14s elapsed)
  GC      time    4.92s  (  4.92s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    7.07s  (  7.07s elapsed)

  %GC     time      69.7%  (69.7% elapsed)

  Alloc rate    2,774,120,352 bytes per MUT second

  Productivity  30.3% of total user, 30.3% of total elapsed

For 1000 N=256 FFTs, this seems like a lot of allocation. The ideal would be to allocate only the amount of space needed for the output vector. In this case, for a Vector (Complex Double) of length 256, this is 16,448 bytes, as reported by the recursiveSize function from the ghc-datasize package. So for 1000 samples, we’d hope to have only about 16,448,000 bytes of allocation – that’s actually pretty unrealistic since we’re almost certainly going to have to do some copying somewhere and there will be other overhead, but the numbers here give us a baseline to work from.

Profiling

To get some sort of idea where to focus our optimisation efforts, we need a bit more information, which we can obtain from building a profiling version of our library and test program. This can end up being a bit inconvenient because of the way that GHC and Cabal manage profiled libraries, since you need to have installed profiling versions of all the libraries in the transitive dependencies of your code in order to profile. The easiest way to deal with this issue is to use sandboxes. I use hsenv for this, but you could use the built-in sandboxes recently added to Cabal instead. Basically, you do something like this:

hsenv --name=profiling
source .hsenv_profiling/bin/activate
cabal install --only-dependencies -p --enable-executable-profiling

This will give you a hsenv sandbox with profiling versions of all dependent libraries installed.

To build our own library with profiling enabled, we add a ghc-prof-options line to the Cabal file, setting a couple of profiling options (prof-all and caf-all). Then, if we build our library with the -p option to Cabal, we’ll get a profiling version of the library built with the appropriate options as well as the “vanilla” library.
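Concretely, the relevant part of the library stanza ends up looking something like the sketch below. (A sketch only: ghc-prof-options is the standard Cabal field for this, and -auto-all and -caf-all are my assumption of the GHC 7.x profiling flags meant here, for per-binding and CAF cost centres; check them against your GHC version.)

library
  -- exposed-modules, build-depends and so on as before
  ghc-prof-options: -auto-all -caf-all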

A second minor problem is that we really want to profile our library code, not just the code in the test program that we’re going to use. The simplest way to do this is to add the test program into our Cabal project. We add an Executable section for the test program and this then gets built when we build the library. This isn’t very pretty, but it saves some messing around.

Once we have all this set up, we can build a profiling version of the library and test program (in the sandbox with the profiling versions of the dependencies installed) by doing:

cabal install -p --enable-executable-profiling

and we can then run our profiling test program as:

./dist_profiling/build/profile-256/profile-256 +RTS -p

The result of this is a file called profile-256.prof that contains information about the amount of run time and memory allocation ascribed to different “cost centres” in our code (based on the profiling options we put in the Cabal file, there’s one cost centre for each top level function or constant definition, plus some others).

Aside from some header information, the contents of the profile file come in two parts – a kind of flat “Top Ten” of cost centres ordered by time spent in them, and a hierarchical call graph breakdown of runtime and allocation for each cost centre. For our profile-256 test program, the flat part of the profiling report looks like this:

COST CENTRE        MODULE                 %time %alloc

dl.doone.mult      Numeric.FFT.Execute     26.2   23.8
dl.ds              Numeric.FFT.Execute     21.6   21.6
dl.d               Numeric.FFT.Execute     19.7   24.7
dl.doone.single    Numeric.FFT.Execute     14.0   13.9
dl.doone           Numeric.FFT.Execute      5.7    5.7
slicevecs          Numeric.FFT.Utils        2.7    3.3
dl                 Numeric.FFT.Execute      2.1    1.9
special2.\         Numeric.FFT.Special      1.5    0.8
special2           Numeric.FFT.Special      1.4    2.4
slicevecs.\        Numeric.FFT.Utils        1.2    0.9
execute.multBase   Numeric.FFT.Execute      1.1    0.6

and the first half or so of the hierarchical report like this:

                                                                    individual     inherited
COST CENTRE             MODULE                no.   entries   %time %alloc    %time %alloc

MAIN                    MAIN                  235         0     0.3    0.0    100.0  100.0
 fftWith                Numeric.FFT           471         0     0.0    0.0     99.7  100.0
  execute               Numeric.FFT.Execute   473      1000     0.0    0.0     99.7  100.0
   execute.sign         Numeric.FFT.Execute   528      1000     0.0    0.0      0.0    0.0
   execute.bsize        Numeric.FFT.Execute   503      1000     0.0    0.0      0.0    0.0
    baseSize            Numeric.FFT.Types     504      1000     0.0    0.0      0.0    0.0
   execute.fullfft      Numeric.FFT.Execute   489      1000     0.4    0.1     99.7   99.9
    execute.recomb      Numeric.FFT.Execute   490         0     0.0    0.0     99.4   99.8
     dl                 Numeric.FFT.Execute   514      7000     2.1    1.9     93.5   95.3
      dl.ws             Numeric.FFT.Execute   527      7000     0.1    0.0      0.1    0.0
      dl.ns             Numeric.FFT.Execute   522      7000     0.0    0.0      0.0    0.0
      dl.doone          Numeric.FFT.Execute   516    127000     5.7    5.7     90.8   92.6
       dl.doone.vs      Numeric.FFT.Execute   523    127000     0.8    0.2      3.6    2.9
        slicevecs       Numeric.FFT.Utils     525    127000     1.9    2.3      2.8    2.7
         slicevecs.\    Numeric.FFT.Utils     526    254000     0.8    0.4      0.8    0.4
       dl.doone.single  Numeric.FFT.Execute   517    254000    14.0   13.9     81.5   84.0
        dl.doone.mult   Numeric.FFT.Execute   518    508000    26.2   23.8     67.5   70.1
         dl.d           Numeric.FFT.Execute   520    508000    19.7   24.7     41.3   46.3
          dl.ds         Numeric.FFT.Execute   521         0    21.6   21.6     21.6   21.6
         dl.ds          Numeric.FFT.Execute   519    508000     0.0    0.0      0.0    0.0
      slicevecs         Numeric.FFT.Utils     515      7000     0.4    0.6      0.5    0.8
       slicevecs.\      Numeric.FFT.Utils     524    127000     0.1    0.2      0.1    0.2
     execute.multBase   Numeric.FFT.Execute   502      1000     1.1    0.6      5.8    4.5
      applyBase         Numeric.FFT.Execute   512    128000     0.6    0.1      4.1    3.2
       special2         Numeric.FFT.Special   513    128000     1.4    2.4      3.5    3.1
        special2.b      Numeric.FFT.Special   536    128000     0.4    0.0      0.4    0.0
        special2.a      Numeric.FFT.Special   534    128000     0.3    0.0      0.3    0.0
        special2.\      Numeric.FFT.Special   533    256000     1.5    0.8      1.5    0.8
      slicevecs         Numeric.FFT.Utils     511      1000     0.4    0.5      0.6    0.7
       slicevecs.\      Numeric.FFT.Utils     535    128000     0.3    0.2      0.3    0.2
   execute.recomb       Numeric.FFT.Execute   488      1000     0.0    0.0      0.0    0.0
   execute.rescale      Numeric.FFT.Execute   476      1000     0.0    0.0      0.0    0.0
   execute.(...)        Numeric.FFT.Execute   475      1000     0.0    0.0      0.0    0.0
   execute.n            Numeric.FFT.Execute   474      1000     0.0    0.0      0.0    0.0

It’s clear from this that the vast majority of the allocation (89.7%) and time (87.2%) is spent in the Danielson-Lanczos step function dl in Numeric.FFT.Execute. Here’s the code for this function from version pre-release-1 of the repository:

-- | Single Danielson-Lanczos step: process all duplicates and
-- concatenate into a single vector.
dl :: WMap -> Int -> (Int, Int) -> VCD -> VCD
dl wmap sign (wfac, split) h = concatMap doone $ slicevecs wfac h
  where
    -- Size of each diagonal sub-matrix.
    ns = wfac `div` split

    -- Roots of unity for the size of diagonal matrix we need.
    ws = wmap IM.! (sign * wfac)

    -- Basic diagonal entries for a given column.
    ds c = map ((ws !) . (`mod` wfac) . (c *)) $ enumFromN 0 ns

    -- Twiddled diagonal entries in row r, column c (both
    -- zero-indexed), where each row and column is a wfac x wfac
    -- matrix.
    d r c = map ((ws ! ((ns * r * c) `mod` wfac)) *) (ds c)

    -- Process one duplicate by processing all rows and concatenating
    -- the results into a single vector.
    doone v = concatMap single $ enumFromN 0 split
      where
        vs :: VVCD
        vs = slicevecs ns v

        -- Multiply a single block by its appropriate diagonal
        -- elements.
        mult :: Int -> Int -> VCD
        mult r c = zipWith (*) (d r c) (vs!c)

        -- Multiply all blocks by the corresponding diagonal
        -- elements in a single row.
        single :: Int -> VCD
        single r = foldl1' (zipWith (+)) $ map (mult r) $ enumFromN 0 split

In particular, the mult, ds, d and single local functions defined within dl take most of the time, and are responsible for most of the allocation. It’s pretty clear what’s happening here: although the vector package has rewrite rules to perform fusion to eliminate many intermediate vectors in chains of calls to vector transformation functions, we’re still ending up allocating a lot of temporary intermediate values. Starting from the top of the dl function:

  • We generate a list of slices of the input vector by calling the slicevecs utility function – this doesn’t do much allocation and doesn’t take much time because slices of vectors are handled just by recording an offset into the immutable array defining the vector to be sliced.

  • We do a concatMap of the local doone function across this list of slices – besides any allocation that doone does, this will do more allocation for the final output vector. Moreover, concatMap cannot be fused.

  • The doone function also calls concatMap (so again preventing fusion), evaluating the local function single once for each of the sub-vectors to be processed here.

  • Finally, both single and the function mult that it calls allocate new vectors based on combinations of input vector elements and powers of the appropriate ωN.

So, what can we do about this? There’s no reason why we need to perform all this allocation. It seems as though it ought to be possible to either perform the Danielson-Lanczos steps in place on a single mutable vector or (more simply) to use two mutable vectors in a “ping-pong” fashion. We’ll start with the latter approach, since it means we don’t need to worry about the exact order in which the Danielson-Lanczos steps access input vector elements. If we do things this way, we should be able to get away with just two vector allocations, one for the initial permutation of the input vector elements, and one for the other “working” vector. Once we’ve finished the composition of the Danielson-Lanczos steps, we can freeze the final result as a return value (if we use unsafeFreeze, this is an O(1) operation). Another thing we can do is to use unboxed vectors, which both reduces the amount of allocation needed and speeds up access to vector elements. Below, I’ll describe a little how mutable and unboxed vectors work, then I’ll show the changes to our FFT algorithm needed to exploit these things. The code changes I’m going to describe here are included in the pre-release-2 version of the arb-fft package on GitHub.

Mutable vectors

A “normal” Vector, as implemented by the Data.Vector module, is immutable. This means that you allocate it once, setting its values somehow, and thereafter it always has the same value. Calculations on immutable vectors are done by functions with types that are something like ... -> Vector a -> Vector a that create new vectors from old, allocating new storage each time. This sounds like it would be horribly inefficient, but the vector package uses GHC rewrite rules to implement vector fusion. This is a mechanism that allows intermediate vectors generated by chains of calls to vector transformation functions to be elided and for very efficient code to be produced.

In our case, the pattern of access to the input vectors to the execute and dl functions is not simple, and it’s hard to see how we might structure things so that creation of intermediate vectors could be avoided. Instead, we can fall back on mutable vectors. A mutable vector, as defined in Data.Vector.Mutable, is a linear collection of elements providing a much reduced API compared to Data.Vector’s immutable vectors: you can create and initialise mutable vectors, read and write individual values, and that’s about it. There are functions to convert between immutable and mutable vectors (thaw turns an immutable vector into a mutable vector, and freeze goes the other way).
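As a minimal example of this API (not taken from the FFT code itself): thaw a vector, update it element by element, and hand the result back behind a pure interface:

import Control.Monad (forM_)
import Control.Monad.ST (runST)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

-- Double every element using a mutable working copy.
doubleAll :: V.Vector Int -> V.Vector Int
doubleAll v = runST $ do
  mv <- V.thaw v                          -- mutable copy of the input
  forM_ [0 .. MV.length mv - 1] $ \i -> do
    x <- MV.read mv i
    MV.write mv i (2 * x)
  V.unsafeFreeze mv                       -- O(1); mv is never used again

Calling doubleAll (V.fromList [1,2,3]) gives fromList [2,4,6], and none of the mutation is visible from outside the function.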

Now, the bad news. As soon as we introduce mutability, our code is going to become less readable and harder to reason about. This is pretty much unavoidable, since we’re going to be explicitly controlling the order of evaluation to control the sequence of mutations we perform on our working vectors. The vector package provides a clean monadic interface for doing this, but it’s still going to be messier than pure functional code. This is something that often happens when we come to optimise Haskell code: in order to control allocation in the guts of an algorithm, we quite often need to switch over to writing mutating code that’s harder to understand [1]. However, this is often less of a problem than you might think, because you almost never sit down to write that mutating code from nothing. It’s almost invariably a good idea to write pure code to start with, code that’s easier to understand and easier to reason about. Then you profile, figure out where in the code the slowdowns are, and attack those. You can use your pure code both as a template to guide the development of the mutating code, and as a comparison for testing the mutating code.

I worked this way for all of the changes I’m going to describe in this section: start from a known working code (tested using the QuickCheck tests I described above), make some changes, get the changes to compile, and retest. Writing code with mutation means that the compiler has fewer opportunities to protect you from yourself: the (slightly silly) Haskell adage that “if it compiles, it works” doesn’t apply so much here, so it’s important to test as you go.

Since we have to sequence modifications to mutable vectors explicitly, we need a monadic interface to control this sequencing. The vector package provides two options – you can either do everything in the IO monad, or you can use an ST monad instance. We’ll take the second option, since this allows us to write functions that still have pure interfaces – the ST monad encapsulates the mutation and doesn’t allow any of that impurity to escape into the surrounding code. If you use the IO monad, of course, all bets are off and you can do whatever you like, which makes things even harder to reason about.

Given these imports and type synonyms:

import Data.Vector
import qualified Data.Vector.Mutable as MV
import Data.Complex

type VCD = Vector (Complex Double)
type MVCD s = MV.MVector s (Complex Double)
type VVCD = Vector VCD
type VVVCD = Vector VVCD

a version of the dl function using mutable vectors will have a type signature something like

dl :: WMap -> Int -> (Int, Int, VVVCD, VVVCD) -> MVCD s -> MVCD s -> ST s ()
dl wmap sign (wfac, split, dmatp, dmatm) mhin mhout = ...

Here, the first two arguments are the collected powers of ωN and the direction of the transform and the four-element tuple gives the sizing information for the Danielson-Lanczos step and the entries in the diagonal matrices in the “I+D” matrix [2]. The interesting types are those of mhin, mhout and the return type. Both mhin and mhout have type MVCD s, or MV.MVector s (Complex Double), showing them to be mutable vectors of complex values. The return type of dl is ST s (), showing that it is a monadic computation in the ST monad that doesn’t return a value. The type variable s that appears in both the types for mhin, mhout and the return type is a kind of tag that prevents internal state from leaking from instances of the ST monad. This type variable is never instantiated, but it serves to distinguish different invocations of runST (which is used to evaluate ST s a computations). This is the mechanism that allows the ST monad to cleanly encapsulate stateful computation while maintaining a pure interface.

When I made the changes needed to use mutable vectors in the FFT algorithm, I started by just trying to rewrite the dl function to use mutable vectors. This allowed me to get some of the types right, to figure out that I needed to be able to slice up mutable vectors (the slicemvecs function) and to get some sort of feeling for how the execute driver function would need to interface with dl if everything was rewritten to use mutable vectors from the top. Once a mutable vector-based version of dl was working, I moved the allocation of vectors into the execute function and set things up to bounce back and forth between just two vector buffers.

Here’s part of the new dl function to give an impression of what code using mutable vectors looks like (if you’re really interested in how this works, I’d recommend browsing through the pre-release-2 version of the code on GitHub, although what you’ll see there is a little different to the examples I’m showing here as it’s a bit further along in the optimisation process):

 1    -- Multiply a single block by its appropriate diagonal
 2    -- elements and accumulate the result.
 3    mult :: VMVCD s -> MVCD s -> Int -> Bool -> Int -> ST s ()
 4    mult vins vo r first c = do
 5      let vi = vins V.! c
 6          dvals = d r c
 7      forM_ (enumFromN 0 ns) $ \i -> do
 8        xi <- MV.read vi i
 9        xo <- if first then return 0 else MV.read vo i
10        MV.write vo i (xo + xi * dvals ! i)

This code performs a single multiplication of blocks of the current input vector by the appropriate diagonal matrix elements in the Danielson-Lanczos I+D matrix. The return type of this function is ST s (), i.e. a computation in the ST monad tagged by a type s with no return value. Take a look at lines 7-10 in this listing – these are the lines that do the actual computation. We loop over the number of entries we need to process (using the call to forM_ in line 7). Values are read from an input vector vi (line 8) and accumulated into the output vector vo (lines 9 and 10) using the read and write functions from Data.Vector.Mutable. This looks (apart from syntax) exactly like code you might write in C to perform a similar task!

We certainly wouldn’t want to write code like this all the time, but here, since we’re using it down in the depths of our algorithm and we have clean pure code that we’ve already written that we can test it against if we need to, it’s not such a problem.

Here is the most important part of the new execute function:

 1    -- Apply Danielson-Lanczos steps and base transform to digit
 2    -- reversal ordered input vector.
 3    fullfft = runST $ do
 4      mhin <- thaw $ case perm of
 5                Nothing -> h
 6                Just p  -> backpermute h p
 7      mhtmp <- MV.replicate n 0
 8      multBase mhin mhtmp
 9      mhr <- newSTRef (mhtmp, mhin)
10      V.forM_ dlinfo $ \dlstep -> do
11        (mh0, mh1) <- readSTRef mhr
12        dl wmap sign dlstep mh0 mh1
13        writeSTRef mhr (mh1, mh0)
14      mhs <- readSTRef mhr
15      unsafeFreeze $ fst mhs
16
17    -- Multiple base transform application for "bottom" of algorithm.
18    multBase :: MVCD s -> MVCD s -> ST s ()
19    multBase xmin xmout =
20      V.zipWithM_ (applyBase wmap base sign)
21                  (slicemvecs bsize xmin) (slicemvecs bsize xmout)

These illustrate a couple more features of working with mutable vectors in the ST monad. First, in the multBase function (lines 18-21), which applies the “base transform” at the “bottom” of the FFT algorithm to subvectors of the permuted input vector, we see how we can use monadic versions of common list and vector manipulation functions (here zipWithM_, a monadic, void return value version of zipWith) to compose simpler functions. The code here splits each of the xmin and xmout mutable vectors into subvectors (using the slicemvecs function in Numeric.FFT.Utils), then uses a partial application of applyBase to zip the vectors of input and output subvectors together. The result is that applyBase is applied to each of the input and output subvector pairs in turn.
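If you haven't met zipWithM_ before, a toy example (nothing to do with the FFT code) shows the idea: the monadic action is run once for each pair of elements and its results are discarded:

import qualified Data.Vector as V

main :: IO ()
main = V.zipWithM_ (\i c -> putStrLn (show i ++ ": " ++ [c]))
                   (V.fromList [1 .. 3 :: Int])
                   (V.fromList "abc")
-- prints "1: a", "2: b" and "3: c"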

The calculation of fullfft (lines 3-15) demonstrates a couple more techniques:

  • We see (line 3) how the overall ST monad computation is evaluated by calling runST: fullfft is a pure value, and the ST monad allows us to strictly bound the part of our code where order-dependent stateful computations occur.

  • We see (lines 4-6) how the thaw function is used to convert an immutable input vector h into a mutable vector mhin for further processing.

  • The final result of the computation is converted back to an immutable vector using the unsafeFreeze function (line 15). Using unsafeFreeze means that we promise that no further access will be made to the mutable version of our vector – this means that the already allocated storage for the mutable vector can be reused for its immutable counterpart, making unsafeFreeze an O(1) operation. Here, we can see immediately that no further use is made of the mutable version of the vector since the result of the call to unsafeFreeze is returned as the overall result of the monadic computation we are in.

  • The “ping pong” between two mutable vectors used to evaluate the Danielson-Lanczos steps is handled by using a mutable reference (created by a call to newSTRef at line 9). At each Danielson-Lanczos step, we read this reference (line 11), call dl (line 12), passing one of the vectors as input (penultimate argument) and one as output (last argument), then write the pair of vectors back to the mutable reference in the opposite order (line 13), achieving the desired alternation between vector buffers. After each Danielson-Lanczos step, the result so far is held in the first vector of the pair, so this is what’s extracted for the final return value in line 15. A stripped-down sketch of this ping-pong pattern appears just after this list.
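Stripped of the FFT detail, that ping-pong pattern looks something like this self-contained sketch (not the arb-fft code; the “step” here just adds one to each element in place):

import Control.Monad (forM_)
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, writeSTRef)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

pingPong :: Int -> V.Vector Double -> V.Vector Double
pingPong steps v0 = runST $ do
  a   <- V.thaw v0                      -- first working buffer (holds input)
  b   <- MV.replicate (V.length v0) 0   -- second working buffer
  ref <- newSTRef (a, b)
  forM_ [1 .. steps] $ \_ -> do
    (src, dst) <- readSTRef ref
    forM_ [0 .. MV.length src - 1] $ \i ->
      MV.read src i >>= MV.write dst i . (+ 1)   -- stand-in for a DL step
    writeSTRef ref (dst, src)           -- swap input and output roles
  (res, _) <- readSTRef ref             -- latest result is in the first slot
  V.unsafeFreeze res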

We apply similar speedups to the Rader’s algorithm code for prime-length FFTs, although I won’t show that here since it’s not all that interesting. (We also fold the index permutation involved in the first step of Rader’s algorithm into the main input vector index permutation, since there’s little point in permuting the input then permuting it again!) In fact, a more important issue for speeding up Rader’s algorithm is to get transforms of lengths that are powers of two as fast as possible (recall that we pad the input to a power of two length to make the convolution in the Rader transform efficient). We’ll address this more directly in the next section.

Unboxing

Until now, all the vectors we’ve been dealing with have been boxed. This means that there’s an extra level of indirection in each element in the vector allowing values of composite data types to be stored. However, all of our vectors are just plain Vector (Complex Double), so we might hope to be able to get away without that extra level of indirection.

In fact, apart from one small issue, getting this to work is very easy: just replace imports of Data.Vector with Data.Vector.Unboxed (and the equivalent for mutable vectors). This simplicity is the result of some very clever stuff: Data.Vector.Unboxed is one of those pieces of the Haskell ecosystem that make you sit back and say “Wow!”. There’s some very smart type system stuff going on that allows the unboxed vector implementation to select an efficient specialised representation for every different element type. For our purposes, in particular, although Complex Double is a composite type (and so not, on the face of it, a good candidate for unboxing), Data.Vector.Unboxed has an instance of its Unbox type class for Complex a that leads to efficient memory layout and access of vectors of complex values. It’s really very neat, and is one of those things where the Haskell type system (and the implementor’s clever use of it!) allows you to write code at a high level while still maintaining an efficient low-level representation internally.
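A quick demonstration of this (again, nothing FFT-specific): unboxed vectors of Complex Double are built and consumed exactly like boxed ones, with the Unbox instance taking care of the flat representation behind the scenes:

import Data.Complex
import qualified Data.Vector.Unboxed as U

main :: IO ()
main = do
  let v = U.generate 4 (\i -> fromIntegral i :+ 0) :: U.Vector (Complex Double)
  print (U.sum v)    -- 6.0 :+ 0.0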

The “small issue” mentioned above is just that we sometimes want to have Vectors of Vectors, and a Vector is not itself unboxable (because you can’t tell how much memory it occupies in advance without knowing the length of the vector). This just means that you need to use the “normal” boxed vectors in this case, by importing Data.Vector qualified, as shown here (compare with the other listing of type synonyms above, which is without unboxing):

import Data.Vector.Unboxed
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed.Mutable as MV
import Data.Complex

type VCD = Vector (Complex Double)
type MVCD s = MV.MVector s (Complex Double)
type VVCD = V.Vector VCD
type VVVCD = V.Vector VVCD
type VMVCD a = V.Vector (MVCD a)
type VI = Vector Int

More hard-coded base transforms for prime lengths

Although they’re very tedious to write, the specialised transforms for small prime input lengths make a big difference to performance. I’ve converted “codelets” for all prime lengths up to 13 from the FFTW code.

The more I think about it, the more attractive is the idea of writing something comparable to FFTW’s genfft in order to generate these “straight line” transforms programmatically. There are a number of reasons for this:

  1. Copying the FFTW codelets by hand is not a very classy thing to do. It’s also very tedious involving boring editing (Emacs keyboard macros help, but it’s still dull).

  2. If we have a programmatic way of generating straight line codelets, we can use it for other input length sizes. Most of FFTW is implemented in C, which isn’t an ideal language for the kind of algebraic metaprogramming that’s needed for this task, so the FFTW folks wrote genfft in OCaml. In Haskell, we can do metaprogramming with Template Haskell, i.e. in Haskell itself, which is much nicer and means that, if we know our input length at compile time, we could provide a Template Haskell function to generate the optimum FFT decomposition with any necessary straight line code at the bottom generated automatically, allowing us to handle awkward prime input length factors efficiently (within reason, of course – if the straight line code gets too big, it won’t fit in the machine’s cache any more and a lot of the benefit of doing this would be lost).

  3. We’ve only been specialising the primitive transforms at the bottom of the FFT decomposition. Let’s call those specialised bits of straight line code “bottomlets”. It’s also possible to write specialised “twiddlets” to do the Danielson-Lanczos I+D matrix multiplication – there are often algebraic optimisations that could be done here to make things quicker. If we had a Template Haskell genfft equivalent, we could extend it to generate these “twiddlets” as well as the “bottomlets”, giving us the capability to produce optimal code for all steps in the FFT decomposition.

This seems like it would be an interesting task, although probably quite a big piece of work. I’m going to leave it to one side for the moment, but will probably have a go at it at some point in the future.

Strictness/laziness

One topic we’ve not looked at which usually turns up in discussions of Haskell optimisation is strictness (or, if you prefer, laziness). We can tell just from thinking about the basic DFT algorithm that any FFT computation is strict in its input vector, and strict in all of the elements of that input vector. There shouldn’t be any laziness going on anywhere once we get into executing the FFT. (It matters less at the planning stage, since we’ll only do that once.)

To some extent, the behaviour of the Vector types helps us here. The data structures used to implement the various flavours of Vector are all strict in the arrays that are used to implement them, and since we’re now using unboxed vectors, the arrays inside the vectors are necessarily strict in their contents.
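A quick check (not from the FFT code) of what that element strictness means in practice: a boxed vector happily stores an unevaluated bottom, while building an unboxed vector forces every element:

import Control.Exception (SomeException, evaluate, try)
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U

main :: IO ()
main = do
  -- Boxed: the spine is built, the bottom element is never demanded.
  _ <- evaluate (V.fromList [1, undefined] :: V.Vector Double)
  putStrLn "boxed: built without forcing its elements"
  -- Unboxed: storing the raw Double forces it, so construction blows up.
  r <- try (evaluate (U.fromList [1, undefined] :: U.Vector Double))
  case (r :: Either SomeException (U.Vector Double)) of
    Left _  -> putStrLn "unboxed: element forced, exception caught"
    Right _ -> putStrLn "unboxed: unexpectedly lazy"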

I’m actually going to sidestep this issue for a little while, since we’ll be coming back to this kind of question when we do a final “kicking the tyres” examination of the whole FFT code after we’ve done the empirical optimisation described in the next section.

Profiling and benchmarking check

Let’s see where we’ve got to in terms of performance. Here’s a view of the performance of the current optimised code (which is tagged pre-release-2 in the GitHub repository) in the same format as we’ve used previously:

<plot-data format="csv" name="dat" separator=" " src="/blog/posts/2014/01/10/data-analysis-fft-11/benchmark-3.dat"> </plot-data>

<plot aspect="1.6" axis-x-end-tick-size="0" axis-x-label="Input length" axis-x-ticks="[[[[8,'8'],[16,'16'],[32,'32'],[64,'64'],[128,'128'],[256,'256'],[512,'512'],[1024,'1024']]]]" axis-x-transform="log" axis-y-label="Time" axis-y-ticks="[[[[1,'1 μs'],[10,'10 μs'],[100,'100 μs'],[1000,'1 ms']]]]" axis-y-transform="log" font-size="16" oxs="[[seqStep(8,1024,4)]]" range-x="8,1100" ui-axis-x-transform="ui-axis-x-transform" ui-axis-y-transform="ui-axis-y-transform" width="800"> <plot-options stroke="black" stroke-width="1.5"> <lines label="O(N log N)" x="[[oxs]]" y="[[x*log(x)]]"></lines> <lines x="[[oxs]]" y="[[0.01*x*log(x)]]"></lines> </plot-options> <plot-options stroke-width="2"> <lines label="FFT" stroke="green" x="[[dat.Size]]" y="[[dat.FFT]]"></lines> <lines label="FFTW" stroke="blue" x="[[dat.Size]]" y="[[dat.FFTW]]"></lines> <plot-options> </plot>

The overall performance and scaling behaviour of our code isn’t all that different to what we saw before (last article), but everything is faster! We can get some idea of how much faster from the plot stack below. Each of the three panels in this plot is a histogram of execution time ratios for all problem sizes in the range 8≤N≤1024. The tab labelled “pre-release-1” shows the execution time ratios between the pre-release-1 version of our code and FFTW – larger values represent slower execution relative to FFTW. The tab labelled “pre-release-2” shows the same ratios for the pre-release-2 version of our code – much better! – and finally, the tab labelled “Speed-up” shows execution time ratios between the pre-release-1 and pre-release-2 versions of our code, which gives a measure of the speed-up we’ve achieved with our optimisations.

<plot-data format="csv" name="ratios" src="/blog/posts/2014/01/10/data-analysis-fft-11/ratios.dat"> </plot-data>

<plot-stack aspect="1.6" width="800"> <plot axis-x-label="FFT/FFTW execution time ratio" axis-y-label="Frequency" bar-width="-1px" fill="blue" fill-opacity="0.3" font-size="16" stroke="none" title="pre-release-1"> <bars hist="[[histogram(ratios.rel2, 50)]]" x="[[hist.centres]]" y="[[hist.counts]]"></bars> </plot> <plot axis-x-label="FFT/FFTW execution time ratio" axis-y-label="Frequency" bar-width="-1px" fill="blue" fill-opacity="0.3" font-size="16" stroke="none" title="pre-release-2"> <bars hist="[[histogram(ratios.rel3, 50)]]" x="[[hist.centres]]" y="[[hist.counts]]"></bars> </plot> <plot axis-x-label="Execution time speed-up" axis-y-label="Frequency" bar-width="-1px" fill="blue" fill-opacity="0.3" font-size="16" stroke="none" title="Speed-up"> <bars hist="[[histogram(ratios.speedup, 50)]]" x="[[hist.centres]]" y="[[hist.counts]]"></bars> </plot> </plot-stack>

Concentrating on the “Speed-up” tab showing the speed-up between the unoptimised and optimised versions of our code, we can see that for most input vector lengths the optimised code is around 10-20 times faster. For some input vector lengths though, we get speed-up factors of 50-100 times compared to the unoptimised code. This is definitely good news. A lot of this improvement in performance comes from using specialised straight line code for small prime length transforms. In the next article, we’ll think about how to exploit this kind of specialised code for other input sizes (especially powers of two, which will also give us a speed-up for the convolution in Rader’s algorithm).

Looking at the “pre-release-2” tab, we appear to be getting to within a factor of 10 of the performance of FFTW for some input vector lengths, which is really not bad, considering the limited amount of optimisation that we’ve done, and my own relative lack of experience with optimising this kind of Haskell code!

  1. Note that this is advice aimed at mortals. There are some people – I’m not one of them – who are skilled enough Haskell programmers that they can rewrite this sort of code to be more efficient without needing to drop into the kind of stateful computation using mutation that we’re going to use. For the moment, let’s mutate!

  2. I moved the calculation of these out into the FFT planning stage on the basis of profiling information I collected, after I’d switched over to using mutable vectors, that indicated that a lot of time was being spent recalculating these diagonal matrix entries on each call to dl.

data-analysis haskell
Categories: Offsite Blogs

Dominic Steinitz: Getting Financial Data in R

Planet Haskell - Fri, 01/10/2014 - 2:08am

I have recently started providing consultancy to a hedge fund and as far as I can see, R looks like it has a good set of libraries for this domain. In my previous job I used an embedded domain specific language in Haskell (Frankau et al. 2009). I’d like to be able to use Haskell again but my impression is that publicly available libraries either do not exist or are not particularly mature although I would love to be proved wrong.

I’ve used knitr to produce this post rather than my usual BlogLiteratelyD.

For example, let us plot an index.

First we load the quantmod library

library(quantmod)

We can chart the S&P 500 for 2013.

GSPC <- getSymbols("^GSPC", src = "yahoo", auto.assign = FALSE) dim(GSPC) ## [1] 1768 6 head(GSPC, 4) ## GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume ## 2007-01-03 1418 1429 1408 1417 3.429e+09 ## 2007-01-04 1417 1422 1408 1418 3.004e+09 ## 2007-01-05 1418 1418 1406 1410 2.919e+09 ## 2007-01-08 1409 1415 1404 1413 2.763e+09 ## GSPC.Adjusted ## 2007-01-03 1417 ## 2007-01-04 1418 ## 2007-01-05 1410 ## 2007-01-08 1413 tail(GSPC, 4) ## GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume ## 2014-01-06 1832 1837 1824 1827 3.295e+09 ## 2014-01-07 1829 1840 1829 1838 3.512e+09 ## 2014-01-08 1838 1840 1831 1837 3.652e+09 ## 2014-01-09 1839 1843 1830 1838 3.581e+09 ## GSPC.Adjusted ## 2014-01-06 1827 ## 2014-01-07 1838 ## 2014-01-08 1837 ## 2014-01-09 1838 chartSeries(GSPC, subset = "2013", theme = "white")

S&P 500 for 2013

We can also chart currencies e.g. the Rupee / US Dollar exchange rate.

INRUSD <- getSymbols("INR=X", src = "yahoo", auto.assign = FALSE) dim(INRUSD) ## [1] 1805 6 head(INRUSD, 4) ## INR=X.Open INR=X.High INR=X.Low INR=X.Close INR=X.Volume ## 2007-01-01 44.22 44.22 44.04 44.22 0 ## 2007-01-02 44.21 44.22 44.08 44.12 0 ## 2007-01-03 44.12 44.41 44.09 44.11 0 ## 2007-01-04 44.12 44.48 44.10 44.10 0 ## INR=X.Adjusted ## 2007-01-01 44.22 ## 2007-01-02 44.12 ## 2007-01-03 44.11 ## 2007-01-04 44.10 tail(INRUSD, 4) ## INR=X.Open INR=X.High INR=X.Low INR=X.Close INR=X.Volume ## 2014-01-01 61.84 61.97 61.80 61.80 0 ## 2014-01-02 61.84 62.41 61.74 61.84 0 ## 2014-01-03 62.06 62.57 62.06 62.06 0 ## 2014-01-06 62.23 62.45 61.94 62.23 0 ## INR=X.Adjusted ## 2014-01-01 61.80 ## 2014-01-02 61.84 ## 2014-01-03 62.06 ## 2014-01-06 62.23 chartSeries(INRUSD, subset = "2013", theme = "white")

Rupee / US Dollar exchange rate for 2013

Bibliography

Frankau, Simon, Diomidis Spinellis, Nick Nassuphis, and Christoph Burgard. 2009. “Commercial Uses: Going Functional on Exotic Trades.” J. Funct. Program. 19 (1) (jan): 27–45. doi:10.1017/S0956796808007016. http://dx.doi.org/10.1017/S0956796808007016.


Categories: Offsite Blogs