An Exersize in Over-Engineering

Submitted by TomMD on Wed, 02/20/2008 - 9:44pm.

Intro

Inspired by an old article on over engineering I decided to try and over design / bloat an event system (aka a timeout system). This is the same system I posted about earlier, but with fewer features and many more LOC.

Conception

The initial concept was clean, simple. It would handle hundreds of events, be useful for 80% of the cases (I do like the 80% rule). My event system state was this:

data EventState = EvtState { esEvents :: [Event], esAlarm :: ClockTime, esNextId :: EventId }

Each new event gets a new Id number, the 'NextId' is incremented, and the event is added to the time ordered list, 'esEvents'. 'esAlarm' is updated as necessary (atomically by the thread adding the event with an earlier expire time).

Functionally, there are 2 threads in the system and 'n' threads of event producers. One thread, eventExpire, would threadDelay until the soonest event needs to be expired. The other thread, watchAlarm, would keep an eye on esAlarm - if it was changed to an earlier time then an exception would be thrown to the eventExpire thread.

Theoretical Bugs!

Oh no! You could theoretically overflow the Int used to express EventId. I could use Integer, but that still has a theoretical limit and the memory used for ids would grow (eventually). So, I set off to redefine EventId from an Int to a (ClockTime,Int) combo. How do I know this doesn't overflow? Because my new state field which keeps track of the next event Id for each ClockTime (less the expired times), and if it does overflow for a given clock time I refuse to add the event.

Realistic Performance Issues

That's right! A close runner up for theoretical bugs is fixing performance. Why is this slow for large numbers of events? Well, adding events is O(n), expiring events can are in order in the list, and updating the alarm time is O(1).

Lets turn _everything_ into a mapping. This way most operations of interest are ~O(ln(n)) The new state will look like this:

type EventSet = (EventNumber, Map EventNumber (IO ())) data EventSystem = EvtSys { esEvents :: TVar (Map ClockTime EventSet), -- Pending Events esThread :: TVar (Maybe ThreadId), -- Thread for TimerReset exceptions esAlarm :: TVar ClockTime, -- Time of soonest event esNewAlarm :: TVar Bool, -- new/earlier alarm }

What is this in English? esEvents is a mapping of ClockTimes to a mapping of EventNumbers to IO Actions (and the next unique EventNumber is in there). Also, there is now esNewAlarm, so the alarmWatch can avoid throwing expires unnecessarily.

All in all this does OK, 3 seconds to add and execute 100,000 actions (vs a previous 38 seconds for 10,000).

Fixing Dropped Events

Now we have eliminated theoretical bugs and improved performance - whats left? Fixing real bugs of coarse! When running the test code (shamelessly stolen from control-timeout - thanks Adam!) I found some tests never finished and no more events were pending. The cause seems to be that between the event list being pruned of old events and those events being executed the thread was getting an exception (a new event was added with an earlier alarm). The solution is yet another thread and TVar (that's right, I've an irrational dislike of 'block'). Old events are atomically pruned and written to an 'expired' list - then the third thread, monitorExpiredQueue, can zero the queue and execute the actions. My new state has the one more TVar:

esExpired :: TVar [[EventSet]]

A Useful API

So now that I've finished all this, and we are two or three days after my original post, I need to make the API something useful. Current code looks like:

sys <- initEventSystem (TOD sec ps) <- getClockTime eid <- addEvent sys (TOD (sec+delay) ps) evt ... if dontNukeThem then cancelEvent sys eid else return () -- explosives concept borrowed from recent -cafe example

Adam had a better idea for most use cases in that they work based on deltas. So I wrote a shim to provide the Control.Timeout API with the Control.Event engine (Control.Event.Timeout).

eid <- addTimeout act delay if dontNukeThem then cancelTimeout eid else return ()
And an informal performance comparison is:
   Control.Event can add 100K events in 3  seconds
   Control.Timeout       100K           12 seconds
   Control.Event.Timeout 100K           58 seconds

Control.Event.Timeout calls getClockTime on every event add, so driving the Control.Event code with time deltas is really slow. Control.Timeout is obviously very good with time deltas.