Libraries

An Exersize in Over-Engineering

Submitted by TomMD on Wed, 02/20/2008 - 9:44pm.

Intro

Inspired by an old article on over engineering I decided to try and over design / bloat an event system (aka a timeout system). This is the same system I posted about earlier, but with fewer features and many more LOC.

Conception

The initial concept was clean, simple. It would handle hundreds of events, be useful for 80% of the cases (I do like the 80% rule). My event system state was this:

data EventState = EvtState { esEvents :: [Event], esAlarm :: ClockTime, esNextId :: EventId }

Each new event gets a new Id number, the 'NextId' is incremented, and the event is added to the time ordered list, 'esEvents'. 'esAlarm' is updated as necessary (atomically by the thread adding the event with an earlier expire time).

Functionally, there are 2 threads in the system and 'n' threads of event producers. One thread, eventExpire, would threadDelay until the soonest event needs to be expired. The other thread, watchAlarm, would keep an eye on esAlarm - if it was changed to an earlier time then an exception would be thrown to the eventExpire thread.

Theoretical Bugs!

Oh no! You could theoretically overflow the Int used to express EventId. I could use Integer, but that still has a theoretical limit and the memory used for ids would grow (eventually). So, I set off to redefine EventId from an Int to a (ClockTime,Int) combo. How do I know this doesn't overflow? Because my new state field which keeps track of the next event Id for each ClockTime (less the expired times), and if it does overflow for a given clock time I refuse to add the event.

Realistic Performance Issues

That's right! A close runner up for theoretical bugs is fixing performance. Why is this slow for large numbers of events? Well, adding events is O(n), expiring events can are in order in the list, and updating the alarm time is O(1).

Lets turn _everything_ into a mapping. This way most operations of interest are ~O(ln(n)) The new state will look like this:

type EventSet = (EventNumber, Map EventNumber (IO ())) data EventSystem = EvtSys { esEvents :: TVar (Map ClockTime EventSet), -- Pending Events esThread :: TVar (Maybe ThreadId), -- Thread for TimerReset exceptions esAlarm :: TVar ClockTime, -- Time of soonest event esNewAlarm :: TVar Bool, -- new/earlier alarm }

What is this in English? esEvents is a mapping of ClockTimes to a mapping of EventNumbers to IO Actions (and the next unique EventNumber is in there). Also, there is now esNewAlarm, so the alarmWatch can avoid throwing expires unnecessarily.

All in all this does OK, 3 seconds to add and execute 100,000 actions (vs a previous 38 seconds for 10,000).

Fixing Dropped Events

Now we have eliminated theoretical bugs and improved performance - whats left? Fixing real bugs of coarse! When running the test code (shamelessly stolen from control-timeout - thanks Adam!) I found some tests never finished and no more events were pending. The cause seems to be that between the event list being pruned of old events and those events being executed the thread was getting an exception (a new event was added with an earlier alarm). The solution is yet another thread and TVar (that's right, I've an irrational dislike of 'block'). Old events are atomically pruned and written to an 'expired' list - then the third thread, monitorExpiredQueue, can zero the queue and execute the actions. My new state has the one more TVar:

esExpired :: TVar [[EventSet]]

A Useful API

So now that I've finished all this, and we are two or three days after my original post, I need to make the API something useful. Current code looks like:

sys <- initEventSystem (TOD sec ps) <- getClockTime eid <- addEvent sys (TOD (sec+delay) ps) evt ... if dontNukeThem then cancelEvent sys eid else return () -- explosives concept borrowed from recent -cafe example

Adam had a better idea for most use cases in that they work based on deltas. So I wrote a shim to provide the Control.Timeout API with the Control.Event engine (Control.Event.Timeout).

eid <- addTimeout act delay if dontNukeThem then cancelTimeout eid else return ()
And an informal performance comparison is:
   Control.Event can add 100K events in 3  seconds
   Control.Timeout       100K           12 seconds
   Control.Event.Timeout 100K           58 seconds

Control.Event.Timeout calls getClockTime on every event add, so driving the Control.Event code with time deltas is really slow. Control.Timeout is obviously very good with time deltas.

Control.Event Released

Submitted by TomMD on Sat, 02/16/2008 - 11:22am.

I've released control-event (on hackage.haskell.org), which provides a system to schedule and cancel IO events for given System.Time.ClockTimes. While designed prior to the release of Control.Timeout, it is very similar, the main differences are:

  1. Control.Event requires initilization while Control.Timeout hides this via unsafePerformIO.
  2. Control.Event can add events in the STM monad using addEventSTM. While Control.Timeout has an addEventAtomic, they aren't drop in replacements for each other (addEventAtomic is an IO action that returns an STM operation).
  3. Control.Timeout externally interfaces by time deltas (Float) while Control.Event is based on absolute ClockTimes.
  4. Control.Timeout uses only one Haskell thread while Control.Event needs two.
  5. Control.Event will not duplicate EventIds once the counter overflows. The possibility of this happening in Control.Timeout is entirely theoretical - I'm not claiming there is a practical difference.

Future Changes:

  • I'll either expand or eliminate the extra features of Event Preprocessing.
  • If no one uses it, I'll probably be getting rid of the ability to run expired actions in the event system thread.
  • I'm guessing Data.Time works on Windows. So long as I don't learn otherwise, I'll move from the old-time package to the 'time' package.
  • I'll add a shim to allow adding events using time deltas.

EDIT: I should add that Control.Timeout performs MUCH better.

Let me know if you find this useful!

Improving Haskell MD5

Submitted by TomMD on Fri, 11/09/2007 - 7:42pm.

I cleaned up the code!

  • Fixed the ugly 'blocks' function. This results in better performance and fixed the space leak!
  • Changed the license from GPL2 to BSD3.
  • Eliminating the 'finalize' doesn't give any further performance, so I think this was a really odd bug in my code, not GHC.

    Again, the code is at:
    darcs get http://code.haskell.org/~tommd/pureMD5

    And here is the quick and dirty performance of the unrolled version:

    $ time ./hsMD5 ../2GB
    b8f1f1faa6dda5426abffb3a7811c1fb

    real 1m22.685s
    user 0m31.730s
    sys 0m4.676s

    $ time md5sum ../2GB
    b8f1f1faa6dda5426abffb3a7811c1fb ../2GB

    real 1m21.343s
    user 0m25.330s
    sys 0m4.528s

    So, at least on my clunker, the Haskell code shows good time on the 2GB test.


    $ time ./hsMD5 ../200MB
    eda9a9889837ac4bc81d6387d92c1bec

    real 0m2.267s
    user 0m2.148s
    sys 0m0.116s

    $ time md5sum ../200MB
    eda9a9889837ac4bc81d6387d92c1bec ../200MB

    real 0m0.775s
    user 0m0.704s
    sys 0m0.068s

    And its just under 3 time slower than C on that one.

  • to build wxHaskell, or what is the best GUI library on Haskell?

    Submitted by jmuk on Fri, 10/13/2006 - 12:27am.

    I just tried to build wxHaskell-0.9.4 on my GHC-6.6 system, but I failed.
    The error message is follows:
    ---
    wx/src/Graphics/UI/WX/Types.hs:94:0:
    Bad interface file: out/wx/imports/Graphics/UI/WXCore/Types.hi
    Something is amiss; requested module wx:Graphics.UI.WXCore.Types differs from name found in the interface file wxcore:Graphics.UI.WXCore.Types
    ---

    I am confused by this error message at first, but finally I understand. In GHC 6.6, a restriction that you cannot use two packages together if they contain a module with the same name has been removed. As the result, the package name is embedded into interface files (.hi files) too. Therefore, the interface files of wxcore package cannot be referred because they are different packages.

    To solve this problem, you can install wxcore package at first, and then, build wx package with -package wxcore flag. The complete way is the followings.

    1. edit makefile as follows:
    1-a. -package-name $(WX) -> -package-name $(WX)-0.9.4 -package wxcore
    1-b. -package-name $(WXCORE) -> -package-name $(WXCORE)-0.9.4
    1-c. remove `wxcore' from the dependency of target `wx'
    1-d. remove all dependencies of target `wx-install-files'
    2. edit config/wxcore.pkg to eliminate dependencies of lang and concurrency because they are included in base(?)
    3. make wxcore
    4. sudo make wxcore-install-files wxcore-register
    5. make wxcore-clean
    6. make wx
    7. sudo make wx-install-files wx-register

    And then, I compiled some sample codes in wxHaskell, and confirmed they works.
    The best way to fix this problem will be cabalization, but I did not try it.

    By the way, is wxHaskell active? I see that they stops any actions. If it is inactive, what is the suitable GUI library for Haskell??

    I tried to use gtk2hs, but I failed on my GHC 6.6 system. It uses obsolete Data.FiniteMap. I replaced it and corresponding functions as Data.Map, but other compile errors, for example `no such function: emptySet', occurs, and I gave up.

    Are there any other libraries?

    memcached client

    Submitted by jmuk on Thu, 10/12/2006 - 9:07am.

    I wrote two new modules on HaskellNet. One is JSON library and the other is memcached client.

    Memcached is a distributed memory object caching system. see: http://www.danga.com/memcached/

    I take a look at the memcached protocol, and think it simple. So, I'd like to write them in Haskell. I have no confidence whether or not memcached client is suit to HaskellNet, but it will be something usable. I'll think later whether it should come with HaskellNet or not.

    The code can be seen at http://darcs.haskell.org/SoC/haskellnet/HaskellNet/Memcache.hs

    BTW, after writing my code, I found another implementation of memcached client written in Haskell.
    That's the way the world goes.

    getContents with gnutls...

    Submitted by jmuk on Sun, 08/20/2006 - 8:04pm.

    GetContents like actions are hard to implement with gnutls. hsgnutls prepares `tlsRecv', but it blocks when there are nothing to be read. tlsCheckPending is said to check the lenth of `pending' (readable) buffer in gnutls, but it always returns 0.

    I read `gnutls-cli', a telnet like gnutls command because it can read until there are something to be read. Then, I found that gnutls-cli uses select(2) to know whether reading is ready or not. Oops...

    Next, I keep the Handle for tlsClient in TlsSession data structure, and use hReady :: Handle -> IO Bool to check if the tls session is readable. It seems to succeed when I test in ghci, but fails when using in other actions (like bsGetContents). The reason will be that the state of hReady cannot change so rapidly. For example,
    *TLSStream> bsPutCrLf s (BS.pack "a001 CAPABILITY") >> hReady h >>= print
    False
    *TLSStream> hReady h >>= print
    True

    Then, I try to use hWaitForInput :: Handle -> Int -> IO Bool because it can wait some period. Now, it succeed to check with the waiting time of 500 miliseconds.

    Are there any other (elegant) implementation of `GetContent' with gnutls?

    rename the directory `Network' to `HaskellNet'

    Submitted by jmuk on Wed, 07/05/2006 - 11:11pm.

    On HaskellNet, I renamed the directory `Network' to `HaskellNet' because of conflict of the module name. HaskellNet already contains HTTP modules such as Network.HTTP, Network.Stream, and Network.TCP. However, ghc does not allow for two other packages to have a module of same name. You cannot build haskellnet if you have already installed HTTP, and vice versa.

    Progress Report of HaskellNet

    Submitted by jmuk on Sat, 06/17/2006 - 9:10pm.

    At first, the repository of HaskellNet is moved to http://darcs.haskell.org/SoC/haskellnet/. Please refer to it after this.

    I imported HTTP modules into haskellnet. As I mentioned, the importing have a problem. It has no patch data of HTTP because I just copied the files into my directory and did darcs add.

    Next, I modified the implementation of SMTP and POP3 to depend on the Network.Stream and Network.TCP which seem to be very useful abstractions of socket.
    # However, it should become a part of Streams or SSC...? I don't know the best solution by now.

    Because the MissingH is LGPL and HAppS is GPL, I cannot import these libraries into HaskellNet directly. Now I discuss with the developers of these libs, and I will distribute other version of HaskellNet such like `HaskellNet.LGPL' which includes current my implementation and other LGPL network libraries.

    And then, IMAP implementation have little progress. I wrote a sort of parser for the server response.

    BTW, there is a problem to build haskellnet. Because it has same modules of HTTP, ghc is confused at the system which already is installed HTTP. Therefore, I must have removed the HTTP before building haskellnet, which annoys me...
    Can I change the module name to HaskellNet from Network? Or, other better solutions?

    progress report of HaskellNet

    Submitted by jmuk on Sun, 06/04/2006 - 11:40am.

    At first, my plan is as follows.

    First, I'll collect up some libraries which relates with network. For example, HTTP, MissingH (FTP), HAppS (DNS, HTTP, SMTP), and WASH-MAIL (MIME) are the candidates. Second, I'll implement other lacking protocols such as IMAP. POP3 and SMTP is aready implemented by me (see hazakura project).

    And third, I'll fix up the libraries -- API level consistency is important for the programmer. The similar naming and similar API is required.

    Finally, I'll set up a kind of test suit. Because the validation for the network behavior is hard to describe, the test will be a text level (local) check of query string, or parser for the peer response.

    By now, the status is `collect up'. There are some problems to importing other libraries -- but, the most difficult problem is not coding. It is license problem (sigh).

    I also read RFC 3501 and start to implement Network.IMAP.

    New MissingH and MissingPy

    Submitted by jgoerzen on Wed, 04/06/2005 - 7:47am.

    Two new releases today:

    First, MissingH, from the announcement, the major new features are compatability with the latest GHC and Hugs, CSV parser, parser for some Debian files, and the new wholeMap function.

    A new version of MissingPy is also out, with only minor updates to work with the latest Cabal code.