I tried to write a parser for the bencode format. Bencode is normally used only in BitTorrent, but it can be used for general purposes.
When writing the parser with a parser combinator library such as Parsec, I ran into a problem. I had applied Packrat Parsing to my JSON parser, but that may have been using a sledgehammer to kill a fly. JSON is a simple format that is very easy to parse with ordinary string functions, and Packrat Parsing is an overly powerful mechanism for it.
However, my point is not the power of the parser library but memory copying. Normally, a parser library copies the parse result out of the original text. On the other hand, the ByteString functions head, tail, init, last, take, drop, and similar ones do not copy any text data; they just modify the offset and length of the text. If the parsed data contains a lot of text, the parser performs memory copies that may not be required at all.
Writing a parser combinator library that avoids this copying is a challenging task, so here I take a small approach to parsing the bencode format. See http://darcs.haskell.org/SoC/haskellnet/Text/Bencode.hs
Here, every parser is a Writer monad.
The Writer monad performs `computations which produce a stream of data in addition to the computed values' (from All About Monads). In the parser context, `a stream' corresponds to the parsed result, and the computed value corresponds to the rest of the text.
We can easily combine parsers into a parser for a `list of nodes' or a `dictionary of nodes' using a function untilM :: (a -> Bool) -> (a -> m a) -> a -> m a.
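For illustration, here is a minimal sketch of this Writer-based style; it is not the code in Bencode.hs. A parser takes the remaining input, returns the rest of the text as its value, and emits parsed nodes with tell. The BValue type, parseNode, and atEnd are hypothetical names; untilM has the signature given above. String payloads are cut out with B.take and B.drop, so they share memory with the input instead of being copied. Error handling is reduced to error calls.

```haskell
import qualified Data.ByteString.Char8 as B
import Control.Monad.Writer (Writer, runWriter, tell)
import Data.Char (isDigit)

-- A bencode value (hypothetical). String payloads are slices of the input, not copies.
data BValue
  = BInt Integer
  | BString B.ByteString
  | BList [BValue]
  | BDict [(B.ByteString, BValue)]
  deriving (Show, Eq)

-- A parser takes the input and returns the rest of the text as its value;
-- the parsed nodes are emitted as the Writer's output stream.
type Parser = B.ByteString -> Writer [BValue] B.ByteString

-- Apply f repeatedly until p holds (signature as in the text).
untilM :: Monad m => (a -> Bool) -> (a -> m a) -> a -> m a
untilM p f x
  | p x       = return x
  | otherwise = f x >>= untilM p f

-- True at the 'e' that closes a list/dictionary, or at end of input.
atEnd :: B.ByteString -> Bool
atEnd s = B.null s || B.head s == 'e'

parseNode :: Parser
parseNode s = case B.uncons s of
  Just ('i', rest) -> do
    let (num, rest') = B.break (== 'e') rest
    tell [BInt (read (B.unpack num))]
    return (B.drop 1 rest')                      -- skip the closing 'e'
  Just ('l', rest) -> do
    -- run the element parsers in a nested Writer to collect the children
    let (rest', children) = runWriter (untilM atEnd parseNode rest)
    tell [BList children]
    return (B.drop 1 rest')
  Just ('d', rest) -> do
    let (rest', children) = runWriter (untilM atEnd parseNode rest)
    tell [BDict (pairs children)]
    return (B.drop 1 rest')
  Just (c, _) | isDigit c -> case B.readInt s of
    Just (len, rest) -> do
      -- rest starts with ':'; take/drop only adjust offset and length
      tell [BString (B.take len (B.drop 1 rest))]
      return (B.drop (len + 1) rest)
    Nothing -> error "parseNode: bad string length"
  _ -> error "parseNode: malformed input"
  where
    pairs (BString k : v : kvs) = (k, v) : pairs kvs
    pairs []                    = []
    pairs _                     = error "parseNode: bad dictionary"
```

For example, running runWriter (parseNode (B.pack "l4:spami7ee")) leaves an empty rest of input and emits a single BList containing BString "spam" and BInt 7.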
Admittedly, this simple approach lacks some important features. For example, this parser does not keep line and column information, so a parse error cannot print useful debug information. Despite this shortcoming, the parser works well for our purpose. Debug information will be the next step.
BTW, debug information is not a serious problem for a bencode parser, because the format is not human-readable. In this case, all we need to know is whether the input is valid or not.
I just tried to build wxHaskell-0.9.4 on my GHC 6.6 system, but it failed.
The error message is as follows:
Bad interface file: out/wx/imports/Graphics/UI/WXCore/Types.hi
Something is amiss; requested module wx:Graphics.UI.WXCore.Types differs from name found in the interface file wxcore:Graphics.UI.WXCore.Types
I was confused by this error message at first, but eventually I understood it. GHC 6.6 removed the restriction that you cannot use two packages together if they contain a module with the same name. As a result, the package name is now embedded in interface files (.hi files) too. Therefore, when building the wx package, GHC cannot use the interface files of the wxcore package, because they belong to a different package.
To solve this problem, install the wxcore package first, and then build the wx package with the -package wxcore flag. The complete procedure is as follows.
1. Edit the makefile as follows:
1-a. -package-name $(WX) -> -package-name $(WX)-0.9.4 -package wxcore
1-b. -package-name $(WXCORE) -> -package-name $(WXCORE)-0.9.4
1-c. Remove `wxcore' from the dependencies of target `wx'
1-d. Remove all dependencies of target `wx-install-files'
2. Edit config/wxcore.pkg to remove the dependencies on lang and concurrency, because they are included in base(?)
3. make wxcore
4. sudo make wxcore-install-files wxcore-register
5. make wxcore-clean
6. make wx
7. sudo make wx-install-files wx-register
Then I compiled some of the wxHaskell sample code and confirmed that it works.
The best fix for this problem would probably be cabalization, but I have not tried it.
By the way, is wxHaskell still active? It seems that development has stopped. If it is inactive, what is a suitable GUI library for Haskell?
I tried to use gtk2hs, but it failed to build on my GHC 6.6 system. It uses the obsolete Data.FiniteMap module. I replaced it and the corresponding functions with Data.Map, but other compile errors occurred, for example `no such function: emptySet', and I gave up.
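For reference, the missing names could in principle be covered by a small compatibility shim on top of Data.Map and Data.Set, sketched below. This is an untested assumption about what gtk2hs needs: it covers only a few common functions, and the argument orders are those of the old Data.FiniteMap and Data.Set APIs as I understand them (the map comes first in addToFM, lookupFM, and delFromFM; the element comes first in elementOf), so they should be checked against the actual call sites.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Old Data.FiniteMap names mapped onto Data.Map (note: map argument first).
type FiniteMap k v = Map.Map k v

emptyFM :: FiniteMap k v
emptyFM = Map.empty

listToFM :: Ord k => [(k, v)] -> FiniteMap k v
listToFM = Map.fromList

addToFM :: Ord k => FiniteMap k v -> k -> v -> FiniteMap k v
addToFM m k v = Map.insert k v m

delFromFM :: Ord k => FiniteMap k v -> k -> FiniteMap k v
delFromFM m k = Map.delete k m

lookupFM :: Ord k => FiniteMap k v -> k -> Maybe v
lookupFM m k = Map.lookup k m

lookupWithDefaultFM :: Ord k => FiniteMap k v -> v -> k -> v
lookupWithDefaultFM m d k = Map.findWithDefault d k m

fmToList :: FiniteMap k v -> [(k, v)]
fmToList = Map.toList

-- Old Data.Set names (such as emptySet) mapped onto the current Data.Set.
emptySet :: Set.Set a
emptySet = Set.empty

mkSet :: Ord a => [a] -> Set.Set a
mkSet = Set.fromList

elementOf :: Ord a => a -> Set.Set a -> Bool
elementOf = Set.member

setToList :: Set.Set a -> [a]
setToList = Set.toList
```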
Are there any other libraries?
I wrote two new modules for HaskellNet: a JSON library and a memcached client.
Memcached is a distributed memory object caching system. See: http://www.danga.com/memcached/
I took a look at the memcached protocol and found it simple, so I wanted to implement it in Haskell. I am not sure whether a memcached client suits HaskellNet, but it should be usable. I'll decide later whether it should ship with HaskellNet or not.
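The memcached text protocol is line-oriented, which is what makes a client easy to write. The sketch below is not the API of Memcache.hs: setCommand, getCommand, and parseGetReply are hypothetical names that only build a `set' or `get' request and decode the reply to a single-key `get' (a `VALUE <key> <flags> <bytes>' header line, the data block, then `END'). Network I/O is left out.

```haskell
import qualified Data.ByteString.Char8 as B

crlf :: B.ByteString
crlf = B.pack "\r\n"

-- "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"
setCommand :: B.ByteString -> Int -> Int -> B.ByteString -> B.ByteString
setCommand key flags expiry value = B.concat
  [ B.pack "set ", key
  , B.pack (" " ++ show flags ++ " " ++ show expiry ++ " " ++ show (B.length value))
  , crlf, value, crlf ]

-- "get <key>\r\n"
getCommand :: B.ByteString -> B.ByteString
getCommand key = B.concat [B.pack "get ", key, crlf]

-- Decode the reply to a single-key get. Returns (flags, data) on a hit;
-- Nothing on a miss ("END\r\n") or on a reply this sketch does not understand.
parseGetReply :: B.ByteString -> Maybe (Int, B.ByteString)
parseGetReply reply
  | header == B.pack "END" = Nothing
  | otherwise = case B.words header of
      [v, _key, fl, n]
        | v == B.pack "VALUE"
        , Just (flags, r1) <- B.readInt fl
        , B.null r1
        , Just (len, r2) <- B.readInt n
        , B.null r2
        -> Just (flags, B.take len (B.drop 2 rest))   -- skip the CRLF after the header
      _ -> Nothing
  where
    (header, rest) = B.breakSubstring crlf reply
```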
The code can be seen at http://darcs.haskell.org/SoC/haskellnet/HaskellNet/Memcache.hs
BTW, after writing my code, I found another memcached client implementation written in Haskell.
That's the way the world goes.
Actions like getContents are hard to implement with gnutls. hsgnutls provides `tlsRecv', but it blocks when there is nothing to read. tlsCheckPending is supposed to return the length of the `pending' (readable) buffer in gnutls, but it always returns 0.
I read the source of `gnutls-cli', a telnet-like gnutls command, because it can keep reading as long as there is something to read. I found that gnutls-cli uses select(2) to find out whether the socket is ready for reading. Oops...
Next, I kept the Handle used by tlsClient in the TlsSession data structure and used hReady :: Handle -> IO Bool to check whether the TLS session is readable. This seems to work when I test it in ghci, but it fails inside other actions (like bsGetContents). Probably hReady is called too soon: right after a command is sent, the server's reply has not arrived yet, so hReady does not report any input. For example,
*TLSStream> bsPutCrLf s (BS.pack "a001 CAPABILITY") >> hReady h >>= print
*TLSStream> hReady h >>= print
Then I tried hWaitForInput :: Handle -> Int -> IO Bool, because it can wait for a given period. With a timeout of 500 milliseconds, the check now succeeds.
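The polling loop can be separated from the readiness test, which makes it easy to swap hReady for hWaitForInput. In this sketch, readWhileReady and tlsReady are hypothetical helpers; recv stands for whatever receive action is used (for a TLS session, presumably hsgnutls's tlsRecv, whose exact type is not assumed here), and the 500 ms timeout is the value mentioned above.

```haskell
import System.IO
import qualified Data.ByteString.Char8 as B

-- Keep calling recv while ready reports pending input; stop when ready
-- says no or recv returns an empty chunk. Chunks are joined in order.
readWhileReady :: IO Bool -> IO B.ByteString -> IO B.ByteString
readWhileReady ready recv = go []
  where
    go acc = do
      ok <- ready
      if not ok
        then finish acc
        else do
          chunk <- recv
          if B.null chunk then finish acc else go (chunk : acc)
    finish acc = return (B.concat (reverse acc))

-- The readiness check described above: wait up to 500 ms on the Handle
-- underlying the TLS session.
tlsReady :: Handle -> IO Bool
tlsReady h = hWaitForInput h 500
```

Note that hWaitForInput raises an end-of-file error once the stream is closed, so a real implementation would probably catch that (for example with catchIOError from System.IO.Error) and treat it as `not ready'.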
Is there a more elegant implementation of `getContents' with gnutls?
I use Packrat Parsing for the parsers in HaskellNet.
More detailed information about Packrat Parsing is available at http://pdos.csail.mit.edu/~baford/packrat/
Packrat Parsing is easy to use and to develop with.
Of course, Parsec already exists. However, Parsec cannot be used with ByteString, because it is designed for lists of tokens. With Packrat Parsing, we define dvChar (a function that computes the `next' character of a stream) ourselves, so it is easy to use with ByteString. It's great.
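Here is a tiny packrat parser in the style of Ford's paper, adapted to ByteString, to show what defining dvChar ourselves means. The grammar (sums of decimal numbers) and the names Derivs, Result, pAdditive, pNumber, and evalExpr are only for illustration; HaskellNet's real parsers are different. Each Derivs record holds the lazily memoized results for one input position, and dvChar advances with B.uncons, which just moves the offset without copying.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (isDigit, digitToInt)

data Result v = Parsed v Derivs | NoParse

-- One record per input position; each field is computed lazily, at most once.
data Derivs = Derivs
  { dvAdditive :: Result Int
  , dvNumber   :: Result Int
  , dvChar     :: Result Char
  }

-- Build the (lazy) table of results for the text starting here.
parse :: B.ByteString -> Derivs
parse s = d
  where
    d   = Derivs (pAdditive d) (pNumber d) chr
    -- dvChar: the `next' character; B.uncons slices the ByteString, no copy
    chr = case B.uncons s of
            Just (c, s') -> Parsed c (parse s')
            Nothing      -> NoParse

-- Additive <- Number '+' Additive / Number
pAdditive :: Derivs -> Result Int
pAdditive d = case dvNumber d of
  NoParse     -> NoParse
  Parsed l d1 -> case dvChar d1 of
    Parsed '+' d2 -> case dvAdditive d2 of
      Parsed r d3 -> Parsed (l + r) d3
      NoParse     -> Parsed l d1          -- fall back to the bare Number
    _ -> Parsed l d1

-- Number <- [0-9]+
pNumber :: Derivs -> Result Int
pNumber d = case dvChar d of
  Parsed c d1 | isDigit c -> go (digitToInt c) d1
  _                       -> NoParse
  where
    go acc d1 = case dvChar d1 of
      Parsed c d2 | isDigit c -> go (acc * 10 + digitToInt c) d2
      _                       -> Parsed acc d1

-- Succeeds only if the whole input is consumed.
evalExpr :: String -> Maybe Int
evalExpr str = case dvAdditive (parse (B.pack str)) of
  Parsed v rest -> case dvChar rest of
    NoParse -> Just v
    _       -> Nothing
  NoParse -> Nothing
```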
Next, I want a kind of `ParserT'. For example, an IMAP server response may include `status update' information. A server may respond as follows:
* 22 EXPUNGE
* 23 EXISTS
* 3 RECENT
* 14 FETCH (FLAGS (\Seen \Deleted))
* CAPABILITY IMAP4rev1 STARTTLS AUTH=GSSAPI LOGINDISABLED
abcd OK CAPABILITY completed
This response is primarily about the server's `capability', but it also includes new information about the current mailbox.
Currently, I split such a response into a tuple (ServerResponse, MailboxUpdate, ResponseData). ServerResponse is OK, BAD, NO, and so on. ResponseData is [String] in this case. MailboxUpdate is Recent 3 and so on. After parsing, the connection data updates its mailbox information.
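To make the current structure concrete, here is a sketch of how such a result might be represented and applied after parsing. Only the names ServerResponse, MailboxUpdate, and ResponseData come from the text above; the constructors, the Mailbox record, sampleResult, and applyUpdates are hypothetical. Per RFC 3501, each EXPUNGE response removes one message, which is why Expunge decrements the EXISTS count.

```haskell
-- Hypothetical constructors; only the three type names come from the text.
data ServerResponse = OK String | NO String | BAD String
  deriving (Show, Eq)

data MailboxUpdate = Exists Int | Recent Int | Expunge Int
  deriving (Show, Eq)

type ResponseData = [String]

data Mailbox = Mailbox { mbExists :: Int, mbRecent :: Int }
  deriving (Show, Eq)

-- The sample CAPABILITY response split into three parts (FETCH left out).
sampleResult :: (ServerResponse, [MailboxUpdate], ResponseData)
sampleResult =
  ( OK "CAPABILITY completed"
  , [Expunge 22, Exists 23, Recent 3]
  , ["IMAP4rev1", "STARTTLS", "AUTH=GSSAPI", "LOGINDISABLED"] )

-- After parsing, the connection folds the updates into its mailbox state.
applyUpdates :: Mailbox -> [MailboxUpdate] -> Mailbox
applyUpdates = foldl step
  where
    step mb (Exists n)  = mb { mbExists = n }
    step mb (Recent n)  = mb { mbRecent = n }
    -- EXPUNGE removes one message, so the count drops by one
    step mb (Expunge _) = mb { mbExists = max 0 (mbExists mb - 1) }
```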
If we had a parser monad transformer (a MonadTrans instance for the parser), we could update the mailbox information at the moment `3 RECENT' is parsed. Then there would be no need for the MailboxUpdate type and such a confusing structure.
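A minimal sketch of what such a `ParserT' could look like: a parser over ByteString parameterized by a base monad m, with a MonadTrans instance so that lift can run base-monad actions in the middle of parsing. Everything here (ParserT, line, manyP, untagged, and the Mailbox record) is hypothetical, and State stands in for whatever holds the connection's mailbox information. When `* 23 EXISTS' or `* 3 RECENT' is parsed, untagged updates the state immediately, so no intermediate MailboxUpdate value is needed.

```haskell
import qualified Data.ByteString.Char8 as B
import Control.Monad.Trans.Class (MonadTrans (..))
import Control.Monad.State (State, modify, execState)

-- A parser that runs in an arbitrary base monad m.
newtype ParserT m a = ParserT
  { runParserT :: B.ByteString -> m (Maybe (a, B.ByteString)) }

instance Monad m => Functor (ParserT m) where
  fmap f p = ParserT $ \s -> do
    r <- runParserT p s
    return (fmap (\(a, s') -> (f a, s')) r)

instance Monad m => Applicative (ParserT m) where
  pure a = ParserT $ \s -> return (Just (a, s))
  pf <*> pa = pf >>= \f -> fmap f pa

instance Monad m => Monad (ParserT m) where
  p >>= k = ParserT $ \s -> do
    r <- runParserT p s
    case r of
      Nothing      -> return Nothing
      Just (a, s') -> runParserT (k a) s'

-- lift runs a base-monad action without consuming input.
instance MonadTrans ParserT where
  lift m = ParserT $ \s -> m >>= \a -> return (Just (a, s))

-- One CRLF-terminated line; fails at end of input.
line :: Monad m => ParserT m B.ByteString
line = ParserT $ \s ->
  if B.null s
    then return Nothing
    else let (l, rest) = B.breakSubstring (B.pack "\r\n") s
         in return (Just (l, B.drop 2 rest))

-- Run p repeatedly until it fails.
manyP :: Monad m => ParserT m a -> ParserT m [a]
manyP p = ParserT $ \s -> do
  r <- runParserT p s
  case r of
    Nothing      -> return (Just ([], s))
    Just (a, s') -> do
      more <- runParserT (manyP p) s'
      return $ case more of
        Just (xs, s'') -> Just (a : xs, s'')
        Nothing        -> Just ([a], s')

data Mailbox = Mailbox { mbExists :: Int, mbRecent :: Int }
  deriving (Show, Eq)

-- Parse one untagged line, updating the mailbox as soon as it is recognized.
untagged :: ParserT (State Mailbox) B.ByteString
untagged = do
  l <- line
  case B.words l of
    [star, n, kw]
      | star == B.pack "*", Just (k, r) <- B.readInt n, B.null r ->
          case B.unpack kw of
            "EXISTS" -> lift (modify (\mb -> mb { mbExists = k }))
            "RECENT" -> lift (modify (\mb -> mb { mbRecent = k }))
            _        -> return ()
    _ -> return ()
  return l
```

For the sample response above, running manyP untagged over the lines `* 22 EXPUNGE', `* 23 EXISTS', and `* 3 RECENT' with execState, starting from Mailbox 0 0, yields Mailbox 23 3.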
`ParserT' seems useful in other cases, too.
I thought about this problem a little, but it seems difficult. Are there any ideas?
I'd like to think later...
Many stream-like data structures have been proposed. First, the standard library already contains Network and Network.Socket. The Network module represents a socket as a Handle, while the Network.Socket module represents it as a `Socket' type, which is not compatible with Handle.
Then, Streams (http://haskell.org/haskellwiki/Library/Streams) and SSC (http://yogimo.sakura.ne.jp/ssc/) have been proposed. Which library is more appropriate?
Streams does not take networking into account, whereas SSC does. However, SSC's Socket is just an instance of BlockPort (using Ptr a), which is not a good abstraction for sockets, IMHO.
Is SSC the better choice?
Also, I'd like to handle sockets with ByteString for performance reasons. And I will have to deal with SSL/TLS using hsgnutls or similar libraries.
Currently, HaskellNet is built on Network.Stream and Network.TCP, which originally came from the HTTP package. They are good socket abstractions, but not the best ones when it comes to ByteString-ization and SSL/TLS.
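One possible shape for a ByteString-based stream abstraction, sketched under assumptions: BSStream, handleToStream, and sendCommand are hypothetical names. Protocol code (SMTP, POP3, IMAP) would see only the record; a plain socket Handle would be wrapped by handleToStream, and a TLS session would provide its own BSStream built from hsgnutls's send and receive functions (not shown, since their exact types are not assumed here).

```haskell
import System.IO
import qualified Data.ByteString.Char8 as B

-- Hypothetical record of stream operations on ByteString.
data BSStream = BSStream
  { bsGetLine :: IO B.ByteString        -- one line without '\n' (a '\r' is kept)
  , bsPut     :: B.ByteString -> IO ()
  , bsClose   :: IO ()
  }

-- Wrap an ordinary Handle (e.g. a socket Handle from the Network module).
handleToStream :: Handle -> BSStream
handleToStream h = BSStream
  { bsGetLine = B.hGetLine h
  , bsPut     = \s -> B.hPut h s >> hFlush h
  , bsClose   = hClose h
  }

-- Protocol code uses only the record, so it works over either transport.
sendCommand :: BSStream -> B.ByteString -> IO ()
sendCommand st cmd = bsPut st (B.append cmd (B.pack "\r\n"))
```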
If you know of other implementations or ideas on this topic, please let me know.
I have now written the IMAP4 implementation.
However, there are two critical problems.
1. No tests
I have not tested the implementation at all, because my IMAP environment requires SSL. So far, I only know that it compiles.
I'd like to write code to connect to IMAP via hsgnutls and test it as soon as possible.
2. Ugly code
The code is very ugly. It may contain many unnecessary definitions, and it has no comments. I'll clean it up later...
IMAP4rev1 is a pretty huge protocol. The server's responses are difficult to parse, especially FETCH. Implementing IMAP is taking much longer than I expected. I'm getting fed up with IMAP...
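FETCH responses nest parenthesized lists, as in `* 14 FETCH (FLAGS (\Seen \Deleted))' above. As a rough sketch of the core of such a parser, the code below splits a response into atoms and nested lists. It is deliberately simplified: PList, parseItems, and parsePList are hypothetical names, and quoted strings and {n} literals, which a real IMAP4rev1 parser must handle, are not supported.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (isSpace)

-- Atoms and nested parenthesized lists only (no quoted strings or literals).
data PList = Atom B.ByteString | List [PList]
  deriving (Show, Eq)

-- Parse items until an unmatched ')' or the end of input;
-- return the items and the remaining text.
parseItems :: B.ByteString -> ([PList], B.ByteString)
parseItems s0 = case B.uncons s of
  Nothing          -> ([], s)
  Just (')', _)    -> ([], s)
  Just ('(', rest) ->
    let (inner, rest1) = parseItems rest
        (more, rest2)  = parseItems (B.drop 1 rest1)   -- skip the ')'
    in (List inner : more, rest2)
  Just _ ->
    let (tok, rest1)  = B.break (\c -> isSpace c || c == '(' || c == ')') s
        (more, rest2) = parseItems rest1
    in (Atom tok : more, rest2)
  where
    s = B.dropWhile isSpace s0

parsePList :: B.ByteString -> [PList]
parsePList = fst . parseItems
```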
In HaskellNet, I renamed the directory `Network' to `HaskellNet' because of a module name conflict. HaskellNet already contains HTTP modules such as Network.HTTP, Network.Stream, and Network.TCP. However, GHC does not allow two different packages to contain a module with the same name, so you cannot build haskellnet if you have already installed HTTP, and vice versa.
First, the HaskellNet repository has moved to http://darcs.haskell.org/SoC/haskellnet/. Please use it from now on.
I imported the HTTP modules into haskellnet. As I mentioned, the import has a problem: it has none of HTTP's patch history, because I just copied the files into my directory and ran darcs add.
Next, I modified the SMTP and POP3 implementations to depend on Network.Stream and Network.TCP, which seem to be very useful socket abstractions.
# However, should they become part of Streams or SSC...? I don't know the best solution yet.
Because MissingH is LGPL and HAppS is GPL, I cannot import these libraries into HaskellNet directly. I am now discussing this with their developers, and I plan to distribute another version of HaskellNet, such as `HaskellNet.LGPL', which includes my current implementation and other LGPL network libraries.
The IMAP implementation has made a little progress: I wrote a rough parser for the server responses.
BTW, there is a problem building haskellnet. Because it contains the same modules as HTTP, GHC gets confused on a system where HTTP is already installed. Therefore, I had to remove HTTP before building haskellnet, which is annoying...
Can I rename the modules from Network to HaskellNet? Or is there a better solution?
... in a limited situation.
I want to import the HTTP library into my haskellnet repository. However, darcs freezes during the import.
Probably this happens because both repositories have a directory with the same name, Network. HTTP contains a patch that creates the Network directory (adddir Network), but this directory already exists in the haskellnet repository, so darcs gets confused.
Of course, I could easily import the HTTP files directly (not through darcs pull). But in that case, there would be two problems:
- I would lose the entire history of the work on HTTP
- If patches are sent to either library, it is difficult and annoying to apply them to the other library
I have one idea for dealing with this problem: modifying the patch manually. But does darcs allow such modifications?