Blogs

packrat parsing, and idea of `ParserT'

Submitted by jmuk on Wed, 08/16/2006 - 1:05am.

I use Packrat Parsing for the parser of HaskellNet.
see, http://darcs.haskell.org/SoC/haskellnet/Text/
and, more detailed information of Packrat Parsing are seen at http://pdos.csail.mit.edu/~baford/packrat/

Packrat Parsing is easy to use and develop.

Of course, there already exists Parsec. However, Parsec cannot be applied with ByteString. Parsec are intended for lists of tokens. With Packrat Parsing, we must define dvChar -- a function to calculate the `next' character of a stream -- by ourselves, so it is easier to use with ByteString. It's great.

Then, I want a kind of `ParserT'. For example, IMAP server response may includes `status updates' informations. Server may respond as follows:

* 22 EXPUNGE
* 23 EXISTS
* 3 RECENT
* 14 FETCH (FLAGS (\Seen \Deleted))
* CAPABILITY IMAP4rev1 STARTTLS AUTH=GSSAPI LOGINDISABLED
abcd OK CAPABILITY completed

This response is primarily for `capability' of the server, but this response also includes the `new' information of current mailbox.

Currently, I split these responses as (ServerResponse, MailboxUpdate, ResponseData). ServerResponse is OK, BAD, NO and so on. ResponseData is [String] in this case. MailboxUpdate is Recent 3 and so on. After parsed, the connection data updates its mailbox information.

If we have MonadTrans of Parser, we can update the mailbox information at the time of parsing `3 Recent'. Then, there are no need to prepare MailboxUpdate type and such confusing structure.
In other cases, `ParserT' seems to be useful.

I thought this problem a little but it seems difficult. Are there any ideas?
I'd like to think later...

What is the best abstraction for network socket?

Submitted by jmuk on Thu, 07/27/2006 - 5:32am.

There are many stream like data structure proposed now. At first, the standard library already contains Network and Network.Socket. Network module expresses socket as a Handle and Network.Socket module does it as `Socket' type, which is not compatible with Handle.

Then, Streams and (http://haskell.org/haskellwiki/Library/Streams) and SSC are proposed (http://yogimo.sakura.ne.jp/ssc/). Which is more proper library?
Streams has no consideration about networking and SSC has. But, Socket of SSC is just an instance of BlockPort (using Ptr a), which is not a good abstraction for sockets, IMHO.
Is SSC better choise?

Then, I'd like to handle sockets with ByteString for performance reason. And I will have to deal with SSL/TLS using hsgnutls or such like libraries.

Now, HaskellNet is written by Network.Stream and Network.TCP originally came from HTTP. They are good abstraction about socket, but are not the best one when considering about ByteString-ization and SSL/TLS.

Please let me know about other implementation or consideration about this topic if you know.

IMAP implemented, but ...

Submitted by jmuk on Wed, 07/26/2006 - 11:38am.

Now, I wrote IMAP4.
see, http://darcs.haskell.org/SoC/haskellnet/HaskellNet/IMAP.hs

But, there are two critical problems.

1. no test

I did not do any tests for the implementation because my IMAP environment requires SSL. The current state is only `compile succeed'.
I'd like to write codes to connect IMAP with hsgnutls, and test as soon as possible.

2. ugly codes

The code is too ugly. And it may have many unnecessary definitions. It also has no comments. I'll brush up my codes later...

IMAP4rev1 is pretty huge protocol. It is difficult to parse the server's response, especially FETCH. Implementing IMAP takes longer and longer time than I thought. I get fed up with IMAP...

When perfect all-encompassing systems get the rug pulled out from under them...

Submitted by metaperl on Sun, 07/16/2006 - 1:31pm.

So I'm here at home, pissed. I really want to crank out a website, but dont want to be limited to generating HTML. Some website generators produce PDF files. And some allow for adding validation. And some allow for tagging of document parts so that you can re-navigate the documented based on some combination of tags. And unfortunately, the perfect document system was written before the web came out: LaTeX.

So LaTeX's idea of device independant did not include the web device. And that excellent system of document references, BibTeX did not include any concept of hypertext media types.

Sigh.

And now we get to Haskell. Is it the perfect language for building websites automatically? If so, then why is this site done in Drupal instead of in Haskell?

Sluggishness

Submitted by itkovian on Thu, 07/13/2006 - 2:24am.

I've been struggling with the following issue for over three weeks. At this point, it seems like there's nothing I can improve to make certain the application runs faster or allocates memory bounded by O(n), n being the size of the file I'm processing.

The goal is to construct a tree structure from the data contained in a file. Each line in said file indicates an entry or exitpoint of a function that was called (the file contains an execution trace of a Java app). When processing an entry point, the relevant data is stored in a Method record (annotated with a pre (count) number) and the lot is pushed onto a stack. When an exit point is encoutered, the top of the stack is checked. If the functions match, the stack is popped, the Method is annotated with a post (count) number, and we store the annotated Method record in a queue. The queue then contains all the information needed to contruct a real Tree, using the pre and post numbers.

And additional complication is the fact that each file line also belongs to an execution thread, which is located in the line. We need to construct a tree for each thread. For this we use a Data.Map that maps the thread in (:: Int) to the relevant stack and queue.

Finally, we also have a mapping fro function ids to function names, also maintained in a Data.Map

I am using a State Monad with the state contained in a ParserState record. Each line with update the state via updateState, where I make a distinction between the possible kinds of line in the trace, by swithcing on the first field of each line. Fields are separated by a space. I'm using the Data.ByteString stuff to lazily read the file.

I've tried to get Haskell to evaluate each update to the State through succintly using the seq operator. Perhaps I overdid that.

The question remains where and how I need to change my code to

  • Make things faster
  • Use less memory.

A profiling run (after -O2) showing retained sets seem to indicate super-linear behaviour in retaining data.

All comments are appreciated.

I'm pasting the Main module contents here.


import Debug.Trace
import System.Environment
import System.IO
import Data.Map as M
import Data.Maybe
import Data.List (sortBy)
import Control.Monad.State

import qualified Data.ByteString.Lazy.Char8 as B


import Method
import Tree
import Algorithms

type Preorder  = Integer
type Postorder = Integer
type MethodMap = M.Map Integer B.ByteString
type ThreadMap = M.Map Integer [(Preorder, Postorder, Method)]


data ParserState = ParserState { methodStack :: !ThreadMap
                               , methodQueue :: !ThreadMap
                               , pre         :: !Integer
                               , post        :: !Integer
                               , methodMap   :: !MethodMap
                               , currentThread :: !Integer
                               } deriving (Show)

initialParserState :: ParserState
initialParserState = ParserState e e 0 0 e 0
  where e = M.empty :: Map Integer a

readInteger :: B.ByteString -> Integer
readInteger = fromIntegral . fst . fromJust . B.readInt


parseTraceMonadic :: [B.ByteString] -> ParserState
parseTraceMonadic ss = state { methodQueue = M.map reverse (methodQueue state) }
  where state = execState (mapM_ (\x -> modify (updateState x) ) ss) initialParserState
  {- I've pushed this through a >> get >>= (`seq` return()) too -} 

updateState :: B.ByteString -> ParserState -> ParserState
updateState s state = case (B.unpack $ head fields) of
  "M" -> updateStateMethod     fields state
  "E" -> updateStateException  fields state
  "C" -> updateStateEntry      fields state
  "R" -> updateStateExit       fields state
  where fields = B.splitWith (== ' ') s
  

updateStateMethod :: [B.ByteString] -> ParserState -> ParserState
updateStateMethod (_:methodId:methodName:_) state = 
  let methodMap' = M.insert (readInteger methodId) methodName (methodMap state)
  in methodMap' `seq` state { methodMap = methodMap' }

updateStateException :: [B.ByteString] -> ParserState -> ParserState
updateStateException _ state = state

updateStateEntry :: [B.ByteString] -> ParserState -> ParserState
updateStateEntry (_:ss) state = 
  let methodStack' = updateMap thread (methodStack state) (\x y -> Just (x:y)) (pre state, 0, method)
  in let pre' = ((+1) $! (pre state))
  in methodAtack' `seq` pre' `seq`
  state { methodStack = methodStack'
        , pre = pre' }
  where method = mkMethod (Prelude.map B.unpack ss)
        thread = Method.thread method

updateStateExit :: [B.ByteString] -> ParserState -> ParserState
updateStateExit (_:ss) state = 
  case updateMethod m (Prelude.map B.unpack ss) of
     Just um -> let methodStack' = updateMap thread 
                                   (methodStack state) 
                                   (\x y -> Just (tail y)) 
                                   (pre_, post state, um)
                in let methodQueue' = updateMap thread 
                                      (methodQueue state) 
                                      (\x y -> Just (x:y)) 
                                      (pre_, post state, um)
                in let post' = ((+1) $! (post state))
                in methodStack' `seq` methodQueue' `seq` post' `seq`
                state { methodStack = methodStack'
                      , methodQueue = methodQueue'
                      , post = post' }
     Nothing -> error $    "Top of the stack is mismatching! Expected " 
                        ++ (show m) ++ " yet got " ++ (show ss) 
                        ++ "\n" ++ (show state)
  where method = mkMethod (Prelude.map B.unpack ss)
        thread = Method.thread method    
        (pre_, _, m) = let x = M.lookup thread (methodStack state) 
                       in x `seq` case x of
                          Just stack -> head stack
                          Nothing    -> error $    "Method stack has not been found for thread " 
                                                ++ (show thread) ++ " -> fields: " ++ (show ss)
        

updateMap key map f value = case M.member key map of
                              True  -> M.update (f value) key map
                              False -> M.insert key [value] map


main = do
          [filename] <- System.Environment.getArgs
          file       <- System.IO.openFile filename System.IO.ReadMode
          contents   <- B.hGetContents file
          let parserState = parseTraceMonadic . B.lines $ contents
          print (methodQueue parserState)
          print (methodStack parserState)
          print (methodMap parserState)
          print (pre parserState)
          print (post parserState)

I think taking a peek at the Method module can;t hurt either, so:


data Method = Method { mid :: Integer
                     , thread :: Integer
                     , instruction_entry :: Integer
                     , instruction_exit :: Integer
                     } deriving (Eq, Show)

eM = Method 0 0 0 0

mkMethod :: [String] -> Method
mkMethod s = let [_thread, _id, _entry] = take 3 $ map (read :: String -> Integer) s 
             in [_thread, _id, _entry] `seq` Method { mid = _id
                                                    , thread = _thread
                                                    , instruction_entry = _entry
                                                    , instruction_exit = 0
                                                    }

updateMethod :: Method -> [String] -> Maybe Method
updateMethod (Method mid thread instruction_entry instruction_exit ) s
  | thread == _thread && mid == _id = _exit `seq` Just Method { mid = mid
                                                              , thread = thread
                                                              , instruction_entry = instruction_entry
                                                              , instruction_exit = _exit
                                                              }
  | otherwise = Nothing
  where [_thread, _id, _exit] = take 3 $ map (read :: String -> Integer) s

"no more complex statement in any language"

Submitted by metaperl on Fri, 07/07/2006 - 7:14am.

My instructor here again made a stimulating statement: "the most complex statement in any language is the SELECT statement: you can group, multiply, format and more"

Opinions?

rename the directory `Network' to `HaskellNet'

Submitted by jmuk on Wed, 07/05/2006 - 11:11pm.

On HaskellNet, I renamed the directory `Network' to `HaskellNet' because of conflict of the module name. HaskellNet already contains HTTP modules such as Network.HTTP, Network.Stream, and Network.TCP. However, ghc does not allow for two other packages to have a module of same name. You cannot build haskellnet if you have already installed HTTP, and vice versa.

"You don't learn programming by theory, you learn it by hacking"

Submitted by metaperl on Wed, 07/05/2006 - 2:13pm.

I'm in a training course for MS SQL Server 2005, and the teacher said: "how did I learn programming? I went to a magazine and typed in some code and watched it run. You don't learn programming by theory, you learn it by hacking."

I bit my tongue because I'm sure many Haskellers (mathematician like Cale and physicists like David Roundy) did learn Haskell by placing theoretical expectations on how the language should work from a theoretical perspective and then seeing how haskell implemented their theoretical expectations.

What is meant by "cores" in microprocessor terminology?

Submitted by metaperl on Tue, 07/04/2006 - 3:01pm.

This post is somewhat off-topic, but I think it relates to Haskell enough to post here, so please indulge me.

I was reading an article on Erlang. In the section entitled Scalable Is the New Fast, it was said:

These days, you’re much less likely to get a chip that’s twice as fast, but you may get one with twice as many cores.

So my question is: what is a core and how does it affect hardware performance?

And to bring this post into Haskell context, is there some way for Haskell programs to execute faster as the machine it runs on gains cores?

Progress Report of HaskellNet

Submitted by jmuk on Sat, 06/17/2006 - 9:10pm.

At first, the repository of HaskellNet is moved to http://darcs.haskell.org/SoC/haskellnet/. Please refer to it after this.

I imported HTTP modules into haskellnet. As I mentioned, the importing have a problem. It has no patch data of HTTP because I just copied the files into my directory and did darcs add.

Next, I modified the implementation of SMTP and POP3 to depend on the Network.Stream and Network.TCP which seem to be very useful abstractions of socket.
# However, it should become a part of Streams or SSC...? I don't know the best solution by now.

Because the MissingH is LGPL and HAppS is GPL, I cannot import these libraries into HaskellNet directly. Now I discuss with the developers of these libs, and I will distribute other version of HaskellNet such like `HaskellNet.LGPL' which includes current my implementation and other LGPL network libraries.

And then, IMAP implementation have little progress. I wrote a sort of parser for the server response.

BTW, there is a problem to build haskellnet. Because it has same modules of HTTP, ghc is confused at the system which already is installed HTTP. Therefore, I must have removed the HTTP before building haskellnet, which annoys me...
Can I change the module name to HaskellNet from Network? Or, other better solutions?