Submitted by itkovian on Thu, 07/13/2006 - 2:24am.

I've been struggling with the following issue for over three weeks. At this point, it seems like there's nothing I can improve to make certain the application runs faster or allocates memory bounded by O(n), n being the size of the file I'm processing.

The goal is to construct a tree structure from the data contained in a file. Each line in said file indicates an entry or exitpoint of a function that was called (the file contains an execution trace of a Java app). When processing an entry point, the relevant data is stored in a Method record (annotated with a pre (count) number) and the lot is pushed onto a stack. When an exit point is encoutered, the top of the stack is checked. If the functions match, the stack is popped, the Method is annotated with a post (count) number, and we store the annotated Method record in a queue. The queue then contains all the information needed to contruct a real Tree, using the pre and post numbers.

And additional complication is the fact that each file line also belongs to an execution thread, which is located in the line. We need to construct a tree for each thread. For this we use a Data.Map that maps the thread in (:: Int) to the relevant stack and queue.

Finally, we also have a mapping fro function ids to function names, also maintained in a Data.Map

I am using a State Monad with the state contained in a ParserState record. Each line with update the state via updateState, where I make a distinction between the possible kinds of line in the trace, by swithcing on the first field of each line. Fields are separated by a space. I'm using the Data.ByteString stuff to lazily read the file.

I've tried to get Haskell to evaluate each update to the State through succintly using the seq operator. Perhaps I overdid that.

The question remains where and how I need to change my code to

  • Make things faster
  • Use less memory.

A profiling run (after -O2) showing retained sets seem to indicate super-linear behaviour in retaining data.

All comments are appreciated.

I'm pasting the Main module contents here.

import Debug.Trace
import System.Environment
import System.IO
import Data.Map as M
import Data.Maybe
import Data.List (sortBy)
import Control.Monad.State

import qualified Data.ByteString.Lazy.Char8 as B

import Method
import Tree
import Algorithms

type Preorder  = Integer
type Postorder = Integer
type MethodMap = M.Map Integer B.ByteString
type ThreadMap = M.Map Integer [(Preorder, Postorder, Method)]

data ParserState = ParserState { methodStack :: !ThreadMap
                               , methodQueue :: !ThreadMap
                               , pre         :: !Integer
                               , post        :: !Integer
                               , methodMap   :: !MethodMap
                               , currentThread :: !Integer
                               } deriving (Show)

initialParserState :: ParserState
initialParserState = ParserState e e 0 0 e 0
  where e = M.empty :: Map Integer a

readInteger :: B.ByteString -> Integer
readInteger = fromIntegral . fst . fromJust . B.readInt

parseTraceMonadic :: [B.ByteString] -> ParserState
parseTraceMonadic ss = state { methodQueue = reverse (methodQueue state) }
  where state = execState (mapM_ (\x -> modify (updateState x) ) ss) initialParserState
  {- I've pushed this through a >> get >>= (`seq` return()) too -} 

updateState :: B.ByteString -> ParserState -> ParserState
updateState s state = case (B.unpack $ head fields) of
  "M" -> updateStateMethod     fields state
  "E" -> updateStateException  fields state
  "C" -> updateStateEntry      fields state
  "R" -> updateStateExit       fields state
  where fields = B.splitWith (== ' ') s

updateStateMethod :: [B.ByteString] -> ParserState -> ParserState
updateStateMethod (_:methodId:methodName:_) state = 
  let methodMap' = M.insert (readInteger methodId) methodName (methodMap state)
  in methodMap' `seq` state { methodMap = methodMap' }

updateStateException :: [B.ByteString] -> ParserState -> ParserState
updateStateException _ state = state

updateStateEntry :: [B.ByteString] -> ParserState -> ParserState
updateStateEntry (_:ss) state = 
  let methodStack' = updateMap thread (methodStack state) (\x y -> Just (x:y)) (pre state, 0, method)
  in let pre' = ((+1) $! (pre state))
  in methodAtack' `seq` pre' `seq`
  state { methodStack = methodStack'
        , pre = pre' }
  where method = mkMethod ( B.unpack ss)
        thread = Method.thread method

updateStateExit :: [B.ByteString] -> ParserState -> ParserState
updateStateExit (_:ss) state = 
  case updateMethod m ( B.unpack ss) of
     Just um -> let methodStack' = updateMap thread 
                                   (methodStack state) 
                                   (\x y -> Just (tail y)) 
                                   (pre_, post state, um)
                in let methodQueue' = updateMap thread 
                                      (methodQueue state) 
                                      (\x y -> Just (x:y)) 
                                      (pre_, post state, um)
                in let post' = ((+1) $! (post state))
                in methodStack' `seq` methodQueue' `seq` post' `seq`
                state { methodStack = methodStack'
                      , methodQueue = methodQueue'
                      , post = post' }
     Nothing -> error $    "Top of the stack is mismatching! Expected " 
                        ++ (show m) ++ " yet got " ++ (show ss) 
                        ++ "\n" ++ (show state)
  where method = mkMethod ( B.unpack ss)
        thread = Method.thread method    
        (pre_, _, m) = let x = M.lookup thread (methodStack state) 
                       in x `seq` case x of
                          Just stack -> head stack
                          Nothing    -> error $    "Method stack has not been found for thread " 
                                                ++ (show thread) ++ " -> fields: " ++ (show ss)

updateMap key map f value = case M.member key map of
                              True  -> M.update (f value) key map
                              False -> M.insert key [value] map

main = do
          [filename] <- System.Environment.getArgs
          file       <- System.IO.openFile filename System.IO.ReadMode
          contents   <- B.hGetContents file
          let parserState = parseTraceMonadic . B.lines $ contents
          print (methodQueue parserState)
          print (methodStack parserState)
          print (methodMap parserState)
          print (pre parserState)
          print (post parserState)

I think taking a peek at the Method module can;t hurt either, so:

data Method = Method { mid :: Integer
                     , thread :: Integer
                     , instruction_entry :: Integer
                     , instruction_exit :: Integer
                     } deriving (Eq, Show)

eM = Method 0 0 0 0

mkMethod :: [String] -> Method
mkMethod s = let [_thread, _id, _entry] = take 3 $ map (read :: String -> Integer) s 
             in [_thread, _id, _entry] `seq` Method { mid = _id
                                                    , thread = _thread
                                                    , instruction_entry = _entry
                                                    , instruction_exit = 0

updateMethod :: Method -> [String] -> Maybe Method
updateMethod (Method mid thread instruction_entry instruction_exit ) s
  | thread == _thread && mid == _id = _exit `seq` Just Method { mid = mid
                                                              , thread = thread
                                                              , instruction_entry = instruction_entry
                                                              , instruction_exit = _exit
  | otherwise = Nothing
  where [_thread, _id, _exit] = take 3 $ map (read :: String -> Integer) s

Fun with sections

Submitted by boggle on Mon, 03/21/2005 - 6:14pm.

Still being quite a haskell newbie, I just got a lesson about sections and argument order. I wrote something like the following code in a small program for calculation the maximume queueing time of packets on a CAN bus system. Try out for a mild smile:

some_list = [0..5]

pifilter p i = filter (p (some_list !! i))

main = do
{ putStrLn $ show $ pifilter (<) i some_list;
putStrLn $ show $ filter (< si) some_list
} where i = 2; si = some_list !! i

While at first glance the two lines look like they should do the same, they include two inverse filter predicates (si <) and (< si).

BTW why does indentation get messed up inside <code></code> blocks?

Evolution of a Haskell Programmer

Submitted by shapr on Sat, 02/26/2005 - 8:58am.

K. Fritz Ruehr's Evolution of a Haskell Programmer is a humorous look at the mental development of the Haskell programmer.

LicensedPreludeExts - Prelude Extensions from the community.

Submitted by shapr on Thu, 02/24/2005 - 7:28am.

The LicensedPreludeExts on the Haskell Wiki started with Koen Claessan's permutations function being requested too many times on the #haskell irc channel. I wondered what other bits of code people were missing from the Prelude and asked for contributions.

All About Monads, A comprehensive guide to the theory and practice of monadic programming in Haskell.

Submitted by shapr on Wed, 02/23/2005 - 12:44pm.

Jeff Newbern's All About Monads is the best monad tutorial I've seen yet!

This tutorial starts with the most basic definition of a monad, and why you might want one. It covers most of the monad instances in the standard libraries, and also includes monad transformers. It wraps up nicely with links to Parsec, category theory, and arrows. You can read it online, or download as a zip file or tarball.

If you've been looking for a good monads tutorial, try this one first!

"Algorithms: A Functional Programming Approach" by Fethi Rabhi and Guy Lapalme

Submitted by shapr on Tue, 02/22/2005 - 11:13am.

Algorithms: A Functional Programming Approach is one of my top ten favorite computer science books. First, it covers the basics of Haskell and complexity theory. Then for each algorithm it gives first an easy to read implementation, and then a more efficient but harder to read implementation. Each of the transformations from clear to fast versions are discussed, and optimizations are explained. This book was also my first introduction to methodical step-by-step algorithmic optimization systems, in this case the Burstall & Darlington system. I've since used the lessons I learned in this book in my commercial work in Python, SQL, Java, and of course Haskell.

The best audience for this book is those who are looking for a second Haskell book, or new to algorithms, or would like to learn how to optimize pure (non-monadic) Haskell code systematically. The sections on top-down design techniques and dynamic programming would be of interest to programmers who are still learning and wish to know more about structuring larger programs.

Even with all that content, this softcover book is only 256 pages (coincidentally binary?), allowing for easy reading in any spare moment.

How Many Ways Can We Implement cat(1)?

Submitted by jgoerzen on Mon, 02/21/2005 - 7:45am.

There's a nice little haskell-cafe thread about ways we could implement the Unix utility cat. There are several implementations posted (see also the second page of discussion). If you're trying to learn about I/O in Haskell, this could be a useful place to start.