itkovian's blog


Submitted by itkovian on Thu, 07/13/2006 - 2:24am.

I've been struggling with the following issue for over three weeks. At this point, it seems like there's nothing I can improve to make certain the application runs faster or allocates memory bounded by O(n), n being the size of the file I'm processing.

The goal is to construct a tree structure from the data contained in a file. Each line in said file indicates an entry or exitpoint of a function that was called (the file contains an execution trace of a Java app). When processing an entry point, the relevant data is stored in a Method record (annotated with a pre (count) number) and the lot is pushed onto a stack. When an exit point is encoutered, the top of the stack is checked. If the functions match, the stack is popped, the Method is annotated with a post (count) number, and we store the annotated Method record in a queue. The queue then contains all the information needed to contruct a real Tree, using the pre and post numbers.

And additional complication is the fact that each file line also belongs to an execution thread, which is located in the line. We need to construct a tree for each thread. For this we use a Data.Map that maps the thread in (:: Int) to the relevant stack and queue.

Finally, we also have a mapping fro function ids to function names, also maintained in a Data.Map

I am using a State Monad with the state contained in a ParserState record. Each line with update the state via updateState, where I make a distinction between the possible kinds of line in the trace, by swithcing on the first field of each line. Fields are separated by a space. I'm using the Data.ByteString stuff to lazily read the file.

I've tried to get Haskell to evaluate each update to the State through succintly using the seq operator. Perhaps I overdid that.

The question remains where and how I need to change my code to

  • Make things faster
  • Use less memory.

A profiling run (after -O2) showing retained sets seem to indicate super-linear behaviour in retaining data.

All comments are appreciated.

I'm pasting the Main module contents here.

import Debug.Trace
import System.Environment
import System.IO
import Data.Map as M
import Data.Maybe
import Data.List (sortBy)
import Control.Monad.State

import qualified Data.ByteString.Lazy.Char8 as B

import Method
import Tree
import Algorithms

type Preorder  = Integer
type Postorder = Integer
type MethodMap = M.Map Integer B.ByteString
type ThreadMap = M.Map Integer [(Preorder, Postorder, Method)]

data ParserState = ParserState { methodStack :: !ThreadMap
                               , methodQueue :: !ThreadMap
                               , pre         :: !Integer
                               , post        :: !Integer
                               , methodMap   :: !MethodMap
                               , currentThread :: !Integer
                               } deriving (Show)

initialParserState :: ParserState
initialParserState = ParserState e e 0 0 e 0
  where e = M.empty :: Map Integer a

readInteger :: B.ByteString -> Integer
readInteger = fromIntegral . fst . fromJust . B.readInt

parseTraceMonadic :: [B.ByteString] -> ParserState
parseTraceMonadic ss = state { methodQueue = reverse (methodQueue state) }
  where state = execState (mapM_ (\x -> modify (updateState x) ) ss) initialParserState
  {- I've pushed this through a >> get >>= (`seq` return()) too -} 

updateState :: B.ByteString -> ParserState -> ParserState
updateState s state = case (B.unpack $ head fields) of
  "M" -> updateStateMethod     fields state
  "E" -> updateStateException  fields state
  "C" -> updateStateEntry      fields state
  "R" -> updateStateExit       fields state
  where fields = B.splitWith (== ' ') s

updateStateMethod :: [B.ByteString] -> ParserState -> ParserState
updateStateMethod (_:methodId:methodName:_) state = 
  let methodMap' = M.insert (readInteger methodId) methodName (methodMap state)
  in methodMap' `seq` state { methodMap = methodMap' }

updateStateException :: [B.ByteString] -> ParserState -> ParserState
updateStateException _ state = state

updateStateEntry :: [B.ByteString] -> ParserState -> ParserState
updateStateEntry (_:ss) state = 
  let methodStack' = updateMap thread (methodStack state) (\x y -> Just (x:y)) (pre state, 0, method)
  in let pre' = ((+1) $! (pre state))
  in methodAtack' `seq` pre' `seq`
  state { methodStack = methodStack'
        , pre = pre' }
  where method = mkMethod ( B.unpack ss)
        thread = Method.thread method

updateStateExit :: [B.ByteString] -> ParserState -> ParserState
updateStateExit (_:ss) state = 
  case updateMethod m ( B.unpack ss) of
     Just um -> let methodStack' = updateMap thread 
                                   (methodStack state) 
                                   (\x y -> Just (tail y)) 
                                   (pre_, post state, um)
                in let methodQueue' = updateMap thread 
                                      (methodQueue state) 
                                      (\x y -> Just (x:y)) 
                                      (pre_, post state, um)
                in let post' = ((+1) $! (post state))
                in methodStack' `seq` methodQueue' `seq` post' `seq`
                state { methodStack = methodStack'
                      , methodQueue = methodQueue'
                      , post = post' }
     Nothing -> error $    "Top of the stack is mismatching! Expected " 
                        ++ (show m) ++ " yet got " ++ (show ss) 
                        ++ "\n" ++ (show state)
  where method = mkMethod ( B.unpack ss)
        thread = Method.thread method    
        (pre_, _, m) = let x = M.lookup thread (methodStack state) 
                       in x `seq` case x of
                          Just stack -> head stack
                          Nothing    -> error $    "Method stack has not been found for thread " 
                                                ++ (show thread) ++ " -> fields: " ++ (show ss)

updateMap key map f value = case M.member key map of
                              True  -> M.update (f value) key map
                              False -> M.insert key [value] map

main = do
          [filename] <- System.Environment.getArgs
          file       <- System.IO.openFile filename System.IO.ReadMode
          contents   <- B.hGetContents file
          let parserState = parseTraceMonadic . B.lines $ contents
          print (methodQueue parserState)
          print (methodStack parserState)
          print (methodMap parserState)
          print (pre parserState)
          print (post parserState)

I think taking a peek at the Method module can;t hurt either, so:

data Method = Method { mid :: Integer
                     , thread :: Integer
                     , instruction_entry :: Integer
                     , instruction_exit :: Integer
                     } deriving (Eq, Show)

eM = Method 0 0 0 0

mkMethod :: [String] -> Method
mkMethod s = let [_thread, _id, _entry] = take 3 $ map (read :: String -> Integer) s 
             in [_thread, _id, _entry] `seq` Method { mid = _id
                                                    , thread = _thread
                                                    , instruction_entry = _entry
                                                    , instruction_exit = 0

updateMethod :: Method -> [String] -> Maybe Method
updateMethod (Method mid thread instruction_entry instruction_exit ) s
  | thread == _thread && mid == _id = _exit `seq` Just Method { mid = mid
                                                              , thread = thread
                                                              , instruction_entry = instruction_entry
                                                              , instruction_exit = _exit
  | otherwise = Nothing
  where [_thread, _id, _exit] = take 3 $ map (read :: String -> Integer) s

Apple - Intel

Submitted by itkovian on Tue, 06/07/2005 - 2:55am.

This hardly has anything to do with functional programming in general, or Haskell in particular, but I'm a bit upset by the move to Intel processors. It seems to me that going with AMD might've been a more viable choice, at least from a ISA pov. Furthermore, while AMD chips still dissipate more power than Intel processors, the CPU's seem to be much nicer, better designed, and yield real 64-bit power.

Guess I won't be able to upgrade my powerbook status to a shiny G5 in a few years.

Performance of Haskell (ghc-compiled) applications

Submitted by itkovian on Fri, 06/03/2005 - 2:14am.

I am wondering if there exists any interest in doing some research into the low-level performance of ghc-cmpiled Haskell (duh) applications. It seems to me that getting an idea on cache-miss rates, branch prediction, etc. could be of interest to the Haskell community.

Possible routes are
- performance counters (x86, PowerPC, x86-64)
- DSS simulation
- DIABLO/FIT to instrument the binaries (can we obtain statically compiled Haskell apps?)
- DIOTA to gather data

The main problem atm lies in the benchmarks that would be needed (of course!). People at #haskell have suggested using the ghc regression suite, but I am not sure if these qualify as real-world apps, i.e. the real stuff people use Haskell for.

All input is appreciated.