Simple Parsec Example: HTMangL
Writing bird-style Literate Haskell is working out pretty well for me. Actually, I prefer LaTeX-style, but bird-style works well for short articles with little to no math. Bird-style is also fairly easy to integrate with HTML markup, except that it uses '>' to designate code. Browsers vary, but most seem to handle the non-XHTML-compliant markup with few snags. However, I got sick of dealing with those snags, which can be difficult to spot.
I put together a small program to deal with this mess, which also serves as a small Parsec usage tutorial. This program processes bird-style Literate Haskell input and outputs a converted version with the code blocks surrounded by <code> tags, with >, <, and & converted to their HTML entities.

> module Main where
> import Text.ParserCombinators.Parsec

A few combinators are built up from smaller ones. eol matches
a newline (or EOF). tilEOL consumes the remaining characters in
a line up to and including the newline character (or EOF), and
returns them without that terminator.

> eol = newline <|> (eof >> return '\n')
> tilEOL = manyTill (noneOf "\n") eol

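To see what tilEOL returns, here is a minimal sketch that repeats the two definitions with type signatures and runs them on a string. The signatures and the helper name firstLine are my additions for illustration and are not part of HTMangL.

```haskell
import Text.ParserCombinators.Parsec

-- Succeeds on a newline, or on end of input (pretending a newline was there).
eol :: Parser Char
eol = newline <|> (eof >> return '\n')

-- Collects characters until eol succeeds; the terminator is consumed
-- but is not part of the result.
tilEOL :: Parser String
tilEOL = manyTill (noneOf "\n") eol

-- Hypothetical helper: run tilEOL once at the start of a string.
firstLine :: String -> Either ParseError String
firstLine = parse tilEOL "example"
```

With these definitions, firstLine "abc\ndef" and firstLine "abc" should both give Right "abc": the first stops at the newline, the second at EOF.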
A line of code begins with "> " and continues til EOL.

> codeLine = do
>   string "> "
>   code <- tilEOL
>   return $ "> " ++ code

A non-blank literate line can begin with any character
but newline, and if it begins with '>' then it cannot
be followed by a space. To those coming from imperative
languages: return () does not return from the function
but rather yields the value () inside the monad; here it
is used as a no-op. The rest of the line is treated as above.

> litLine = do
>   ch <- noneOf "\n"
>   if ch == '>' then
>       notFollowedBy space
>     else
>       return ()
>   text <- tilEOL
>   return $ ch:text

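The point about return () being a no-op is easier to see outside of Parsec. The following sketch uses the Maybe monad; the function names are hypothetical and exist only for this illustration. The second version uses when from Control.Monad, which supplies the return () branch itself.

```haskell
import Control.Monad (when)

-- Hypothetical example, not part of HTMangL.
doubleNonNegative :: Int -> Maybe Int
doubleNonNegative n = do
  -- 'return ()' does not leave the do-block; it just produces ().
  if n < 0 then Nothing else return ()
  -- This line is still reached whenever n >= 0.
  return (n * 2)

-- The same check written with 'when'.
doubleNonNegative' :: Int -> Maybe Int
doubleNonNegative' n = do
  when (n < 0) Nothing
  return (n * 2)
```

Both functions give Just 6 for 3 and Nothing for -1. Assuming the same reasoning carries over to the parser monad, the if/else in litLine could equally be written as when (ch == '>') (notFollowedBy space).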
A blank line is one which begins with a newline.
> blankLine = char '\n' >> return ""
Blocks of code and literate lines (or blanks) are simply multiple consecutive lines (at least 1).

> code = many1 (try codeLine)
> lit = many1 (try litLine <|> blankLine)

> data LiterateCode = Literate [String]
>                   | Code [String]
>                     deriving (Show, Eq)

A literate Haskell file is composed of many Code and Literate
blocks. These are unified in one disjoint type,
and the combinator below ensures that the appropriate tag is
applied to the results of parsing.
> literateCode = many (Code `fmap` code <|> Literate `fmap` lit)
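To make the tagging concrete, here is a self-contained sketch that restates the combinators with type signatures (my addition) and runs them on a small made-up input. The names sample and parsedSample are hypothetical, and code and lit are inlined into literateCode to keep the sketch short.

```haskell
import Text.ParserCombinators.Parsec

data LiterateCode = Literate [String] | Code [String]
  deriving (Show, Eq)

eol :: Parser Char
eol = newline <|> (eof >> return '\n')

tilEOL :: Parser String
tilEOL = manyTill (noneOf "\n") eol

-- A code line: "> " followed by the rest of the line.
codeLine :: Parser String
codeLine = do
  _ <- string "> "
  rest <- tilEOL
  return ("> " ++ rest)

-- A literate line: anything non-blank that is not a code line.
litLine :: Parser String
litLine = do
  ch <- noneOf "\n"
  if ch == '>' then notFollowedBy space else return ()
  rest <- tilEOL
  return (ch : rest)

blankLine :: Parser String
blankLine = char '\n' >> return ""

-- Alternate between runs of code lines and runs of literate/blank lines,
-- tagging each run with the matching constructor.
literateCode :: Parser [LiterateCode]
literateCode = many (Code `fmap` many1 (try codeLine)
                 <|> Literate `fmap` many1 (try litLine <|> blankLine))

-- Hypothetical sample input: prose, a blank line, two code lines, prose.
sample :: String
sample = "Some prose.\n\n> x = 1\n> y = 2\nMore prose.\n"

parsedSample :: Either ParseError [LiterateCode]
parsedSample = parse literateCode "sample" sample
```

For this input, parsedSample should be Right [Literate ["Some prose.",""], Code ["> x = 1","> y = 2"], Literate ["More prose."]]. Note that the code lines keep their leading "> ", and the blank line shows up as an empty string inside the first Literate block.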
A block of literate text is printed literally, but code must be processed slightly.

> printBlock (Literate ls) = mapM_ putStrLn ls
> printBlock (Code cs)     = do
>   putStrLn "<code>\n"
>   mapM_ (putStrLn . subEntities) cs
>   putStrLn "\n</code><br/>"

In case you were wondering how subEntities works: a String is a list of characters, and (>>=) in the list monad maps the function over each character in the input string and concatenates the resulting list of strings.

> subEntities = (>>= \c ->
>                 case c of
>                   '>' -> "&gt;"
>                   '<' -> "&lt;"
>                   '&' -> "&amp;"
>                   c   -> [c])

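If the list-monad bind looks opaque, the same substitution can be spelled with concatMap. This is a minimal sketch using only the Prelude; the names escapeChar, escapeBind and escapeConcatMap are hypothetical, and the replacement strings are the standard HTML entities for the three characters named in the introduction.

```haskell
-- Replacement for a single character; everything else passes through.
escapeChar :: Char -> String
escapeChar '>' = "&gt;"
escapeChar '<' = "&lt;"
escapeChar '&' = "&amp;"
escapeChar c   = [c]

-- Using the list monad's bind, as subEntities does.
escapeBind :: String -> String
escapeBind s = s >>= escapeChar

-- The same thing with concatMap: map over each character, then concatenate.
escapeConcatMap :: String -> String
escapeConcatMap = concatMap escapeChar
```

Both functions turn "a < b && c > d" into "a &lt; b &amp;&amp; c &gt; d". For lists, xs >>= f and concatMap f xs compute the same result, so the choice between them is a matter of taste.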
Really simple: work on stdin, print to stdout.

> main = do
>   s <- getContents
>   case parse literateCode "stdin" s of
>     Left err -> putStr "Error: " >> print err
>     Right cs -> mapM_ printBlock cs

Naturally, the first candidate to run this program on is the program itself. I call it HTMangL.