A while ago, I wrote about adding support for bold to Haddock. Last time, I added tests and made the initial implementation. Now I’m going to briefly go over which files one needs to change for the back-ends.
There are three back-ends: LaTeX, Hoogle and XHtml. XHtml is the one people see the most, I imagine. There are no tests for either Hoogle or LaTeX and I suspect they might have seen a fair amount of breakage already. I haven’t heard direct complaints though, so I’ll look into verifying these later.
First, in Utils.hs, we add a new default field called markupBold to the idMarkup identity record and, immediately after, a new pattern covering DocBold to the markup function. This is used throughout the back-ends as a uniform interface. We also need to add a pattern for DocBold to the renameDoc function in Rename.hs. From what I gather from the very scarce comments, this module renames things such as identifiers into a more human-friendly form: there’s no need to render something as Foo.Bar.Baz if we’re in Foo.Bar. Unfortunately, it’s 500 LOC of juggling GHC’s types so I can’t be certain. Also add the new pattern to the rename function in LexParseRn.hs, which runs the actual look-up.
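To make the role of idMarkup and markup more concrete, here is a minimal sketch of the pattern on a cut-down documentation type. The Doc and DocMarkup definitions below are simplified stand-ins written for this illustration (Haddock's real types have many more constructors and fields), and latexMarkup is a hypothetical toy back-end that renders to a plain String.

```haskell
-- Simplified stand-in for Haddock's documentation tree.
data Doc
  = DocString String
  | DocAppend Doc Doc
  | DocEmphasis Doc
  | DocBold Doc            -- the constructor added for bold
  deriving (Eq, Show)

-- One field per constructor; every back-end is a value of this record.
data DocMarkup a = Markup
  { markupString   :: String -> a
  , markupAppend   :: a -> a -> a
  , markupEmphasis :: a -> a
  , markupBold     :: a -> a   -- the new field
  }

-- The fold over Doc: each constructor needs a matching pattern here.
markup :: DocMarkup a -> Doc -> a
markup m (DocString s)   = markupString m s
markup m (DocAppend a b) = markupAppend m (markup m a) (markup m b)
markup m (DocEmphasis d) = markupEmphasis m (markup m d)
markup m (DocBold d)     = markupBold m (markup m d)

-- Identity record: rebuilds the tree unchanged.
idMarkup :: DocMarkup Doc
idMarkup = Markup
  { markupString   = DocString
  , markupAppend   = DocAppend
  , markupEmphasis = DocEmphasis
  , markupBold     = DocBold
  }

-- Hypothetical toy back-end that produces LaTeX-like text.
latexMarkup :: DocMarkup String
latexMarkup = Markup
  { markupString   = id
  , markupAppend   = (++)
  , markupEmphasis = \s -> "\\emph{" ++ s ++ "}"
  , markupBold     = \s -> "\\textbf{" ++ s ++ "}"
  }

main :: IO ()
main = do
  let doc = DocAppend (DocString "Here's some ") (DocBold (DocString "bold"))
  putStrLn (markup latexMarkup doc)
  print (markup idMarkup doc == doc)
```

With this shape, forgetting the new field or the new pattern shows up as a compiler warning about missing fields or non-exhaustive patterns, which is part of why the record works well as a uniform interface.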
Updating the interfaces is quite simple in this case. In each of

src/Haddock/Backends/Hoogle.hs
src/Haddock/Backends/LaTeX.hs
src/Haddock/Backends/Xhtml/DocMarkup.hs

find the function where other markup is translated and add your change. For example, for LaTeX it was a 3 line change:

  markupEmphasis = \p v -> emph (p v),
+ markupBold = \p v -> bold (p v),
  markupMonospaced = \p _ -> tt (p Mono),
-- snip…
+bold :: LaTeX -> LaTeX
+bold ltx = text "\\textbf" <> braces ltx
It’s easy to look around and see how things are done and mimic the behaviour. Last but not least, there’s an InterfaceFile.hs which unsurprisingly deals with Haddock’s interface file. This file can be used later by Haddock to link against packages it has already generated documentation for. Look at the Binary instance for Doc id and change it accordingly. If you change the instance, make sure to bump the binaryInterfaceVersion. Also update the binaryInterfaceVersionCompatibility. This will ensure that we get a nice error message if we try to link between incompatible versions rather than weird behaviour. This file is only relevant if you add or remove markup, or structurally change existing markup. Simple parser changes to existing markup do not affect this file.
I am not bumping this until everything is finalised and ready for release but I have to be careful to not let any test docs I generate with it get out of the sandbox.
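The reason for bumping the version can be illustrated with a small sketch. Everything here is a toy written for this illustration: the token-list encoding, the version number 2, and the putDoc, getDoc and readInterface helpers are assumptions, not Haddock's actual Binary instance. Only the names binaryInterfaceVersion and binaryInterfaceVersionCompatibility are taken from InterfaceFile.hs.

```haskell
import Data.Word (Word16)

-- Cut-down documentation tree (the real Doc type is much larger).
data Doc
  = DocString String
  | DocEmphasis Doc
  | DocBold Doc            -- newly added constructor
  deriving (Eq, Show)

-- Hypothetical version number, bumped because the encoding gained a tag.
binaryInterfaceVersion :: Word16
binaryInterfaceVersion = 2

-- Versions this reader accepts (hypothetical: only the current one).
binaryInterfaceVersionCompatibility :: [Word16]
binaryInterfaceVersionCompatibility = [binaryInterfaceVersion]

-- Toy serialiser: a constructor tag followed by its payload.
putDoc :: Doc -> [String]
putDoc (DocString s)   = ["0", s]
putDoc (DocEmphasis d) = "1" : putDoc d
putDoc (DocBold d)     = "2" : putDoc d   -- new tag for DocBold

-- Toy deserialiser: returns the decoded value and the unread tokens.
getDoc :: [String] -> Maybe (Doc, [String])
getDoc ("0" : s : rest) = Just (DocString s, rest)
getDoc ("1" : rest)     = do
  (d, rest') <- getDoc rest
  Just (DocEmphasis d, rest')
getDoc ("2" : rest)     = do
  (d, rest') <- getDoc rest
  Just (DocBold d, rest')
getDoc _                = Nothing

-- Check the version stamp first, so a mismatch yields a clear message
-- instead of a confusing decode failure further down.
readInterface :: Word16 -> [String] -> Either String Doc
readInterface version tokens
  | version `notElem` binaryInterfaceVersionCompatibility =
      Left ("Interface file is of wrong version: " ++ show version)
  | otherwise =
      maybe (Left "Corrupt interface file") (Right . fst) (getDoc tokens)

main :: IO ()
main = do
  let encoded = putDoc (DocBold (DocString "bold"))
  print (readInterface binaryInterfaceVersion encoded)
  print (readInterface 1 encoded)
```

Without the version check, a reader that predates the new tag would simply fail somewhere inside getDoc, which is the "weird behaviour" the version bump is meant to prevent.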
We’re done, both test-suites pass. Here’s a list of changes I had to make all together. The majority is just the tests, with very few actual code changes (and some of it is just clobbering whitespace &c).

12 files changed, 170 insertions(+), 12 deletions(-)
html-test/ref/Bold.html | 102
html-test/src/Bold.hs | 9
src/Haddock/Backends/Hoogle.hs | 7
src/Haddock/Backends/LaTeX.hs | 3
src/Haddock/Backends/Xhtml/DocMarkup.hs | 1
src/Haddock/Interface/LexParseRn.hs | 1
src/Haddock/Interface/Rename.hs | 7
src/Haddock/InterfaceFile.hs | 6
src/Haddock/Parser.hs | 13
src/Haddock/Types.hs | 3
src/Haddock/Utils.hs | 11
test/Haddock/ParseSpec.hs | 19
It’s time for some images. What use is all this if it doesn’t look pretty in the end? Here’s the result of the efforts. Generating docs for the following code:

-- | Module : File
module File where

-- | /SomeType/
data SomeType

-- | __Othertype__
data OtherType

-- | Here's some __bold__
foo :: SomeType -> SomeType
foo = undefined

-- | __Multi-word bold__
bar :: OtherType -> OtherType
bar = undefined

-- | __No multi-line
-- bold, no sir__
baz :: [a] -> [a]
baz = undefined

-- | __Can't escape \\__ the underscores__
qux :: SomeType -> OtherType
qux = undefined

-- | __Can't even have a single unescaped _ in the string__
quux :: t
quux = undefined

-- | __No other /markup/ inside either__
corge :: OtherType -> SomeType
corge = undefined
Something that I have wanted to do for a while, and that was further motivated by a recent Trac ticket, is to allow markup inside of emphasis (and now bold). There’s also a much less recent ticket about multi-line emphasis (and now bold). Here’s an exclusive preview of both of these features. In fact, this is the first time even I am rendering the documentation to inspect it by eye, as I otherwise rely on tests:
Also, something I’m less enthusiastic about, multi-line markup:
See the first ticket I linked to for reasoning. I’m writing a tool on the side to help determine the effects of various changes on existing documentation, but it is not usable yet. It’s difficult to reliably extract Haddock comments from thousands of files without actually building the projects. I’m thinking of using haskell-src-exts to make this task easier, but there’s a problem with this approach as well. Currently I compensate for the lack of such a tool with tests.
I might write a post in the future on the progress of this and if and how the problems were solved (or weren’t solved!). A warning system enabled with a flag might be nice in Haddock itself but this is not currently planned.
As a side note, I had a problem with my e-mail address from the 4th to the 14th of August, so if you tried to contact me and didn’t get a reply, please try again.
The slides of my keynote from YOW! Lambda Jam 2013 in May in Brisbane consider a key question: Do Extraterrestrials Use Functional Programming? The talk discusses fundamental aspects of programming as well as means to leverage an understanding of these fundamental aspects to improve our programming practice.
I've just uploaded uniplate-1.6.11, which fixes a severe performance regression introduced by unordered-containers-0.2.3.0 and above. As an example, the time to run HLint on the HLint source code (my standard benchmark) jumped from 1.7 seconds to 18.6 seconds, more than ten times slower. I strongly recommend anyone using Uniplate/HLint to upgrade.
The problem is caused by the lookupDefault function from the Data.HashMap.Strict module:
lookupDefault :: (Eq k, Hashable k) => v -> k -> HashMap k v -> v
lookupDefault def k mp = ...
This function looks up k in mp, but if k is not found, returns def. There has been discussion over whether def should be evaluated even if k is present in mp, and since unordered-containers-0.2.3.0 lookupDefault always evaluates def. There are many legitimate discussions on semantics in the Haskell community, but I do not consider this discussion to be one of them - lookupDefault should not pointlessly evaluate def. As John Hughes elegantly argues in "Why Functional Programming Matters", laziness lets you write functions that compose properly. I have spent several days debugging two problems caused by the excessive strictness in lookupDefault:
Problem 1: Error Defaults
uniplate-1.6.9 contained the following code:
lookupDefault (error "Uniplate internal error: Report to Neil Mitchell, couldn't grab in follower") x mp
Uniplate uses a very complex and highly tuned algorithm to determine which values can be recursively contained by a type. The algorithm relies on subtle invariants, in particular that x must be a member of mp at this particular point. If that fails for certain data types, I want users to submit a bug report, rather than wonder if they called Uniplate incorrectly. The simplest way to achieve that goal is using a default value of error that is only evaluated if x is not present in mp. Unfortunately, by making lookupDefault strict, this error message was always triggered. People have argued that passing error as the default is only to work around the absence of stack traces in GHC - an argument disproven by the example above. The error message provides information that would not be provided by a stack trace - the error is not merely an error, but an error that is my fault. I fixed this bug in uniplate-1.6.10 by switching to:
fromMaybe (error "Uniplate internal error: Report to Neil Mitchell, couldn't grab in follower") $ lookup x mp
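The difference between the two behaviours can be reproduced in a small self-contained sketch. It uses an association list and the Prelude's lookup rather than a HashMap so that it needs nothing beyond base; lazyLookupDefault and strictLookupDefault are hypothetical names for this illustration, with the strict one using seq to mimic a lookupDefault that always evaluates its default.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Data.Maybe (fromMaybe)

-- Lazy in the default: def is only evaluated when the key is missing.
lazyLookupDefault :: Eq k => v -> k -> [(k, v)] -> v
lazyLookupDefault def k mp = fromMaybe def (lookup k mp)

-- Strict in the default: def is forced even when the key is present.
strictLookupDefault :: Eq k => v -> k -> [(k, v)] -> v
strictLookupDefault def k mp = def `seq` fromMaybe def (lookup k mp)

main :: IO ()
main = do
  let mp = [(1 :: Int, "one")]
  -- The key is present, so the error default is never touched.
  putStrLn (lazyLookupDefault (error "internal error") 1 mp)
  -- The key is still present, but forcing the default raises the error.
  r <- try (evaluate (strictLookupDefault (error "internal error") 1 mp))
         :: IO (Either ErrorCall String)
  putStrLn (either (const "strict variant raised the error") id r)
```

The lazy variant is the same fromMaybe formulation as the fix above, only over a different container.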
Problem 2: Expensive Defaults
uniplate-1.6.10 contained the following code:
lookupDefault (hit ! x) x mp
In this example I look up x in mp, but if that fails, look up x in hit. By forcing the default to be evaluated this code always performs the lookup in hit, resulting in a slowdown of more than 10 times in HLint, and more than 20 times in Uniplate microbenchmarks. I fixed this bug in February by changing all instances of lookupDefault to use fromMaybe after finding out lookupDefault was incorrect, but was unaware of the significant performance impact until earlier today.
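The cost of an eagerly evaluated default can be made visible with Debug.Trace. In this sketch, expensiveFallback is a hypothetical stand-in for the hit ! x lookup, and the two lookup functions are illustrative definitions over an association list, not the unordered-containers implementation.

```haskell
import Data.Maybe (fromMaybe)
import Debug.Trace (trace)

-- Hypothetical stand-in for the costly fallback; the trace message
-- is emitted on stderr whenever the fallback is actually evaluated.
expensiveFallback :: Int -> String
expensiveFallback x = trace ("fallback evaluated for " ++ show x) (show x)

-- Lazy in the default: the fallback runs only on a miss.
lazyLookupDefault :: Eq k => v -> k -> [(k, v)] -> v
lazyLookupDefault def k mp = fromMaybe def (lookup k mp)

-- Strict in the default: the fallback runs on every call.
strictLookupDefault :: Eq k => v -> k -> [(k, v)] -> v
strictLookupDefault def k mp = def `seq` fromMaybe def (lookup k mp)

main :: IO ()
main = do
  let mp = [(1 :: Int, "one")]
  -- Hit: the lazy variant never runs the fallback.
  putStrLn (lazyLookupDefault (expensiveFallback 1) 1 mp)
  -- Hit: the strict variant pays for the fallback anyway.
  putStrLn (strictLookupDefault (expensiveFallback 1) 1 mp)
```

Both calls return the same value; the trace message should appear only for the strict variant, which is the wasted work behind the slowdown described above.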
There are various arguments for making lookupDefault strict in the default argument:
- "You are using Data.HashMap.Strict" - the Strict module for a data structure should provide a data structure containing strict values, not a module of functions which are unnecessarily strict - values which are never stored in the data structure should never be evaluated. As another example consider the Control.Monad.State.Strict module, which provides a strict state monad. In that module, the state is strict, but functions like return are not needlessly strict.
- "We don't support using undefined values" - I expect all Haskell libraries to work with undefined values where sensible, not be corrupted in the presence of exceptions and be multithread safe. These are some of the attributes that are essential to allow libraries to be used predictably and reused in situations the authors did not consider.
- "Strictness is faster" - a strict lookupDefault may occasionally shave nanoseconds off a program, but can make a program arbitrarily slower, and in the case of HLint makes a commonly used program 10 times slower.
- "It is what people expect" - I didn't expect lookupDefault to be strict, despite being one of a handful of Haskell programmers to program in a strict variant of Haskell all day long.