analyzing the stringification of a tree data structure

Submitted by metaperl on Tue, 04/11/2006 - 11:24am.

In section 8.3 of the discussion on stringifying tree structures Hudak says:

Because (++) has time complexity linear in the length of its left argument, showTree is potentially quadratic in the size of the tree.

in response to this code:


showTree (Leaf x) = show x
showTree (Branch l r) = "<" ++ showTree l ++ "|" ++ showTree r ++ ">"

So this brings up two questions:

  1. Why does (++) have time complexity linear in the length of its left argument?
  2. Why is showTree potentially quadratic in the size of the tree?
Submitted by Cale Gibbard on Wed, 04/12/2006 - 3:23pm.

Consider the Prelude definition of (++):

[]     ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)

Let's try applying it to two lists, say [1,2,3] and ys.

[1,2,3] ++ ys
= 1 : ([2,3] ++ ys)
= 1 : (2 : ([3] ++ ys))
= 1 : (2 : (3 : ([] ++ ys)))
= 1 : 2 : 3 : ys

and we're done. In general, it will take n+1 reduction steps to completely finish applying (++) where the left list is of length n: one for each of the cons cells, and one for the nil at the end. It doesn't do anything to the right hand list, so that list has no effect on the length of time (++) takes, but it does have to construct n new cons cells, which is where the time (and potentially space) goes.

Somehow I think your definition of showTree is wrong. That version of showTree is constant time in the size of the tree, but doesn't do much either. I assume that the code he's referring to is:

showTree (Leaf x) = show x
showTree (Branch l r) =
   "Branch (" ++ showTree l ++ ") (" ++ showTree r ++ ")"

or something along those lines. The reason that this might be slow is that you might have a tree which is very unbalanced to the left. Consider what happens when showing the tree:
Branch (Branch (Branch (Branch (Leaf 1) (Leaf 2)) (Leaf 3)) (Leaf 4)) (Leaf 5)

You'll be adding longer and longer strings on the left to shorter ones on the right. Assuming that the leaves show to strings of about the same length, the time taken will be proportional to 1 + 2 + ... + n = (1/2) (n^2 + n), where n is the number of nodes of the tree. Hence the quadratic complexity. If you don't believe me, break out the definitions of showTree and (++) and try doing it by hand, and unless you're pretty persistent, you'll get bored quickly, and see why it's not such a great strategy for printing the tree. :)

On the other hand, the laziness of Haskell means that (++) quite often isn't as bad as this would suggest. It's only really bad when you're repeatedly adding lots of small strings on the right, and you only pay the whole penalty for using the whole list. If you only use the first k items, you can only expect to have O(kn) time, where n is the number of (++)'s, and single (++)'s are usually not very costly at all.

- Cale

Submitted by Derek Elkins on Thu, 04/13/2006 - 1:42pm.

HaskellNewbie/AppendingToLists covers this is quite a bit of detail.

[...] and single (++)'s are usually not very costly at all.

This is misleading. As long as you are using cons lists (++) is the ideal implementation of a single (persistent) append. If it is too costly then a (singly linked-)list is an inappropriate data structure for the task.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.