Creating a code table for a Huffman tree

Submitted by metaperl on Sun, 10/02/2005 - 3:04am.

Our ultimate goal is to have a list in which each element gives the route on a tree to a particular letter. E.g.


means go left from the top to code for a t



means go right then left from the top to code for an a

So, we want to be able to do this:

*Main> codeTable (codes "tab")

and the question is how?

Well, step one is to ask what granularity we need to take the input data on. And the answer is: we can take the input data one at a time.

Questions two and three are: what should our final and transition actions look like? Well, let's start with the final action. The final action for any particular leaf is going to yield something like this:


we can restate that as: the final action as we move through a tree is to take a leaf and return the character there as well as the entire path to it.

Having stated the final action, we see what our transition action must be, we need to build up the entire path to that element as we make the transitions

The final code from SJT's implementation shows these final and transition actions as the solution to our problem:

-- Making a table from a Huffman tree.

codeTable :: Tree -> Table

codeTable = convert []

-- Auxiliary function used in conversion to a table. The first argument is
-- the HCode which codes the path in the tree to the current Node, and so
-- codeTable is initialised with an empty such sequence.

convert :: HCode -> Tree -> Table

-- Action Final
convert cd (Leaf c n) = [(c,cd)]

-- Action Transition
convert cd (Node n t1 t2)
= (convert (cd++[L]) t1) ++ (convert (cd++[R]) t2)

How to think about programming in Haskell

Submitted by metaperl on Sun, 10/02/2005 - 2:54am.

This is a node I intend to expand over time.

use recursion and types to your advantage
This is central to Haskell programming, in fact
it is the most important part of your reasoning process. Getting to know your data type and pulling it apart to handle it case-by-case is the way of Haskell development.

Now that we have established the utility of recursion and strong static typing in building functions, let's use it. Let's see how we can write functions which obey our type rules by looking around for various instances of the output type and asking when they would make sense in the function definition. This might be aseat-of-the-pants approach, but hey, if it works, do it!

input chunking
Earlier in my blog I noted that it is important to realize the granularity of a decision. In that problem, one makes a decision at a point in the string not just on a single character but a chunk of the input string that is the same length as the search string.
final product and transition
When you move over input, your functions relate to one of the aspects of this input - it's initial phase, its intermediate phases or its terminal phase. In recursive functions, the terminal phase is handled by what is known as a base case and this base case relates directly to the final desired output. It must return or represent that directly. The intermediate phase is also important and must be handled properly.

This example of creating a table of codes for a huffman tree shows the concept of final product and transition in action.

can you use primitive recursion?
Primitive recursion is characterized by a base case and an inductive case. For example:

sum [] = 0
sum (x:xs) = x + sum xs

More Haskell and Dylan data

Submitted by metaperl on Sun, 10/02/2005 - 1:09am.

The computer language shootout is a good way to compare two languages. I like the Haskell way of breaking a complex task down into functions.

Overloading functions and haskell

Submitted by metaperl on Sat, 10/01/2005 - 6:54am.

Both Haskell and Dylan have strong track records in the ICFP contests. Dylan is a completely object-oriented language based around Generic Functions. According to Bruce Hoult " can think of it as being a function that can accept arguments of any type, and examines the types of the actual arguments in order to decide what specialized function to call" More extensively The Dylan Book says

Dylan is a dynamic language — everything in Dylan is defined in terms of a dynamic execution model. As we saw in Section 5.5, page 63, the execution model of how a method is chosen when a generic function is called with a particular set of arguments is highly dynamic: the arguments are evaluated; the types of the arguments are determined; the applicable methods are found and sorted according to specificity; and, finally, the most specific, applicable method is called. This model implies that values and types can change, and that methods can be added right up until the generic function is called, and any of these changes still have an effect on which method is ultimately chosen. This dynamism — the model that value, number, and type of arguments; return values; applicable method; and method choice and execution are all determined at the last possible moment — is what gives the Dylan language its power.

This research paper by Simon Peyton-Jones shows how pure Haskell lacks this power.

Haskell is a very minimalist language consisting of referentially transparent (context-insensitive) strongly-typed functions and hardly anything else. It is a robust set of building blocks that requires clear thought and the ability to concisely specify a problem without a bunch of hemming and hawing. When you lack state and plumb infinite data structures at will, you must be willing to rely on your ability to describe problems rather than hand-hold them, and Haskell forces this degree of conceptualization.

On the other hand, Dylan (DYnamic LANguage) is a powerful language. again from the Dylan book we have the following. Substitute the word "Haskell" everywhere you see "C" below. And substitute "function application" for "primitive machine operations"

Dylan is a powerful language: Many of the built-in, or primitive, language operations are high-level operations, such as the method-dispatch mechanism, the collection facility, and the exception mechanism. Because of Dylan's powerful features, it can be hard for the programmer to develop an efficiency model — a model of the absolute or relative cost of different approaches to a problem.

In contrast, in other languages, such as C, every language construct can be explained directly in terms of a small number of machine instructions. Although it may be easy to understand the performance of a C program in terms of a simple model, programming in C is more work for the programmer — the higher-level abstractions are not provided, and must often be built from scratch.

For example, a C programmer expects that the run-time cost of calling a function is the cost of possibly saving registers on a stack, passing the arguments, executing a machine instruction for jumping to a subroutine, and then executing a return instruction at the end of the function; if it is a call through a function pointer, or a C++ virtual function, the cost of an indirect jump must be added. In Dylan, the story is more complicated, because Dylan has a more sophisticated execution model: A call to a generic function might be much more expensive in a dynamic situation, because computing the most specific method could take much longer than would execution of the method itself.

Did Donald Knuth Choose the Wrong Language??

Submitted by metaperl on Wed, 09/28/2005 - 11:50pm.

Donald Knuth's Art of Computer Programming Series uses his own procedural assembly languaged called mix. Now, the question becomes: should Knuth have used a functional language to describe his algorihtms? Should TeX have been written in a functional language?

Computers have a procedural architecture. Do the roots of computation imply anything about the best branches?

Autrijus on Fire - What is Haskell?

Submitted by metaperl on Wed, 09/14/2005 - 9:36pm.

In an interview with Autrijus Tang, we hear Autriju making several strong statements in support of Haskell:


Haskell is faster than C++, more concise than Perl, more regular than Python, more flexible than Ruby, more typeful than C#, more robust than Java, and has absolutely nothing in common with PHP.


Most languages require you to pay a "language tax": code that does nothing with the main algorithm, placed there only to make the computer happy.

Thus there is a threshold of refactoring: if introducing a new class or a new function costs five lines, programmers are encouraged to copy-and-paste four lines over and over again instead of abstracting them out, even with help from a good refactoring browser.

On the other end of spectrum, we often shy away from abstracting huge legacy code because we are afraid of breaking the complex interplay of flow control and global and mutable variables. Besides, the paths leading to common targets of refactoring--those Design Patterns--are often non-obvious.

Because Haskell makes all side effects explicit, code can be refactored in a safe and automatic way. Indeed, you can ask a bot on #haskell to turn programs to its most abstracted form for you. Haskell also encourages you to shape the language to fit your problem domain, so the program often reads out like a runnable specification.

a series of unit conversion calculations.. not sure how to do in Haskell

Submitted by metaperl on Fri, 09/09/2005 - 12:56pm.

-- 2 oranges make 8 ounces of juice
ounces 8 = Oranges 2

-- a 32.oz non-organic orange juice costs 4.55 at Jamba Juice
ounces 32 = Cost 4.55

-- at retail price, I get 72 organic oranges for 10.00
-- which means 7.2 oranges for 1.00
-- which means 15 cents per orange
-- which means 30 cents for 8 ounces
-- which means 1.20 for 32 ounces
-- which means 3.35 cheaper than Jamba Juice and organic produce
-- I can get much cheaper prices on oranges if I buy in bulk

dealing out 5 hands of cards, each containing 5 cards

Submitted by metaperl on Sun, 09/04/2005 - 10:22pm.

I helped someone out in #haskell with the problem stated in the subject by writing the code below. I'm a bit rusty (and never was very good) at Haskell, so any pointers on how to improve the code I supplied to him are appreciated.

cards = [1..52]

dropAmount = [ d*5 | d <- [0 .. 4 ] ]

hand cards = map (\d -> (take 5 $ drop d cards)) dropAmount

What advantage does Haskell have over a "typed" language like Java or C++

Submitted by metaperl on Fri, 09/02/2005 - 7:01am.

I was in a job interview for a Perl position and I volunteered some information about learning Haskell in my freetime since it has improved my Perl programming so much.

Then my boss asked me this: "what advantage does programming in Haskell have over Java or C++. They have strong typing don't they"

And my answer was: "they don't have type inferencing" --- but is not advantage, only something different it has. What would you say are the advantages of Haskell over Java or C++?

is darcs being maintained

Submitted by metaperl on Mon, 08/29/2005 - 3:31am.

I noticed some discussion on #haskell about darcs needing a new maintainer. Should I look at other options for version control that are better maintained and supported or is this just rumours about darcs? I hope they are.