# News aggregator

### Get expected type of expression using GHC API?

### Yesod Web Framework: Lens generator in Persistent

I've just released version 1.3.1 of persistent-template, which adds a new options: mpsGenerateLenses. When enabled, this option will cause lenses to be generated instead of normal field accessors. An example is worth a thousand words:

{-# LANGUAGE EmptyDataDecls #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE GADTs #-} {-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE QuasiQuotes #-} {-# LANGUAGE TemplateHaskell #-} {-# LANGUAGE TypeFamilies #-} import Control.Monad.IO.Class (liftIO) import Database.Persist import Database.Persist.Sqlite import Database.Persist.TH import Control.Lens share [ mkPersist sqlSettings { mpsGenerateLenses = True } , mkMigrate "migrateAll" ] [persistLowerCase| Person name String age Int Maybe deriving Show |] main :: IO () main = runSqlite ":memory:" $ do runMigration migrateAll johnId <- insert $ Person "John Doe" $ Just 35 Just john <- get johnId liftIO $ putStrLn $ john ^. personName janeId <- insert $ john & personName .~ "Jane Joe" Just jane <- get janeId liftIO $ print janeNot a major feature, but convenient. It can also be combined with the relatively new mpsPrefixFields option if your field names are unique, e.g.:

share [ mkPersist sqlSettings { mpsGenerateLenses = True, mpsPrefixFields = False } , mkMigrate "migrateAll" ] [persistLowerCase| Person name String age Int Maybe deriving Show |] main :: IO () main = runSqlite ":memory:" $ do runMigration migrateAll johnId <- insert $ Person "John Doe" $ Just 35 Just john <- get johnId liftIO $ putStrLn $ john ^. name janeId <- insert $ john & name .~ "Jane Joe" Just jane <- get janeId liftIO $ print janeNote that this isn't the first use of lenses in Persistent. While this implementation is fully compatible with the lens package, it introduces no dependency on that package, due to the beautiful way in which lenses work.

### Jeremy Gibbons: Upwards and downwards accumulations

Continuing my work in regress, this post revisits—with the benefit of much hindsight—what I was working on for my DPhil thesis (which was summarized in a paper at MPC 1992) and in subsequent papers at MPC 1998 and in SCP in 2000. This is the topic of *accumulations* on data structures, which distribute information across the data structure. List instances are familiar from the Haskell standard libraries (and, to those with a long memory, from APL); my thesis presented instances for a variety of tree datatypes; and the later work was about making it datatype-generic. I now have a much better way of doing it, using Conor McBride’s *derivatives*.

*Accumulations* or *scans* distribute information contained in a data structure across that data structure in a given direction. The paradigmatic example is computing the running totals of a list of numbers, which can be thought of as distributing the numbers rightwards across the list, summing them as you go. In Haskell, this is an instance of the operator:

A special case of this pattern is to distribute the elements of a list rightwards across the list, simply collecting them as you go, rather than summing them. That’s the function, and it too is an instance of :

It’s particularly special, in the sense that it is the most basic ; any other instance can be expressed in terms of it:

This is called the *Scan Lemma* for . Roughly speaking, it states that a replaces every node of a list with a applied to that node’s predecessors. Read from right to left, the scan lemma is an efficiency-improving transformation, eliminating duplicate computations; but note that this only works on expressions where is a , because only then are there duplicate computations to eliminate. It’s an important result, because it relates a clear and simple specification on the right to a more efficient implementation on the left.

However, the left-to-right operators , , and are a little awkward in Haskell, because they go against the grain of the cons-based (ie, right-to-left) structure of lists. I leave as a simple exercise for the reader the task of writing the more natural , , and , and identifying the relationships between them. Conversely, one can view etc as the natural operators for snoc-based lists, which are constructed from nil and snoc rather than from nil and cons.

Upwards and downwards accumulations on binary treesWhat would , , , etc look like on different—and in particular, non-linear—datatypes? Let’s consider a simple instance, for homogeneous binary trees; that is, trees with a label at both internal and external nodes.

for which the obvious fold operator is

I’m taking the view that the appropriate generalization is to distribute data “upwards” and “downwards” through such a tree—from the leaves towards the root, and vice versa. This does indeed specialize to the definitions we had on lists when you view them vertically in terms of their “cons” structure: they’re long thin trees, in which every parent has exactly one child. (An alternative view would be to look at distributing data horizontally through a tree, from left to right and vice versa. Perhaps I’ll come back to that another time.)

The upwards direction is the easier one to deal with. An upwards accumulation labels every node of the tree with some function of its *descendants*; moreover, the descendants of a node themselves form a tree, so can be easily represented, and folded. So we can quite straightforwardly define:

where yields the root of a tree:

As with lists, the most basic upwards scan uses the constructors themselves as arguments:

and any other scan can be expressed, albeit less efficiently, in terms of this:

The downwards direction is more difficult, though. A downwards accumulation should label every node with some function of its *ancestors*; but these do not form another tree. For example, in the homogeneous binary tree

the ancestors of the node labelled are the nodes labelled . One could represent those ancestors simply as a list, ; but that rules out the possibility of a downwards accumulation treating left children differently from right children, which is essential in a number of algorithms (such as the parallel prefix and tree drawing algorithms in my thesis). A more faithful rendering is to define a new datatype of *paths* that captures the left and right turns—a kind of non-empty cons list, but with both a “left cons” and a “right cons” constructor:

(I called them “threads” in my thesis.) Then we can capture the data structure representing the ancestors of the node labelled

by the expression . I leave it as an exercise for the more energetic reader to work out a definition for

to compute the tree giving the ancestors of every node, and for a corresponding .

Generic upwards accumulations
Having seen ad-hoc constructions for a particular kind of binary tree, we should consider what the datatype-generic construction looks like. I discussed datatype-generic upwards accumulations already, in the post on Horner’s Rule; the construction was given in the paper Generic functional programming with types and relations by Richard Bird, Oege de Moor and Paul Hoogendijk. As with homogeneous binary trees, it’s still the case that the generic version of labels every node of a data structure of type with the descendants of that node, and still the case that the descendants form a data structure also of type . However, in general, the datatype does not allow for a label at every node, so we need the *labelled variant* where . Then we can define

where returns the root label of a labelled data structure—by construction, every labelled data structure has a root label—and is the unique arrow to the unit type. Moreover, we get a datatype-generic operator, and a Scan Lemma:

Generic downwards accumulations, via linearization

The best part of a decade after my thesis work, inspired by the paper by Richard Bird & co, I set out to try to define datatype-generic versions of downward accumulations too. I wrote a paper about it for MPC 1998, and then came up with a new construction for the journal version of that paper in SCP in 2000. I now think these constructions are rather clunky, and I have a better one; if you don’t care to explore the culs-de-sac, skip this section and the next and go straight to the section on derivatives.

The MPC construction was based around a datatype-generic version of the datatype above, to represent the “ancestors” of a node in an inductive datatype. The tricky bit is that data structures in general are non-linear—a node may have many children—whereas paths are linear—every node has exactly one child, except the last which has none; how can we define a “linear version” of ? Technically, we might say that a functor is linear (actually, “affine” would be a better word) if it distributes over sum.

The construction in the paper assumed that was a sum of products of literals

where each is either , , or some constant type such as or . For example, for leaf-labelled binary trees

the shape functor is , so (there are two variants), (the first variant has a single literal, ) and (the second variant has two literals, and ), and:

Then for each we define a -ary functor , where is the “degree of branching” of variant (ie, the number of s occurring in , which is the number of for which ), in such a way that

and is linear in each argument except perhaps the first. It’s a bit messy explicitly to give a construction for , but roughly speaking,

where is “the next unused ” when , and just otherwise. For example, for leaf-labelled binary trees, we have:

Having defined the linear variant of , we can construct the datatype of paths, as the inductive datatype of shape where

That is, paths are a kind of non-empty cons list. The path ends at some node of the original data structure; so the last element of the path is of type , which records the “local content” of a node (its shape and labels, but without any of its children). Every other element of the path consists of the local content of a node together with an indication of which direction to go next; this amounts to the choice of a variant , followed by the choice of one of identical copies of the local contents of variant , where is the degree of branching of variant . We model this as a base constructor and a family of “cons” constructors for and .

For example, for leaf-labelled binary trees, the “local content” for the last element of the path is either a single label (for tips) or void (for bins), and for the other path elements, there are zero copies of the local content for a tip (because a tip has zero children), and two copies of the void local information for bins (because a bin has two children). Therefore, the path datatype for such trees is

which is isomorphic to the definition that you might have written yourself:

For homogeneous binary trees, the construction gives

which is almost the ad-hoc definition we had two sections ago, except that it distinguishes singleton paths that terminate at an external node from those that terminate at an internal one.

Now, analogous to the function which labels every node with its descendants, we can define a function to label every node with its ancestors, in the form of the path to that node. One definition is as a fold; informally, at each stage we construct a singleton path to the root, and map the appropriate “cons” over the paths to each node in each of the children (see the paper for a concrete definition). This is inefficient, because of the repeated maps; it’s analogous to defining by

A second definition is as an unfold, maintaining as an accumulating parameter of type the “path so far”; this avoids the maps, but it is still quadratic because there are no common subexpressions among the various paths. (This is analogous to an accumulating-parameter definition of :

Even with an accumulating “Hughes list” parameter, it still takes quadratic time.)

The downwards accumulation itself is defined as a path fold mapped over the paths, giving a Scan Lemma for downwards accumulations. With either the fold or the unfold definition of paths, this is still quadratic, again because of the lack of common subexpressions in a result of quadratic size. However, in some circumstances the path fold can be reassociated (analogous to turning a into a ), leading finally to a linear-time computation; see the paper for the details of how.

Generic downwards accumulations, via zipI was dissatisfied with the “…”s in the MPC construction of datatype-generic paths, but couldn’t see a good way of avoiding them. So in the subsequent SCP version of the paper, I presented an alternative construction of downwards accumulations, which does not go via a definition of paths; instead, it goes directly to the accumulation itself.

As with the efficient version of the MPC construction, it is coinductive, and uses an accumulating parameter to carry in to each node the seed from higher up in the tree; so the downwards accumulation is of type . It is defined as an unfold, with a body of type

The result of applying the body will be constructed from two components, of types and : the first gives the root label of the accumulation and the seeds for processing the children, and the second gives the children themselves.

These two components get combined to make the whole result via a function

This will be partial in general, defined only for pairs of -structures of the same shape.

The second component of is the easier to define; given input , it unpacks the to , and discards the and the (recall that is the labelled variant of , where ).

For the first component, we enforce the constraint that all output labels are dependent only on their ancestors by unpacking the and pruning off the children, giving input . We then suppose as a parameter to the accumulation a function of type to complete the construction of the first component. In order that the two components can be zipped together, we require that is shape-preserving in its second argument:

where is the unique function to the unit type. Then, although the built from these two components depends on the partial function , it will still itself be total.

The SCP construction gets rid of the “…”s in the MPC construction. It is also inherently efficient, in the sense that if the core operation takes constant time then the whole accumulation takes linear time. However, use of the partial function to define a total accumulation is a bit unsatisfactory, taking us outside the domain of sets and total functions. Moreover, there’s now only half an explanation in terms of paths: accumulations in which the label attached to each node depends only on the *list* of its ancestors, and not on the left-to-right ordering of siblings, can be factored into a list function (in fact, a ) mapped over the “paths”, which is now a tree of lists; but accumulations in which left children are treated differently from right children, such as the parallel prefix and tree drawing algorithms mentioned earlier, can not.

After another interlude of about a decade, and with the benefit of new results to exploit, I had a “eureka” moment: the linearization of a shape functor is closely related to the beautiful notion of the *derivative* of a datatype, as promoted by Conor McBride. The crucial observation Conor made is that the “one-hole contexts” of a datatype—that is, for a container datatype, the datatype of data structures with precisely one element missing—can be neatly formalized using an analogue of the rules of differential calculus. The one-hole contexts are precisely what you need to identify which particular child you’re talking about out of a collection of children. (If you’re going to follow along with some coding, I recommend that you also read Conor’s paper Clowns to the left of me, jokers to the right. This gives the more general construction of *dissecting* a datatype, identifying a unique hole, but also allowing the “clowns” to the left of the hole to have a different type from the “jokers” to the right. I think the explanation of the relationship with the differential calculus is much better explained here; the original notion of derivative can be retrieved by specializing the clowns and jokers to the same type.)

The essence of the construction is the notion of a *derivative* of a functor . For our purposes, we want the derivative in the second argument only of a bifunctor; informally, is like , but with precisely one missing. Given such a one-hole context, and an element with which to plug the hole, one can reconstruct the whole structure:

That’s how to consume one-hole contexts; how can we produce them? We could envisage some kind of inverse of , which breaks an -structure into an element and a context; but this requires us to invent a language for specifying which particular element we mean— is not injective, so needs an extra argument. A simpler approach is to provide an operator that annotates every position at once with the one-hole context for that position:

One property of is that it really is an annotation—if you throw away the annotations, you get back what you started with:

A second property relates it to —each of elements in a hole position plugs into its associated one-hole context to yield the same whole structure back again:

(I believe that those two properties completely determine and .)

Incidentally, the derivative of a bifunctor can be elegantly represented as an *associated type synonym* in Haskell, in a type class of bifunctors differentiable in their second argument, along with and :

Conor’s papers show how to define instances of for all polynomial functors —anything made out of constants, projections, sums, and products.

The path to a node in a data structure is simply a list of one-hole contexts—let’s say, innermost context first, although it doesn’t make much difference—but with all the data off the path (that is, the other children) stripped away:

This is a projection of Huet’s zipper, which preserves the off-path children, and records also the subtree in focus at the end of the path:

Since the contexts are listed innermost-first in the path, closing up a zipper to reconstruct a tree is a over the path:

Now, let’s develop the function , which turns a tree into a labelled tree of paths. We will write it with an accumulating parameter, representing the “path so far”:

Given the components of a tree and a path to its root, must construct the corresponding labelled tree of paths. Since and , this amounts to constructing a value of type . For the first component of this pair we will use , the path so far. The second component can be constructed from by identifying all children via , discarding some information with judicious s, consing each one-hole context onto to make a longer path, then making recursive calls on each child:

That is,

Downwards accumulations are then path functions mapped over the result of . However, we restrict ourselves to path functions that are instances of , because only then are there common subexpressions to be shared between a parent and its children (remember that paths are innermost-first, so related nodes share a tail of their ancestors).

Moreover, it is straightforward to fuse the with , to obtain

which takes time linear in the size of the tree, assuming that and take constant time.

Finally, in the case that the function being mapped over the paths is a as well as a , then we can apply the Third Homomorphism Theorem to conclude that it is also an associative fold over lists. From this (I believe) we get a very efficient parallel algorithm for computing the accumulation, taking time logarithmic in the size of the tree—even if the tree has greater than logarithmic depth.

### The Haskell 98 Language Report

### The Haskell 98 Language Report

### Arrows: bibliography

### Arrows: bibliography

### www.infosun.fim.uni-passau.de

### www.infosun.fim.uni-passau.de

### Haskell Cheatsheet

### Haskell Cheatsheet

### Ian Ross: Haskell FFT 9: Prime-Length FFT With Rader's Algorithm

[The code from this article is available as a Gist]

In the last article in this series, benchmarking results gave a pretty good indication that we needed to do *something* about prime-length FFTs if we want to get good scaling for all input sizes. Falling back on the O(N2) DFT algorithm just isn’t good enough.

In this article, we’re going to look at an implementation of Rader’s FFT algorithm for prime-length inputs. Some aspects of this algorithm are not completely straightforward to understand since they rely on a couple of results from number theory and group theory. We’ll try to use small examples to motivate what’s going on.

Before we begin exploring Rader’s algorithm, it’s probably worth taking a minute to look at why we might expect there to be a better algorithm for prime-length FFTs than the naïve O(N2) DFT algorithm we’ve been using so far. Throughout this article, we’re going to use F5 and F7 as running examples (you’ll see why later). Here’s F5:

F5=(111111ω51ω52ω53ω541ω52ω54ω56ω581ω53ω56ω59ω5121ω54ω58ω512ω516)=(111111ω51ω52ω53ω541ω52ω54ω51ω531ω53ω51ω54ω521ω54ω53ω52ω51)

and here’s F7:

F7=(11111111ω71ω72ω73ω74ω75ω761ω72ω74ω76ω78ω710ω7121ω73ω76ω79ω712ω715ω7181ω74ω78ω712ω716ω720ω7241ω75ω710ω715ω720ω725ω7301ω76ω712ω718ω724ω730ω736)=(11111111ω71ω72ω73ω74ω75ω761ω72ω74ω76ω71ω73ω751ω73ω76ω72ω75ω71ω741ω74ω78ω75ω72ω76ω731ω75ω73ω71ω76ω74ω721ω76ω75ω74ω73ω72ω71)

The main thing to notice is that if you look at the sub-matrix that excludes the top row and left column (both of which have entries that are all ones) from F5 and F7, you’ll see a matrix each of whose rows and columns is a permutation of the values ωN1,ωN2,…,ωNN−1. This is because each row of FN is of the form (ωN0k,ωN1k,ωN2k,…,ωN(N−1)k) for 1≤k≤N−1. For prime N, none of these k values divides N exactly, so that there are no “repeats” in the powers of ωN.

So, there’s definitely something a little bit special about FN for prime N. To make effective use of this to produce a prime-length FFT requires some non-obvious insight.

Throughout the remainder of this article, we’ll use p to denote some prime number, just to avoid repeatedly writing “for some prime N”.

Rader’s algorithmThe basic idea of Rader’s algorithm is to make use of the special permutation structure of the powers of ωp in Fp to write the DFT summation

Hn=∑k=0N−1hke2πikn/Nn=0,1,2,…,N−1.(1)

as a *convolution*, which can then be calculated via FFTs of non-prime lengths. We’ll step through how this works, leaving a couple of details aside until we come to the implementation.

The first thing we do is to split out the special zero-index parts of the Fourier matrix calculation, so that (1) becomes

H0=∑k=0p−1hpHn=h0+∑k=1p−1hkωpnkn=1,2,…,p−1.(2)

We thus need to work out an efficient way of calculating the sum in the second expression.

Multiplicative group modulo nThe most common approach to thinking of a group structure for integers modulo some number n is to use addition as the group operation, giving the group ℤ/n with group elements 0,1,2,…,n−1. A less common approach is to think of the integers in 1,2,…,n−1 that are prime relative to n with the group operation being *multiplication* modulo n. This group is denoted by a range of different notations, but we’ll call it ℤn×. For prime n, all of the numbers 1,2,…,n−1 are elements of ℤn×. For example, for n=5 and n=7, we have the following group multiplication tables:

**1**

**2**

**3**

**4**

**1**1 2 3 4

**2**2 4 1 3

**3**3 1 4 2

**4**4 3 2 1

**1**

**2**

**3**

**4**

**5**

**6**

**1**1 2 3 4 5 6

**2**2 4 6 1 3 5

**3**3 6 2 5 1 4

**4**4 1 5 2 6 3

**5**5 3 1 6 4 2

**6**6 5 4 3 2 1

This group turns out to be the key to writing the sum in (2) as a convolution, since both n and k range over the values 1,2,…,p−1 and the multiplication in the power of ωp is of necessity calculated modulo p (since ωpp=1).

Group generatorsThe group ℤp× is *cyclic*, which means that any element of the group a can be written as an integer power of a single element of the group, the group *generator* g, i.e. a=gi for some unique positive integer i with 0≤i≤p−2, and equivalently a=g−j for some unique positive integer j with 0≤j≤p−2, where negative powers of g denote powers of the multiplicative inverse (modulo p) of g. For example, for ℤ5×, either 2 or 3 is a generator:

20=1=1mod521=2=2mod522=4=4mod523=8=3mod52−0=1=1mod52−1=3=3mod52−2=9=4mod52−3=27=2mod530=1=1mod531=3=3mod532=9=4mod533=27=2mod53−0=1=1mod53−1=2=2mod53−2=4=4mod53−3=8=3mod5

In this case, 2=3−1 and 3=2−1. For ℤ7×, a suitable generator is 3, and

30=1=1mod731=3=3mod732=9=2mod733=27=6mod734=81=4mod735=243=5mod73−0=1=1mod73−1=5=5mod73−2=25=4mod73−3=125=6mod73−4=625=2mod73−5=3125=3mod7

In this case, 5=3−1.

We’ll talk about how we determine the group generator g later. For the moment, assume that we know a suitable generator.

Representation as convolutionGiven a generator g for the group ℤp×, we can write the second equation in (2) as

Hg−r=h0+∑q=0p−2hgqωpgq−r=h0+∑q=0p−2hgqωpg−(r−q)r=0,1,…,p−2.(3)

(This relies on the fact that hk and Hk are both periodic in p and ωpp=1.)

For example, if p=5, taking g=2, we have:

r=0,g−r=1⇒H1=h0+∑q=0p−2hgqω5gq=h0+h1ω5+h2ω52+h4ω54+h3ω53r=1,g−r=3⇒H3=h0+∑q=0p−2hgqω5gq−1=h0+h1ω53+h2ω51+h4ω52+h3ω54r=2,g−r=4⇒H4=h0+∑q=0p−2hgqω5gq−2=h0+h1ω54+h2ω53+h4ω51+h3ω52r=3,g−r=2⇒H2=h0+∑q=0p−2hgqω5gq−3=h0+h1ω52+h2ω54+h4ω53+h3ω51

and comparison of the right hand sides of each of these equations with the rows of F5 shows that this generator-based approach replicates the rows of the Fourier matrix, albeit in a different order to the “normal” order.

The summation in the expression (3) is in the form of a cyclic convolution of the two sequences aq=hgq and bq=ωpg−q, both of length p−1 with 0≤q≤p−2. For example, for p=5, taking g=2:

q 0 1 2 3 aq h1 h2 h4 h3 bq ω51 ω53 ω54 ω52and for p=7 with g=3:

q 0 1 2 3 4 5 aq h1 h3 h2 h6 h4 h5 bq ω71 ω75 ω74 ω76 ω72 ω73 Calculation of convolution using FFTRecall that, for a continuous convolution of the form

(f*g)(t)=∫−∞∞f(τ)g(t−τ)dt,

we have the convolution theorem:

ℱ(f*g)=ℱ(f)⋅ℱ(g),

where we denote the Fourier transform of a function f by ℱ(f). A comparable result applies for the discrete cyclic convolution in (3). Let us denote the sequence aq defined above as h~=(hgq) and the sequence bq as ω~p=(ωpg−q), both for 0≤q≤p−2, and let us write H~=(Hg−q) for the same range of q to represent the results of the convolution as a sequence. Then:

H~−h0=DFT−1[DFT[h~]⋅DFT[ω~p]]

The discrete Fourier transform of the prime-length input can thus be calculated by:

- Reordering the input vector according to the indexing scheme for the sequence h~;
- Calculating the DFT of h~;
- Multiplying DFT[h~] pointwise by the DFT of the reordered sequence of powers of ωp represented by the sequence ω~p;
- Calculating the inverse DFT of this product;
- Reordering the result according to the indexing scheme for the sequence H~ and adding in the DC offset h0.

The convolution derived above is of length p−1, which, since p is prime, must be composite. We can thus calculate the length p−1 discrete Fourier transforms required by the above approach using our Cooley-Tukey divide and conquer algorithm. The index reorderings and the DFT of the sequence of powers of ωp can all be determined in advance since they depend only on p.

For p=5, taking g=2, this approach looks like this:

- Reorder the input vector to give h~=(h1,h2,h4,h3).
- Calculate DFT[h~] using a conventional FFT algorithm; in this case, this is efficient because the length of h~ is a power of two.
- Multiply DFT[h~] pointwise by DFT[ω~5], which can be pre-computed: ω~5=(ω51,ω53,ω54,ω52).
- Calculate the inverse DFT of the pointwise product to give H~ – this is again an efficient calculation because the input length is a power of two.
- Add the DC offset and extract the final results from H~=(H1,H3,H4,H2).

For p=5, p−1 is a power of two, which means that the FFT calculations needed in the Rader algorithm are as efficient as possible. However, if p−1 itself has large prime factors, it may prove necessary to apply Rader’s algorithm recursively, which will be much less efficient than a direct recursive FFT.

An alternative (and better) approach is provided by zero-padding the sequences involved in the convolution – by padding to a length that is a power of two, recursive application of Rader’s algorithm is avoided. To do this, let us write the two sequences to be convolved as f[i] and g[i], both of length M−1 (we write M−1 here to match the convolution length p−1 in the FFT calculation, and we use the square bracket notation for indexing to make some of the expressions below less ambiguous). The cyclic convolution of these two sequences is

(f*g)[n]=∑m=0M−1f[m]g[n−m]

where all indexes on the sequences f[i] and g[i] are zero-based and are taken modulo M.

We produce new sequences f′[j] and g′[j] of length M′ where M′≥2M−3. This condition on M′ is required to avoid “interference” between unrelated elements of the original sequences due to the wrap-around of the cyclic convolution that we’re going to compute. In our FFT application, we will choose M′ to be a power of two, but any value works as long as it is large enough to satisfy this condition. The sequence f′[j] is constructed by inserting M′−M zeroes between the zeroth and first element of f[i]. Sequence g′[j] is constructed by cyclically repeating the values of g[i] to give a sequence of total length M′. If we now convolve the sequences f′[j] and g′[j], we find that the result contains the convolution of the original sequences f[i] and g[i] as its first M−1 elements.

A minimal example gives a feeling for why this works. Suppose that we set M=4 (so the sequences are of length 3) and consider

f[i]=(1,2,3)andg[i]=(a,b,c).

The cyclic convolution of these sequences is found as:

(f*g)[0]=f[0]g[0−0]+f[1]g[0−1]+f[2]g[0−2]=1a+2c+3b,(f*g)[1]=f[0]g[1−0]+f[1]g[1−1]+f[2]g[1−2]=1b+2a+3c,(f*g)[2]=f[0]g[2−0]+f[1]g[2−1]+f[2]g[2−2]=1c+2b+3a.

The condition on M′ is that M′≥2M−3=5. Putting M′=5, the new sequences are then

f′[j]=(1,0,0,2,3)andg′[j]=(a,b,c,a,b).

and the first three elements of the cyclic convolution of these new sequences are:

(f′*g′)[0]=f′[0]g′[0−0]+f′[1]g′[0−1]+f′[2]g′[0−2]+f′[3]g′[0−3]+f′[4]g′[0−4]=1a+0b+0a+2c+3b=1a+2c+3b,(f′*g′)[1]=f′[0]g′[1−0]+f′[1]g′[1−1]+f′[2]g′[1−2]+f′[3]g′[1−3]+f′[4]g′[1−4]=1b+0a+0b+2a+3c=1b+2a+3c,(f′*g′)[2]=f′[0]g′[2−0]+f′[1]g′[2−1]+f′[2]g′[2−2]+f′[3]g′[2−3]+f′[4]g′[2−4]=1c+0b+0a+2b+3a=1c+2b+3a.

We see that these are identical to the values found from the original sequences f[i] and g[i].

To make use of this result in Rader’s algorithm, we just choose the smallest power of two that satisfies the length condition on the sequences to be convolved, pad the input sequence f[i] with zeroes and convolve with the repeated ωp sequence. (Most of the details can be pre-computed as part of the FFT “planning” phase.)

Implementating Rader’s algorithmIn this section, we’ll look at a basic Haskell implementation of Rader’s algorithm. The implementation here is structured more to be understandable than to be particularly efficient. We’ll reorganise the code to pre-compute some things for efficiency when we start optimising the overall FFT code later on.

After describing the details of the main algorithm, we’ll look at the approach used for the calculation of primitive roots of p to determine generators of the group ℤp×, since this requires a little explanation. Finally, we’ll look at some test results. Here’s the code for our Haskell implementation (we’ll refer to the line numbers in what follows):

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 -- | FFT and inverse FFT drivers for Rader's algorithm. raderFFT, raderIFFT :: VCD -> VCD raderFFT = raderFFT' 1 1 raderIFFT v = raderFFT' (-1) (1.0 / (fromIntegral $ length v)) v -- | Rader's algorithm for prime-length complex-to-complex Fast -- Fourier Transform. raderFFT' :: Int -> Double -> VCD -> VCD raderFFT' sign scale xs | isPrime p = map ((scale :+ 0) *) $ generate p $ \idx -> case idx of 0 -> sum xs _ -> xs ! 0 + convmap M.! idx | otherwise = error "non-prime length in raderFFT" where p = length xs p1 = p - 1 -- ^ Convolution length. p1pad = if p1 == 2^(log2 p1) then p1 else 2 ^ (1 + log2 (2 * p1 - 3)) -- ^ Convolution length padded to next greater power of two. g = primitiveRoot p -- ^ Group generator. ig = invModN p g -- ^ Inverse group generator. as = backpermute xs $ iterateN p1 (\n -> (g * n) `mod` p) 1 -- ^ Input values permuted according to group generator -- indexing. pad = p1pad - p1 -- ^ Padding size. apad = generate p1pad $ \idx -> if idx == 0 then as ! 0 else if idx > pad then as ! (idx - pad) else 0 -- ^ Permuted input vector padded to next greater power of two -- size for fast convolution. iidxs = iterateN p1 (\n -> (ig * n) `mod` p) 1 -- ^ Index vector based on inverse group generator ordering. w = omega p bs = backpermute (map ((w ^^) . (sign *)) $ enumFromTo 0 p1) iidxs -- ^ Root of unity powers based on inverse group generator -- indexing. bpad = generate p1pad (\idx -> bs ! (idx `mod` p1)) -- ^ Root of unity powers cyclically repeated to make vector -- of next power of two length for fast convolution. conv = ifft $ zipWith (*) (fft apad) (fft bpad) -- ^ FFT-based convolution calculation. convmap = M.fromList $ toList $ zip iidxs conv -- ^ Map constructed to enable inversion of inverse group -- generator index ordering for output.As for the earlier mixed-radix FFT, we have top-level functions to perform forward and inverse FFTs (lines 2-4), both of which call a common driver function (raderFFT'), passing in sign and scale parameters to control the direction and final output scaling of the transform.

The final generation of the transformed output (lines 10-12) applies the appropriate output scaling to a vector constructed from the sum of the input elements (index 0, corresponding to H0 in (2)) and (other indexes) a DC offset (h0 in (2)) and the appropriate element of the generator-based convolution. In order to deal with the generator-based indexing for the H values in (3), the convolution results are put into a map (convmap: line 46) from where they can easily be extracted in output index order.

Index organisationThe determination of the indexes needed for the generator-based indexing scheme in (3) is done by:

- Calculating the generator g for the group ℤp× where p is the prime input vector length (line 21: see below for details);
- Permuting the input vector elements according to powers of the group generator (line 25: this is the computation of the sequence aq for the convolution using the indexing defined in (3));
- Calculating the inverse element of the generator in ℤp× (line 23: the invModN function finds the multiplicative inverse modulo N of a given integer argument);
- Calculating the inverse generator-based indexes needed for permuting both the powers of ωp and the output elements (iidxs: line 35).

The iidxs index vector is used to permute the powers of ωp (line 38) producing the bq sequence used in the convolution of (3) and is also used (line 46) to build the index map used to extract result values from the result of the convolution.

Padding to power-of-two lengthIf p−1 is a power of two, no padding of the aq and bq sequences is needed for efficient calculation of the convolution (3). In all other cases, both sequences need to be padded to a suitable power of two. Calculation of the padded length (lines 17-19) take account of the minimum size requirement on the padded inputs to the convolution to avoid “wrap-around” effects, and this length is used to control the generation of padded versions of aq (apad: lines 30-32) and bq (bpad: line 41). (The “padded” version of bq is just a cyclic repeat of the values calculated in line 38.)

ConvolutionOnce suitably padded versions of the aq and bq sequences are computed, the actual convolution step is simple (line 44). Both sequences are Fourier transformed (recall that the padded sequence lengths are a power of two, so this is an efficient computation), multiplied together pointwise and inverse transformed. The convolution of the original unpadded sequences is then found in the first p−1 entries of this result — this is what is picked out by the indexing defined by convmap, defined in line 46.

Primitive root calculationAn important step in Rader’s prime length FFT algorithm is the determination of the generator of the group ℤp×. For this group, the group generator is called *the primitive root modulo p*.

There is no simple general formula to compute primitive roots modulo n, but there are methods to determine a primitive root faster than simply trying out all candidate values. In our case, we’re only ever going to be dealing with primitive roots of prime p, which simplifies things a little. We proceed by first calculating φ(p), where φ is Euler’s totient function. For prime p, φ(p)=p−1. We then determine the distinct prime factors of φ(p), which we’ll call p1,p2,…,pk. Then, for each m∈ℤp×, we compute

mφ(p)/pimodnfor i=1,…,k

using a fast algorithm for modular exponentiation (we use exponentiation by squaring). A number m for which these k values are all different from 1 is a primitive root.

The implementation is pretty much just a straightforward transcription into Haskell of the description above. In cases where there multiple primitive roots (the number of primitive roots modulo n, if there are any, is φ(φ(n))), we just take the first one:

primitiveRoot :: Int -> Int primitiveRoot p | isPrime p = let tot = p - 1 -- ^ Euler's totient function for prime values. totpows = map (tot `div`) $ fromList $ nub $ toList $ factors tot -- ^ Powers to check. check n = all (/=1) $ map (expt p n) totpows -- ^ All powers are different from 1 => primitive root. in fromIntegral $ head $ dropWhile (not . check) $ fromList [1..p-1] | otherwise = error "Attempt to take primitive root of non-prime value" -- | Fast exponentation modulo n by squaring. -- expt :: Int -> Int -> Int -> Int expt n b pow = fromIntegral $ go pow where bb = fromIntegral b nb = fromIntegral n go :: Int -> Integer go p | p == 0 = 1 | p `mod` 2 == 1 = (bb * go (p - 1)) `mod` nb | otherwise = let h = go (p `div` 2) in (h * h) `mod` nbWe also have some support code for QuickCheck testing of the primitive root algorithm:

-- | QuickCheck generator for prime values. Inefficient... -- newtype Prime a = Prime { getPrime :: a } deriving (Eq, Ord, Show, Read, Num, Integral, Real, Enum) instance (Integral a, Ord a, Arbitrary a) => Arbitrary (Prime a) where arbitrary = (Prime . (\n -> P.head $ P.dropWhile (< n) primes)) `fmap` (arbitrary `suchThat` (> 1)) -- Test code for primitive root determination. prop_primitive_root ((Prime n) :: Prime Int) = primitiveRootTest n $ primitiveRoot n primitiveRootTest :: Int -> Int -> Bool primitiveRootTest p root | isPrime p = (sort $ toList $ pows) == [1..p-1] | otherwise = error "Attempt to take primitive root of non-prime value" where pows = generate (p - 1) (expt p root)We need to generate prime numbers to test the primitive root calculation, and it’s much quicker to write a separate QuickCheck generator, using a specialised Arbitrary instance for a custom Prime newtype to do this than to try generating positive integers and filtering out non-primes using QuickCheck’s ==> filtering operator. We can collect some evidence that the algorithm works by doing something like this in GHCi:

> :load Prime-FFT.hs [1 of 1] Compiling Prime-FFT ( Prime-FFT.hs, interpreted ) Ok, modules loaded: PrimeFFT. > verboseCheckWith (stdArgs { maxSize=25 }) prop_primitive_root Passed: Prime {getPrime = 3} Passed: Prime {getPrime = 7} ... Passed: Prime {getPrime = 14771} Passed: Prime {getPrime = 6691} Passed: Prime {getPrime = 23753} +++ OK, passed 100 tests. Testing of Rader’s algorithmWe can use the same approach for testing our implementation of Rader’s algorithm that we used for testing the mixed-radix FFT algorithm, i.e. by comparing Rader FFT results to results of the basic DFT algorithm, and testing FFT/IFFT round-trip calculations. The only slight wrinkle here is that we need to test for input vectors of prime length only, so that we need a specialised Arbitrary instance to generate these:

instance Arbitrary VCD where arbitrary = do len <- elements $ P.takeWhile (< 500) primes fromList <$> vectorOf len arbitraryWith this, we can do some testing. In GHCi:

> :load Prime-FFT.hs [1 of 1] Compiling PrimeFFT ( Prime-FFT.hs, interpreted ) Ok, modules loaded: PrimeFFT. > quickCheck prop_dft_vs_fft +++ OK, passed 100 tests. > quickCheck prop_ifft +++ OK, passed 100 tests.Nice. It works. There are definitely improvements that we can make to the code, which we’ll do when we start optimising the overall FFT calculation, but the algorithm is complicated enough that it’s nice just to have a working version to start from!

data-analysis haskell <script src="http://zor.livefyre.com/wjs/v3.0/javascripts/livefyre.js" type="text/javascript"></script> <script type="text/javascript"> (function () { var articleId = fyre.conv.load.makeArticleId(null); fyre.conv.load({}, [{ el: 'livefyre-comments', network: "livefyre.com", siteId: "290329", articleId: articleId, signed: false, collectionMeta: { articleId: articleId, url: fyre.conv.load.makeCollectionUrl(), } }], function() {}); }()); </script>### December 31 deadline for Haskell.org donations

### ANN: treeviz-0.0.4 - a tool for visualizingDivide&Conquer computational breakdown.

### Bryan O'Sullivan: Testing a UTF-8 decoder with vigour

Yesterday, Michael Snoyman reported a surprising regression in version 1.0 of my Haskell text library: for some invalid inputs, the UTF-8 decoder was truncating the invalid data instead of throwing an exception.

Thanks to Michael providing an easy repro, I quickly bisected the origin of the regression to a commit from September that added support for incremental decoding of UTF-8. That work was motivated by applications that need to be able to consume incomplete input (e.g. a network packet containing possibly truncated data) as early as possible.

The low-level UTF-8 decoder is implemented as a state machine in C to squeeze as much performance out as possible. The machine has two visible end states: UTF8_ACCEPT indicates that a buffer was completely successfully decoded, while UTF8_REJECT specifies that the input contained invalid UTF-8 data. When the decoder stops, all other machine states count as work in progress, i.e. a decode that couldn’t complete because we reached the end of a buffer.

When the old all-or-nothing decoder encountered an incomplete or invalid input, it would back up by a single byte to indicate the location of the error. The incremental decoder is a refactoring of the old decoder, and the new all-or-nothing decoder calls it.

The critical error arose in the refactoring process. Here’s the old code for backing up a byte.

/* Error recovery - if we're not in a valid finishing state, back up. */ if (state != UTF8_ACCEPT) s -= 1;This is what the refactoring changed it to:

/* Invalid encoding, back up to the errant character. */ if (state == UTF8_REJECT) s -= 1;To preserve correctness, the refactoring should have added a check to the new all-or-nothing decoder so that it would step back a byte if the final state of the incremental decoder was *neither* UTF8_ACCEPT nor UTF8_REJECT. Oops! A very simple bug with unhappy consequences.

The text library has quite a large test suite that has revealed many bugs over the years, often before they ever escaped into the wild. Why did this ugly critter make it over the fence?

Well, a glance at the original code for trying to test UTF-8 error handling is telling—in fact, you don’t even need to be able to read a programming language, because the confession is in the comment.

-- This is a poor attempt to ensure that -- the error handling paths on decode are -- exercised in some way. Proper testing -- would be rather more involved.“Proper testing” indeed. All that I did in the original test was generate a random byte sequence, and see if it provoked the decoder into throwing an exception. The chances of such a dumb test really offering any value are not great, but I had more or less forgotten about it, and so I had a sense of security without the accompanying security. But hey, at least past-me had left a *mea culpa* note for present-day-me. Right?

While finding and fixing the bug took just a few minutes, I spent several more hours strengthening the test for the UTF-8 decoder, and this was far more interesting.

As a variable-length self-synchronizing encoding, UTF-8 is very clever and elegant, but its cleverness allows for a number of implementation bugs. For reference, here is a table (lightly edited from Wikipedia) of the allowable bit patterns used in UTF-8.

firstcode point last

code point byte 1 byte 2 byte 3 byte 4 U+0000 U+007F 0xxxxxxx U+0080 U+07FF 110xxxxx 10xxxxxx U+0800 U+FFFF 1110xxxx 10xxxxxx 10xxxxxx U+10000 U+1FFFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

The best known of these bugs involves accepting non-canonical encodings. What a canonical encoding *means* takes a little explaining. UTF-8 can represent any ASCII character in a single byte, and in fact every ASCII character *must* be represented as a single byte. However, an illegal two-byte encoding of an ASCII character can be achieved by starting with 0xC0, followed by the ASCII character with the high bit set. For instance, the ASCII forward slash U+002F is represented in UTF-8 as 0x2F, but a decoder with this bug would also accept 0xC0 0xAF (three- and four-byte encodings are of course also possible).

This bug may seem innocent, but it was widely used to remotely exploit IIS 4 and IIS 5 servers over a decade ago. Correct UTF-8 decoders must reject non-canonical encodings. (These are also known as *overlong* encodings.)

In fact, the bytes 0xC0 and 0xC1 will *never* appear in a valid UTF-8 bytestream, as they can only be used to start two-byte sequences that cannot be canonical.

To test our UTF-8 decoder’s ability to spot bogus input, then, we might want to generate byte sequences that start with 0xC0 or 0xC1. Haskell’s QuickCheck library provides us with just such a generating function, choose, which generates a random value in the given range (inclusive).

choose (0xC0, 0xC1)Once we have a bad leading byte, we may want to follow it with a continuation byte. The value of a particular continuation byte doesn’t much matter, but we would like it to be valid. A continuation byte always contains the bit pattern 0x80 combined with six bits of data in its least significant bits. Here’s a generator for a random continuation byte.

contByte = (0x80 +) <$> choose (0, 0x3F)Our bogus leading byte should be rejected immediately, since it can never generate a canonical encoding. For the sake of thoroughness, we should sometimes follow it with a valid continuation byte to ensure that the two-byte sequence is also rejected.

To do this, we write a general combinator, upTo, that will generate a list of up to n random values.

upTo :: Int -> Gen a -> Gen [a] upTo n gen = do k <- choose (0,n) vectorOf k gen -- a QuickCheck combinatorAnd now we have a very simple way of saying “either 0xC0 or 0xC1, optionally followed by a continuation byte”.

-- invalid leading byte of a 2-byte sequence. (:) <$> choose (0xC0,0xC1) <*> upTo 1 contByteNotice in the table above that a 4-byte sequence can encode any code point up to U+1FFFFF. The highest legal Unicode code point is U+10FFFF, so by implication there exists a range of leading bytes for 4-byte sequences that can never appear in valid UTF-8.

-- invalid leading byte of a 4-byte sequence. (:) <$> choose (0xF5,0xFF) <*> upTo 3 contByteWe should never encounter a continuation byte without a leading byte somewhere before it.

-- Continuation bytes without a start byte. listOf1 contByte -- The listOf1 combinator generates a list -- containing at least one element.Similarly, a bit pattern that introduces a 2-byte sequence must be followed by one continuation byte, so it’s worth generating such a leading byte *without* its continuation byte.

We do the same for 3-byte and 4-byte sequences.

-- Short 3-byte sequence. (:) <$> choose (0xE0, 0xEF) <*> upTo 1 contByte -- Short 4-byte sequence. (:) <$> choose (0xF0, 0xF4) <*> upTo 2 contByteEarlier, we generated 4-byte sequences beginning with a byte in the range 0xF5 to 0xFF. Although 0xF4 is a valid leading byte for a 4-byte sequence, it’s possible for a perverse choice of continuation bytes to yield an illegal code point between U+110000 and U+13FFFF. This code generates just such illegal sequences.

-- 4-byte sequence greater than U+10FFFF. k <- choose (0x11, 0x13) let w0 = 0xF0 + (k `Bits.shiftR` 2) w1 = 0x80 + ((k .&. 3) `Bits.shiftL` 4) ([w0,w1]++) <$> vectorOf 2 contByteFinally, we arrive at the general case of non-canonical encodings. We take a one-byte code point and encode it as two, three, or four bytes; and so on for two-byte and three-byte characters.

-- Overlong encoding. k <- choose (0,0xFFFF) let c = chr k case k of _ | k < 0x80 -> oneof [ let (w,x) = ord2 c in return [w,x] , let (w,x,y) = ord3 c in return [w,x,y] , let (w,x,y,z) = ord4 c in return [w,x,y,z] ] | k < 0x7FF -> oneof [ let (w,x,y) = ord3 c in return [w,x,y] , let (w,x,y,z) = ord4 c in return [w,x,y,z] ] | otherwise -> let (w,x,y,z) = ord4 c in return [w,x,y,z] -- The oneof combinator chooses a generator at random. -- Functions ord2, ord3, and ord4 break down a character -- into its 2, 3, or 4 byte encoding.Armed with a generator that uses oneof to choose one of the above invalid UTF-8 encodings at random, we embed the invalid bytestream in one of three cases: by itself, at the end of an otherwise valid buffer, and at the beginning of an otherwise valid buffer. This variety gives us some assurance of catching buffer overrun errors.

Sure enough, this vastly more elaborate QuickCheck test immediately demonstrates the bug that Michael found.

The original test is a classic case of basic fuzzing: it simply generates random junk and hopes for the best. The fact that it let the decoder bug through underlines the weakness of fuzzing. If I had cranked the number of randomly generated test inputs up high enough, I’d probably have found the bug, but the approach of pure randomness would have caused the bug to remain difficult to reproduce and understand.

The revised test is much more sophisticated, as it generates only test cases that are known to be invalid, with a rich assortment of precisely generated invalid encodings to choose from. While it has the same probabilistic nature as the fuzzing approach, it excludes a huge universe of uninteresting inputs from being tested, and hence is much more likely to reveal a weakness quickly and efficiently.

The moral of the story: even QuickCheck tests, though vastly more powerful than unit tests and fuzz tests, are only as good as you make them!