# News aggregator

### hmatrix-0.16.0.4 installation problem

### Theory Lunch (Institute of Cybernetics, Tallinn): Transgressing the limits

Today, the 14th of January 2014, we had a special session of our Theory Lunch. I spoke about ultrafilters and how they allow us to generalize the notion of limit.

Consider the space $\ell^\infty$ of bounded sequences $x = \{x_n\}_{n \geq 0}$ of real numbers, together with the supremum norm. We would like to define a notion of limit which holds for *every* $x \in \ell^\infty$ and satisfies the well known properties of the standard limit:

- *Linearity:* $\lim_n (x_n + y_n) = \lim_n x_n + \lim_n y_n$.
- *Homogeneity:* $\lim_n \lambda x_n = \lambda \lim_n x_n$.
- *Monotonicity:* if $x_n \leq y_n$ for every $n$ then $\lim_n x_n \leq \lim_n y_n$.
- *Nontriviality:* if $x_n = 1$ for every $n$ then $\lim_n x_n = 1$.
- *Consistency:* if the limit exists in the classical sense, then the two notions coincide.

The consistency condition is reasonable also because it avoids trivial cases: if we fix $k \geq 0$ and we define the limit of the sequence $\{x_n\}$ as the value $x_k$, then the first four properties are satisfied.

Let us recall the classical definition of limit: we say that $\{x_n\}$ converges to $L$ if and only if, for every $\varepsilon > 0$, the set of values $n$ such that $|x_n - L| < \varepsilon$ is *cofinite*, *i.e.*, has a finite complement: the inequality $|x_n - L| \geq \varepsilon$ can be satisfied at most for finitely many values of $n$. The family $\mathcal{F}$ of cofinite subsets of $\mathbb{N}$ (in fact, of any set $X$) has the following properties:

- *Upper closure:* if $A \in \mathcal{F}$ and $A \subseteq B$ then $B \in \mathcal{F}$.
- *Meet stability:* if $A, B \in \mathcal{F}$ then $A \cap B \in \mathcal{F}$.

A family $\mathcal{F}$ of subsets of $X$ with the two properties above is called a *filter* on $X$. An immediate example is the *trivial filter* $\{X\}$; another example is the *improper filter* $\mathcal{P}(X)$. The family of cofinite subsets of $X$ is called the *Fréchet filter* on $X$. The Fréchet filter is not the improper one if and only if $X$ is infinite.

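An *ultrafilter* on $X$ is a filter $\mathcal{U}$ on $X$ satisfying the following additional conditions:

- *Properness:* $\emptyset \notin \mathcal{U}$.
- *Maximality:* for every $A \subseteq X$, either $A \in \mathcal{U}$ or $X \setminus A \in \mathcal{U}$.

For example, if $x \in X$, then $\mathcal{U}_x = \{A \subseteq X \mid x \in A\}$ is an ultrafilter on $X$, called the *principal ultrafilter* generated by $x$. Observe that $\bigcap_{A \in \mathcal{U}_x} A = \{x\}$: if $\bigcap_{A \in \mathcal{U}} A = \emptyset$ we say that $\mathcal{U}$ is *free*. These are, in fact, the only two options.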
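On a finite set the definitions above can be checked by brute force. The following is a minimal sketch, not part of the talk: the three-element `universe` and all function names are assumptions chosen for illustration, and subsets are represented as sorted lists.

```haskell
import Data.List (intersect, subsequences, (\\))

-- A family of subsets of the universe; every subset is a sorted list.
type Family = [[Int]]

-- Hypothetical finite universe used only for this illustration.
universe :: [Int]
universe = [1, 2, 3]

-- All subsets of the universe (subsequences keeps elements in order).
powerset :: Family
powerset = subsequences universe

-- The principal ultrafilter generated by x: all subsets containing x.
principal :: Int -> Family
principal x = [a | a <- powerset, x `elem` a]

subsetOf :: [Int] -> [Int] -> Bool
subsetOf a b = all (`elem` b) a

-- Upper closure: supersets of members are members.
upperClosed :: Family -> Bool
upperClosed fam = and [b `elem` fam | a <- fam, b <- powerset, a `subsetOf` b]

-- Meet stability: intersections of members are members.
meetStable :: Family -> Bool
meetStable fam = and [(a `intersect` b) `elem` fam | a <- fam, b <- fam]

isFilter :: Family -> Bool
isFilter fam = upperClosed fam && meetStable fam

-- Properness: the empty set is not a member.
proper :: Family -> Bool
proper fam = [] `notElem` fam

-- Maximality: for every subset, it or its complement is a member.
maximal :: Family -> Bool
maximal fam = and [a `elem` fam || (universe \\ a) `elem` fam | a <- powerset]

isUltrafilter :: Family -> Bool
isUltrafilter fam = isFilter fam && proper fam && maximal fam
```

In GHCi, `all (isUltrafilter . principal) universe` evaluates to `True`, while the trivial filter `[universe]` is a filter that fails `maximal`, and `powerset` (which is also the Fréchet filter here, since the universe is finite) fails `proper`.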

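**Lemma 1.** For a proper filter $\mathcal{U}$ to be an ultrafilter, it is necessary and sufficient that it satisfies the following condition: for every $n \geq 1$ and nonempty $A_1, \ldots, A_n \subseteq X$, if $A_1 \cup \ldots \cup A_n \in \mathcal{U}$ then $A_i \in \mathcal{U}$ for at least one $i$.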

*Proof:* It is sufficient to prove the thesis with . If with , then is a proper filter that properly contains . If the condition is satisfied, for every which is neither nor we have , thus either or .

**Theorem 1.** Every nonprincipal ultrafilter is free. In addition, an ultrafilter is free if and only if it extends the Fréchet filter. In particular, every ultrafilter over a finite set is principal.

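*Proof:* Let $\mathcal{U}$ be a nonprincipal ultrafilter. Let $x \in X$: then $\mathcal{U} \neq \mathcal{U}_x$, so either there exists $A \subseteq X$ such that $x \notin A$ and $A \in \mathcal{U}$, or there exists $A \subseteq X$ such that $x \in A$ and $A \notin \mathcal{U}$. In the first case, $x \notin \bigcap_{B \in \mathcal{U}} B$; in the second case, we consider $X \setminus A$ and reduce to the first case. As $x$ is arbitrary, $\mathcal{U}$ is free.

Now, for every $x \in X$ the set $X \setminus \{x\}$ belongs to the Fréchet filter but not to $\mathcal{U}_x$: therefore, no principal ultrafilter extends the Fréchet filter. On the other hand, if $\mathcal{U}$ is an ultrafilter, $A \subseteq X$ is finite, and $X \setminus A \notin \mathcal{U}$, then $A \in \mathcal{U}$ by maximality, hence $\{x\} \in \mathcal{U}$ for some $x \in A$ because of Lemma 1, thus $\mathcal{U}$ cannot be a free ultrafilter.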

So it seems that free ultrafilters are the right thing to consider when trying to expand the concept of limit. There is an issue, though: we have not seen any single example of a free ultrafilter; in fact, we do not even (yet) know whether free ultrafilters do exist! The answer to this problem comes, in a shamelessly nonconstructive way, from the following

**Ultrafilter lemma.** Every proper filter can be extended to an ultrafilter.

The ultrafilter lemma, together with Theorem 1, implies the existence of free ultrafilters on every infinite set, and in particular on $\mathbb{N}$. On the other hand, to prove the ultrafilter lemma the Axiom of Choice is required, in the form of Zorn’s lemma. Before giving such a proof, we recall that a family of sets has the *finite intersection property* if every finite subfamily has a nonempty intersection: every proper filter has the finite intersection property.

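*Proof of the ultrafilter lemma.* Let $\mathcal{F}$ be a proper filter on $X$ and let $\mathcal{M}$ be the family of the collections of subsets of $X$ that extend $\mathcal{F}$ and have the finite intersection property, ordered by inclusion. Let $\mathcal{C}$ be a totally ordered subfamily of $\mathcal{M}$: then $\bigcup \mathcal{C}$ extends $\mathcal{F}$ and has the finite intersection property, because for every finitely many $A_1, \ldots, A_n \in \bigcup \mathcal{C}$ there exists by construction $\mathcal{V} \in \mathcal{C}$ such that $A_1, \ldots, A_n \in \mathcal{V}$.

By Zorn’s lemma, $\mathcal{M}$ has a maximal element $\mathcal{U}$, which surely satisfies $\mathcal{F} \subseteq \mathcal{U}$ and $\emptyset \notin \mathcal{U}$. If $A \in \mathcal{U}$ and $A \subseteq B$, then $\mathcal{U} \cup \{B\}$ still has the finite intersection property, therefore $B \in \mathcal{U}$ by maximality. If $A, B \in \mathcal{U}$ then $\mathcal{U} \cup \{A \cap B\}$ still has the finite intersection property, therefore $A \cap B \in \mathcal{U}$ again by maximality.

Suppose, for the sake of contradiction, that there exists $A \subseteq X$ such that $A \notin \mathcal{U}$ and $X \setminus A \notin \mathcal{U}$: then neither $\mathcal{U} \cup \{A\}$ nor $\mathcal{U} \cup \{X \setminus A\}$ have the finite intersection property, hence there exist $A_1, \ldots, A_m, B_1, \ldots, B_n \in \mathcal{U}$ such that $A_1 \cap \ldots \cap A_m \cap A = B_1 \cap \ldots \cap B_n \cap (X \setminus A) = \emptyset$. But $A_1 \cap \ldots \cap A_m \cap A = \emptyset$ means $A_1 \cap \ldots \cap A_m \subseteq X \setminus A$, and $B_1 \cap \ldots \cap B_n \cap (X \setminus A) = \emptyset$ means $B_1 \cap \ldots \cap B_n \subseteq A$: therefore,

$$A_1 \cap \ldots \cap A_m \cap B_1 \cap \ldots \cap B_n \subseteq (X \setminus A) \cap A = \emptyset,$$

against $\mathcal{U}$ having the finite intersection property.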

We are now ready to expand the idea of limit. Let $(X, d)$ be a metric space and let $\mathcal{U}$ be an ultrafilter on $\mathbb{N}$: we say that $x \in X$ is the *ultralimit* of the sequence $\{x_n\}_{n \geq 0}$ along $\mathcal{U}$ if for every $\varepsilon > 0$ the set

$$\{ n \geq 0 \mid d(x_n, x) < \varepsilon \}$$

belongs to $\mathcal{U}$. (Observe how, in the standard definition of limit, the above set is required to belong to the Fréchet filter.) If this is the case, we write

$$\lim_{n \to \mathcal{U}} x_n = x.$$

Ultralimits, if they exist, are unique and satisfy our first four conditions. Moreover, the choice of a principal ultrafilter $\mathcal{U}_k$ corresponds to the trivial definition $\lim_{n \to \mathcal{U}_k} x_n = x_k$. So, what about free ultrafilters?

**Theorem 2.** Every bounded sequence of real numbers has an ultralimit along every free ultrafilter on $\mathbb{N}$.

*Proof:* It is not restrictive to suppose for every . Let be an arbitrary, but fixed, free ultrafilter on . We will construct a sequence of closed intervals , , such that and for every . By the Cantor intersection theorem it will be : we will then show that .

Let . Let be either or , chosen according to the following criterion: . If both halves satisfy the criterion, then we just choose one once and for all. We iterate the procedure by always choosing as one of the two halves of such that .

Let . Let , and let be so large that : then , thus . As the smaller set belongs to , so does the larger one.
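As a small worked example of Theorem 2, consider the bounded sequence $x_n = (-1)^n$, which has no classical limit. The notation is assumed here rather than taken from the talk: $\mathcal{U}$ is a free ultrafilter on $\mathbb{N}$ and $\lim_{n \to \mathcal{U}}$ denotes the ultralimit along it.

```latex
\[
E = \{ n \in \mathbb{N} \mid n \text{ is even} \}, \qquad
\mathbb{N} \setminus E = \{ n \in \mathbb{N} \mid n \text{ is odd} \}.
\]
By maximality and properness, exactly one of $E$ and $\mathbb{N} \setminus E$
belongs to $\mathcal{U}$. For every $\varepsilon > 0$,
\[
E \subseteq \{ n \mid |x_n - 1| < \varepsilon \}, \qquad
\mathbb{N} \setminus E \subseteq \{ n \mid |x_n + 1| < \varepsilon \},
\]
so, by upper closure,
\[
\lim_{n \to \mathcal{U}} (-1)^n =
\begin{cases}
  1  & \text{if } E \in \mathcal{U}, \\
  -1 & \text{if } \mathbb{N} \setminus E \in \mathcal{U}.
\end{cases}
\]
```

The value depends on the chosen ultrafilter, which is exactly the dependence mentioned in the discussion of Theorem 2.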

We have thus almost achieved our original target: a notion of limit which applies to every bounded sequence of real numbers. Such notion will depend on the specific free ultrafilter we choose: but it is already very reassuring that such a notion exists at all! To complete our job we need one more check: we have to be sure that the definition is consistent with the classical one. And this is indeed what happens!

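**Theorem 3.** Let $\{x_n\}_{n \geq 0}$ be a sequence of real numbers and let $L \in \mathbb{R}$. Then $\lim_{n \to \infty} x_n = L$ in the classical sense if and only if $\lim_{n \to \mathcal{U}} x_n = L$ for every free ultrafilter $\mathcal{U}$ on $\mathbb{N}$.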

To prove Theorem 3 we make use of an auxiliary result, which is of interest by itself.

**Lemma 2.** Let $\mathcal{M}$ be the family of collections of subsets of $X$ that have the finite intersection property. The maximal elements of $\mathcal{M}$ are precisely the ultrafilters.

*Proof:* Every ultrafilter is clearly maximal in $\mathcal{M}$. If $\mathcal{U}$ is maximal in $\mathcal{M}$, then it is clearly proper and upper closed, and we can reason as in the proof of the ultrafilter lemma to show that it is actually an ultrafilter.

*Proof of Theorem 3:* Suppose does not converge to in the classical sense. Fix such that the set is infinite. Then the family has the finite intersection property: an ultrafilter that extends must be free. Then , and does not have an ultralimit along .

The converse implication follows from the classical definition of limit, together with the very notion of free ultrafilter.

Theorem 3 does hold for sequences of real numbers, but does not extend to arbitrary metric spaces. In fact, the following holds, which we state without proving.

**Theorem 4.** Let $(X, d)$ be a metric space. The following are equivalent.

- For some free ultrafilter $\mathcal{U}$ on $\mathbb{N}$, every sequence in $X$ has an ultralimit along $\mathcal{U}$.
- For every free ultrafilter $\mathcal{U}$ on $\mathbb{N}$, every sequence in $X$ has an ultralimit along $\mathcal{U}$.
- $X$ is compact.

Ultrafilters are useful in many other contexts. For instance, they are used to construct *hyperreal numbers*, which in turn allow a rigorous definition of infinitesimals and the foundation of calculus over those. But this might be the topic for another Theory Lunch talk.

### Haskell and Memristors

As some of you may know, HP began working on a memristor computer and OS. The goal of memristors is to replace all forms of storage (SRAM, DRAM and disk), while being faster than DRAM.

Does the Haskell community have any thoughts on what this could mean for a language like Haskell?

General Memristor Information: http://en.wikipedia.org/wiki/Memristor

HP Specific Information: http://arstechnica.com/information-technology/2014/06/hp-plans-to-launch-memristor-silicon-photonic-computer-within-the-decade/

submitted by Artemis311

### Rechunk a conduit (into larger chunks)

### Why this existential type could not work?

### Edward Z. Yang: The fundamental problem of programming language package management

Why are there so many goddamn package managers? They sprawl across both operating systems (apt, yum, pacman, Homebrew) as well as for programming languages (Bundler, Cabal, Composer, CPAN, CRAN, CTAN, EasyInstall, Go Get, Maven, npm, NuGet, OPAM, PEAR, pip, RubyGems, etc etc etc). "It is a truth universally acknowledged that a programming language must be in want of a package manager." What is the fatal attraction of package management that makes programming language after programming language jump off this cliff? Why can't we just, you know, reuse an existing package manager?

You can probably think of a few reasons why trying to use apt to manage your Ruby gems would end in tears. "System and language package managers are completely different! Distributions are vetted, but that's completely unreasonable for most libraries tossed up on GitHub. Distributions move too slowly. Every programming language is different. The different communities don't talk to each other. Distributions install packages globally. I want control over what libraries are used." These reasons are all *right*, but they are missing the essence of the problem.

The fundamental problem is that programming language package management is **decentralized**.

This decentralization starts with the central premise of a package manager: that is, to install software and libraries that would otherwise not be locally available. Even with an idealized, centralized distribution curating the packages, there are still two parties involved: the distribution and the *programmer* who is building applications locally on top of these libraries. In real life, however, the library ecosystem is further fragmented, composed of packages provided by a huge variety of developers. Sure, the packages may all be uploaded and indexed in one place, but that doesn't mean that any given author knows about any other given package. And then there's what the Perl world calls DarkPAN: the uncountable lines of code which probably exist, but which we have no insight into because they are locked away on proprietary servers and source code repositories. Decentralization can only be avoided when you control absolutely *all* of the lines of code in your application.. but in that case, you hardly need a package manager, do you? (By the way, my industry friends tell me this is basically mandatory for software projects beyond a certain size, like the Windows operating system or the Google Chrome browser.)

Decentralized systems are hard. Really, really hard. Unless you design your package manager accordingly, your developers *will* fall into dependency hell. Nor is there one "right" way to solve this problem: I can identify at least three distinct approaches to the problem among the emerging generation of package managers, each of which has its benefits and downsides.

**Pinned versions.** Perhaps the most popular school of thought is that developers should aggressively pin package versions; this approach is advocated by Ruby's Bundler, PHP's Composer, Python's virtualenv and pip, and generally any package manager which describes itself as inspired by the Ruby/node.js communities (e.g. Java's Gradle, Rust's Cargo). Reproducibility of builds is king: these package managers solve the decentralization problem by simply pretending the ecosystem doesn't exist once you have pinned the versions. The primary benefit of this approach is that you are always in control of the code you are running. Of course, the downside of this approach is that you are always in control of the code you are running. An all-too-common occurrence is for dependencies to be pinned, and then forgotten about, even if there are important security updates to the libraries involved. Keeping bundled dependencies up-to-date requires developer cycles--cycles that more often than not are spent on other things (like new features).

**A stable distribution.** If bundling requires every individual application developer to spend effort keeping dependencies up-to-date and testing if they keep working with their application, we might wonder if there is a way to centralize this effort. This leads to the second school of thought: to *centralize* the package repository, creating a blessed distribution of packages which are known to play well together, and which will receive bug fixes and security fixes while maintaining backwards compatibility. In programming languages, this is much less common: the two I am aware of are Anaconda for Python and Stackage for Haskell. But if we look closely, this model is *exactly the same* as the model of most operating system distributions. As a system administrator, I often recommend my users use libraries that are provided by the operating system as much as possible. They won't take backwards incompatible changes until we do a release upgrade, and at the same time you'll still get bugfixes and security updates for your code. (You won't get the new hotness, but that's essentially contradictory with stability!)

**Embracing decentralization.** Up until now, both of these approaches have thrown out decentralization, requiring a central authority, either the application developer or the distribution manager, for updates. Is this throwing out the baby with the bathwater? The primary downside of centralization is the huge amount of *work* it takes to maintain a stable distribution or keep an individual application up-to-date. Furthermore, one might not expect the entirety of the universe to be compatible with one another, but this doesn't stop subsets of packages from being useful together. An ideal decentralized ecosystem distributes the problem of identifying what subsets of packages *work* across everyone participating in the system. Which brings us to the fundamental, unanswered question of programming language package management:

*How can we create a decentralized package ecosystem that works?*

Here are a few things that can help:

- **Stronger encapsulation for dependencies.** One of the reasons why dependency hell is so insidious is the dependency of a package is often an inextricable part of its outwards facing API: thus, the choice of a dependency is not a local choice, but rather a global choice which affects the entire application. Of course, if a library uses some library internally, but this choice is entirely an implementation detail, this *shouldn't* result in any sort of global constraint. Node.js's NPM takes this choice to its logical extreme: by default, it doesn't deduplicate dependencies at all, giving each library its own copy of each of its dependencies. While I'm a little dubious about duplicating everything (it certainly occurs in the Java/Maven ecosystem), I certainly agree that keeping dependency constraints local improves *composability.*
- **Advancing semantic versioning.** In a decentralized system, it's especially important that library writers give *accurate* information, so that tools and users can make informed decisions. Wishful, invented version ranges and artistic version number bumps simply exacerbate an already hard problem (as I mentioned in my previous post). If you can enforce semantic versioning, or better yet, ditch semantic versions and record the true, *type-level* dependency on interfaces, our tools can make better choices. The gold standard of information in a decentralized system is, "Is package A compatible with package B", and this information is often difficult (or impossible, for dynamically typed systems) to calculate.
- **Centralization as a special-case.** The point of a decentralized system is that every participant can make policy choices which are appropriate for them. This includes maintaining their own central authority, or deferring to someone else's central authority: centralization is a special-case. If we suspect users are going to attempt to create their own, operating system style stable distributions, we need to give them the tools to do so... and make them easy to use!
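To make the "global choice" point concrete, here is a minimal sketch of how version ranges interact when a dependency must be shared. It does not model any real package manager: the `Range` type, the half-open range convention, and the example versions are all assumptions made for illustration.

```haskell
-- A version is a list of components, compared lexicographically.
type Version = [Int]

-- Hypothetical half-open version range: lowerInclusive <= v < upperExclusive.
data Range = Range
    { lowerInclusive :: Version
    , upperExclusive :: Version
    }

satisfies :: Version -> Range -> Bool
satisfies v (Range lo hi) = lo <= v && v < hi

-- If one copy of a dependency is shared by the whole application,
-- it must satisfy the range of every package that depends on it.
sharedChoice :: [Version] -> [Range] -> [Version]
sharedChoice candidates ranges =
    [v | v <- candidates, all (satisfies v) ranges]

-- Invented example data: three released versions and two dependents.
available :: [Version]
available = [[1, 0], [1, 4], [2, 0]]

rangeA, rangeB :: Range
rangeA = Range [1, 0] [2, 0]   -- dependent A wants >= 1.0 && < 2.0
rangeB = Range [2, 0] [3, 0]   -- dependent B wants >= 2.0 && < 3.0
```

Here `sharedChoice available [rangeA, rangeB]` is empty, although each dependent taken on its own has at least one acceptable version. Giving each dependent its own copy amounts to calling `sharedChoice` once per range, which is the local-constraint behaviour described above.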

For a long time, the source control management ecosystem was completely focused on centralized systems. Distributed version control systems such as Git fundamentally changed the landscape: although Git may be more difficult to use than Subversion for a non-technical user, the benefits of decentralization are diverse. The Git of package management doesn't exist yet: if someone tells you that package management is solved, just reimplement Bundler, I entreat you: think about decentralization as well!

### www.fpcomplete.com

### Haskell Weekly News: Issue 302

### Are there any libraries/examples of using reversible IO monads to implement an undo/redo system?

Sorry in advance if I have the terms wrong. I want to use this idea for a 2D drawing program where, in Haskell, each drawing action is returned as a monadic action, and the database stores information about all the actions in the sequence (since each /api/request can return a single action or a sequence of them). Then a user can execute an undo action provided by the database to reverse the entire previous sequence. Of course, the previous data would have to be stored in the database as well.

Is there an existing library aimed at doing this? Or possibly a database that I can interface with Haskell to provide the versioning capabilities at least?

I saw this thread but I couldn't make sense of the responses.

submitted by snoozer_cruiser

### Vector sort poor performance

### Lee Pike: SmartChecking Matt Might’s Red-Black Trees

Matt Might gave a nice intro to QuickCheck via testing red-black trees recently. Of course, QuickCheck has been around for over a decade now, but it’s still useful (if underused–why aren’t *you* QuickChecking your programs!?).

In a couple of weeks, I’m presenting a paper on an alternative to QuickCheck called SmartCheck at the Haskell Symposium.

SmartCheck focuses on efficiently shrinking and generalizing large counterexamples. I thought it’d be fun to try some of Matt’s examples with SmartCheck.

The kinds of properties Matt checked really aren’t in the sweet spot of SmartCheck, since the counterexamples are so small (Matt didn’t even have to define instances for shrink!). SmartCheck focuses on shrinking and generalizing large counterexamples.

Still, let’s see what it looks like. (The code can be found here.)

SmartCheck is only interesting for failed properties, so let’s look at an early example in Matt’s blog post where something goes wrong. A lot of the blog post focuses on generating sufficiently constrained arbitrary red-black trees. In the section entitled, “A property for balanced black depth”, a property is given to check that the path from the root of a tree to every leaf passes through the same number of black nodes. An early generator for trees fails to satisfy the property.

To get the code to work with SmartCheck, we derive Typeable and Generic instances for the datatypes, and use GHC Generics to automatically derive instances for SmartCheck’s typeclass. The only other main issue is that SmartCheck doesn’t support a `forAll` function like in QuickCheck. So instead of a call to QuickCheck such as

> quickCheck (forAll nrrTree prop_BlackBalanced)

We change the arbitrary instance to be the nrrTree generator.

Because it is so easy to find a small counterexample, SmartCheck’s reduction algorithm does a little bit of automatic shrinking, but not too much. For example, a typical minimal counterexample returned by SmartCheck looks like

T R E 2 (T B E 5 E)

which is about as small as possible. Now onto generalization!

There are three generalization phases in SmartCheck, but we’ll look at just one, in which a formula is returned that is universally quantified if every test case fails. For the test case above, SmartCheck returns the following formula:

forall values x0 x1:

T R E 2 (T B x1 5 x0)

Intuitively, this means that for any well-typed trees chosen that could replace the variables x0 and x1, the resulting formula is still a counterexample.

The benefit to developers is seeing instantly that those subterms in the counterexample probably don’t matter. The real issue is that E on the left is unbalanced with (T B E 5 E) on the right.
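To see why the variables do not matter, here is a self-contained sketch of a black-balance check. It is a simplification written for this illustration, not Matt's or SmartCheck's code: the two-colour `Color` type and the names `blackDepths`, `blackBalanced` and `generalized` are assumptions.

```haskell
-- Simplified red-black tree type (assumed; only two colours).
data Color = R | B deriving (Eq, Show)

data RBSet a = E | T Color (RBSet a) a (RBSet a) deriving Show

-- Number of black nodes on every path from the root to a leaf.
blackDepths :: RBSet a -> [Int]
blackDepths E = [0]
blackDepths (T c l _ r) = map (+ inc) (blackDepths l ++ blackDepths r)
  where
    inc = if c == B then 1 else 0

-- A tree is black-balanced when all those counts agree.
blackBalanced :: RBSet a -> Bool
blackBalanced t = case blackDepths t of
    []       -> True
    (d : ds) -> all (== d) ds

-- The minimal counterexample reported above.
counterexample :: RBSet Int
counterexample = T R E 2 (T B E 5 E)

-- The generalized formula: x1 and x0 may be any trees.
generalized :: RBSet Int -> RBSet Int -> RBSet Int
generalized x1 x0 = T R E 2 (T B x1 5 x0)
```

The left leaf is reached through zero black nodes, while every path on the right passes through the black node holding 5, so `blackBalanced (generalized x1 x0)` is `False` whatever trees are substituted for `x1` and `x0`.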

One of the early design decisions in SmartCheck was to focus on structurally shrinking data types and essentially ignore “base types” like Int, Char, etc. The motivation was to improve efficiency on shrinking large counterexamples.

But for a case like this, generalizing base types would be interesting. We’d hypothetically get something like

forall values (x0, x1 :: RBSet Int) (x2, x3 :: Int):

T R E x2 (T B x1 x3 x0)

further generalizing the counterexample. It may be worth adding this behavior to SmartCheck.

SmartCheck’s generalization begins to bridge the gap from specific counterexamples to *formulas* characterizing counterexamples. The idea is related to QuickSpec, another cool tool developed by Claessen and Hughes (and Smallbone). Moreover, it’s a bridge between testing and verification, or as Matt puts it, from the 80% to the 20%.

### FP Complete: IAP: Speeding up conduit

This post contains fragments of active Haskell code, best viewed and executed at https://www.fpcomplete.com/blog/2014/08/iap-speeding-up-conduit

As most of us know, performance isn't a one-dimensional spectrum. There are in fact multiple different ways to judge performance of a program. A commonly recognized tradeoff is that between CPU and memory usage. Often times, a program can be sped up by caching more data, for example.

conduit is a streaming data library. In that sense, it has two very specific performance criteria it aims for:

- Constant memory usage.
- Efficient usage of scarce resources, such as closing file descriptors as early as possible.

While CPU performance is always a nice goal, it has never been my top priority in the library's design, especially given that in the main use case for conduit (streaming data in an I/O context), the I/O cost almost always far outweighs any CPU overhead from conduit.

However, for our upcoming Integrated Analysis
Platform (IAP) release, this is no longer
the case. conduit will be used in tight loops, where we *do* need to optimize
for the lowest CPU overhead possible.

This blog post covers the first set of optimizations I've applied to conduit. There is still more work to be done, and throughout this blogpost I'll be describing some of the upcoming changes I am attempting.

I'll give a brief summary up front:

- Applying the codensity transform results in much better complexity of monadic bind.
- We're also less reliant on rewrite rules firing, which has always been unreliable (and now I know why).
- This change *does* represent a breaking API change. However, it only affects users of the Data.Conduit.Internal module. If you've just been using the public API, your code will be unaffected, besides getting an automatic speedup.
- These changes will soon be released as conduit 1.2.0, after a period for community feedback.

Note that this blog post follows the actual steps I went through (more or less) in identifying the performance issues I wanted to solve. If you want to skip ahead to the solution itself, you may want to skip to the discussion on difference lists, or even straight to continuation passing style, church-encoding, codensity.

By the way, after I originally wrote this blog post, I continued working on the optimizations I describe as possible future enhancements. Those are actually working out far better than I expected, and it looks like conduit 1.2.0 will be able to ship with them. I'll be writing a separate blog post detailing those changes. A bit of a teaser is: for vector-equivalent code, conduit now generates identical core as vector itself.

#### The benchmarks

Before embarking on any kind of serious optimizations, it's important to have some benchmarks. I defined three benchmarks for the work I was going to be doing:

- A simple sum: adding up the numbers from 1 to 10000. This is to get a baseline of the overhead coming from conduit.
- A monte carlo analysis: This was based on a previous IAP blog post. I noticed when working on that benchmark that, while the conduit solution was highly *memory* efficient, there was still room to speed up the benchmark.
- Sliding vectors: Naren Sundar recently sent a sliding windows pull request, which allows us to get a view of a fixed size of a stream of values. This feature is very useful for a number of financial analyses, especially regarding time series.

Naren's pull request was based on immutable data structures, and for those cases it is highly efficient. However, it's possible to be far more memory efficient by writing to a mutable vector instead, and then taking immutable slices of that vector. Mihaly Barasz sent a pull request for this feature, and much to our disappointment, for small window sizes, it performed worse than sliding windows. We want to understand why.

You can see the benchmark code, which stays mostly unchanged for the rest of this blog post (a few new cases are added to demonstrate extra points). The benchmarks always contain a low-level base case representing the optimal performance we can expect from hand-written Haskell (without resorting to any kind of FFI tricks or the like).

You can see the first run results which reflect conduit 1.1.7, plus inlining of a few functions. Some initial analysis:

- Control.Monad.foldM is surprisingly slow.
- Data.Conduit.List.foldM has a rather steep performance hit versus Data.Conduit.List.fold.
- There's a *very* high overhead in the monte carlo analysis.
- For sliding vector, the conduit overhead is more pronounced at smaller window sizes.
- But even with large window sizes, mutable vector conduits still have a large overhead. The sliding window/immutable approach, however, shows almost no overhead.

That hopefully sets the scene enough for us to begin to dive in.

#### Rewrite rules: lift

GHC offers a very powerful optimization technique: rewrite rules. This allows you to tell the compiler that a certain expression can be rewritten to a more efficient one. A common example of a rewrite rule would be to state that map f . map g is the same as map (f . g). This can be expressed as:

    {-# RULES "map f . map g" forall f g. map f . map g = map (f . g) #-}

Note that GHC's list rewrite rules are actually more complicated than this, and revolve around a concept called build/foldr fusion.

Let's look at the implementation of the yield function in conduit (with some newtypes stripped away):

    yield :: Monad m => o -> ConduitM i o m ()
    yield o = HaveOutput (Done ()) (return ()) o
    {-# INLINE [1] yield #-}

    {-# RULES
        "yield o >> p" forall o (p :: ConduitM i o m r).
            yield o >> p = HaveOutput p (return ()) o
      #-}

The core datatype of conduit is recursive. The HaveOutput constructor
contains a field for "what to do next." In the case of yield, there *isn't*
anything to do next, so we fill that with Done (). However, creating that
Done () value just to throw it away after a monadic bind is wasteful. So we
have a rewrite rule to fuse those two steps together.
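The wasted Done () is easier to see on a cut-down model of the datatype. The sketch below is an assumption-laden simplification, not conduit's real code: it keeps only three constructors (the real Pipe has more type parameters and constructors), and `yieldThen` and `outputs` are hypothetical helper names.

```haskell
import Data.Functor.Identity (runIdentity)

-- Simplified pipe: output a value, run a monadic action, or finish.
data Pipe o m r
    = HaveOutput (Pipe o m r) (m ()) o  -- what to do next, finalizer, output
    | PipeM (m (Pipe o m r))
    | Done r

instance Monad m => Functor (Pipe o m) where
    fmap f p = p >>= pure . f

instance Monad m => Applicative (Pipe o m) where
    pure = Done
    pf <*> px = pf >>= \f -> fmap f px

instance Monad m => Monad (Pipe o m) where
    HaveOutput next fin o >>= f = HaveOutput (next >>= f) fin o
    PipeM mp              >>= f = PipeM (fmap (>>= f) mp)
    Done r                >>= f = f r

-- Allocates a Done () that (>>=) immediately pattern matches away.
yield :: Monad m => o -> Pipe o m ()
yield o = HaveOutput (Done ()) (return ()) o

-- What the "yield o >> p" rule produces directly: no intermediate Done ().
yieldThen :: Monad m => o -> Pipe o m r -> Pipe o m r
yieldThen o p = HaveOutput p (return ()) o

-- Collect all outputs, ignoring finalizers and the final result.
outputs :: Monad m => Pipe o m r -> m [o]
outputs (HaveOutput next _ o) = fmap (o :) (outputs next)
outputs (PipeM mp)            = mp >>= outputs
outputs (Done _)              = return []

example :: [Int]
example = runIdentity (outputs (yield 1 >> yield 2 >> yield 3))
```

Both `yield o >> p` and `yieldThen o p` describe the same pipe; the second simply skips building and then discarding the `Done ()` node, which is the saving the rewrite rule is after.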

But no such rewrite rule exists for lift! My first step was to add such a rule, and check the results. Unfortunately, the rule didn't have any real impact, because it wasn't firing. Let's put that issue to the side; we'll come back to it later.

#### Cleanup, inlining

One of the nice features introduced in (I believe) GHC 7.8 is that the compiler will now warn you when a rewrite rule may not fire. When compiling conduit, I saw messages like:

    Data/Conduit/List.hs:274:11: Warning:
        Rule "source/map fusion $=" may never fire
          because ‘$=’ might inline first
        Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$=’

    Data/Conduit/List.hs:275:11: Warning:
        Rule "source/map fusion =$=" may never fire
          because ‘=$=’ might inline first
        Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘=$=’

    Data/Conduit/List.hs:542:11: Warning:
        Rule "source/filter fusion $=" may never fire
          because ‘$=’ might inline first
        Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$=’

    Data/Conduit/List.hs:543:11: Warning:
        Rule "source/filter fusion =$=" may never fire
          because ‘=$=’ might inline first
        Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘=$=’

    Data/Conduit/List.hs:552:11: Warning:
        Rule "connect to sinkNull" may never fire
          because ‘$$’ might inline first
        Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$$’

This demonstrates an important interaction between inlining and rewrite rules. We need to make sure that expressions that need to be rewritten are not inlined first. If they are first inlined, then GHC won't be able to rewrite them to our more optimized version.

A common approach to this is to *delay* inlining of functions until a later
*simplification phase*. The GHC simplification process runs in multiple steps,
and we can state that rules and inlining should only happen before or after a
certain phase. The phases count down from 2 to 0, so we commonly want to delay
inlining of functions until phase 0, if they may be subject to rewriting.

Conversely, some functions need to be inlined *before* a rewrite rule can fire. In stream fusion, for example, the fusion framework depends on the following sequencing to get good performance:
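The original listing for this sequencing is not reproduced here. As a rough, self-contained sketch of the usual stream fusion pattern (the names `Stream`, `mapS` and `mapF` are illustrative assumptions, not code from conduit or vector): the wrapper `mapF` is inlined early, which exposes adjacent `stream (unstream ...)` pairs; the rule then removes them; and only in the final phase are `stream` and `unstream` themselves inlined.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- One step of a stream: finished, skip to a new state, or yield a value.
data Step s a = Done | Skip s | Yield a s

-- A stream is a step function together with an initial state.
data Stream a = forall s. Stream (s -> Step s a) s

stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []       = Done
    next (x : xs) = Yield x xs
{-# INLINE [0] stream #-}     -- delayed to phase 0 so the rule can match first

unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
        Done       -> []
        Skip s'    -> go s'
        Yield x s' -> x : go s'
{-# INLINE [0] unstream #-}   -- likewise delayed to phase 0

-- Non-recursive map on streams, cheap for GHC to inline and combine.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
        Done       -> Done
        Skip s'    -> Skip s'
        Yield x s' -> Yield (f x) s'
{-# INLINE mapS #-}

-- List-level wrapper: must inline early to expose stream/unstream pairs.
mapF :: (a -> b) -> [a] -> [b]
mapF f = unstream . mapS f . stream
{-# INLINE mapF #-}

-- Converting to a list and straight back to a stream is the identity.
{-# RULES "stream/unstream" forall s. stream (unstream s) = s #-}
```

With optimizations on, `mapF f (mapF g xs)` can then collapse to a single traversal. Whether the rule actually fires depends on GHC's simplifier, so checking with `-ddump-rule-firings` is advisable.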

In conduit, we need to make sure that all of this is happening in the correct
order. There was one particular complexity that made it difficult to ensure
this happened. conduit in fact has *two* core datatypes: Pipe and ConduitM,
with the latter being a more friendly newtype wrapper around the first. Up
until this point, the code for the two was jumbled into a single internal
module, making it difficult to track which things were being written in which
version of the API.

My next step was to split things into .Pipe and .Conduit internal modules,
and then clean up GHC's warnings to get rules to fire more reliably. This gave
a modest performance boost to the sliding vector benchmarks, but not much
else. But it *does* pave the way for future improvements.

The results so far have been uninspiring. We've identified a core problem (too
many of those Done data constructors being used), and noticed that the
rewrite rules that *should* fix that don't seem to be doing their job. Now
let's take our first stab at *really* improving performance: with aggressive
rewrite rules.

Our sum benchmark is really simple: use enumFromTo to create a stream of values, and fold (or foldM) to consume that. The thing that slows us down is that, in between these two simple functions, we end up allocating a bunch of temporary data structures. Let's get rid of them with rewrite rules!

This certainly did the trick. The conduit implementation jumped from 185us to just 8.63us. For comparison, the low level approach (or vector's stream fusion) clocks in at 5.77us, whereas foldl' on a list is 80.6us. This is a huge win!

But it's also misleading. All we've done here is sneakily rewritten our conduit
algorithm into a low-level format. This solves the specific problem on the
table (connecting enumFromTo with fold), but won't fully generalize to other
cases. A more representative demonstration of this improvement is the speedup
for foldM, which went from 1180us to 81us. The reason this is more realistic
is that the rewrite rule is not specialized to enumFromTo, but rather works
on *any* Source.

I took a big detour at this point, and ended up writing an initial implementation of stream fusion in conduit. Unfortunately, I ran into a dead end on that branch, and had to put that work to the side temporarily. However, the improvements discussed in the rest of this blog post will hopefully reopen the door to stream fusion, which I hope to investigate next.

#### Monte carlo, and associativity

Now that I'd made the results of the sum benchmark thoroughly useless, I decided to focus on the results of monte carlo, where the low level implementation still won by a considerable margin (3.42ms vs 10.6ms). The question was: why was this happening? To understand, let's start by looking at the code:

    analysis = do
        successes <- sourceRandomN count
            $$ CL.fold (\t (x, y) ->
                            if (x*x + y*(y :: Double) < 1)
                                then t + 1
                                else t)
                       (0 :: Int)
        return $ fromIntegral successes / fromIntegral count * 4

    sourceRandomN :: (MWC.Variate a, MonadIO m) => Int -> Source m a
    sourceRandomN cnt0 = do
        gen <- liftIO MWC.createSystemRandom
        let loop 0 = return ()
            loop cnt = do
                liftIO (MWC.uniform gen) >>= yield >> loop (cnt - 1)
        loop cnt0

The analysis function is not very interesting: it simply connects sourceRandomN with a fold. Given that we now have a well behaved and consistently-firing rewrite rule for connecting to folds, it's safe to say that was not the source of our slowdown. So our slowdown must be coming from:

    liftIO (MWC.uniform gen) >>= yield >> loop (cnt - 1)

This should in theory generate really efficient code. yield >> loop (cnt - 1) should be rewritten to \x -> HaveOutput (loop (cnt - 1)) (return ()) x, and then liftIO should get rewritten to generate:

    PipeM $ do
        x <- MWC.uniform gen
        return $ HaveOutput (loop $ cnt - 1) (return ()) x

I added another commit to include a few more versions of the monte carlo benchmark (results here). The two most interesting are:

Explicit usage of the Pipe constructors:

    sourceRandomNConstr :: (MWC.Variate a, MonadIO m) => Int -> Source m a
    sourceRandomNConstr cnt0 = ConduitM $ PipeM $ do
        gen <- liftIO MWC.createSystemRandom
        let loop 0 = return $ Done ()
            loop cnt = do
                x <- liftIO (MWC.uniform gen)
                return $ HaveOutput (PipeM $ loop (cnt - 1)) (return ()) x
        loop cnt0

This version ran in 4.84ms, vs the original conduit version which ran in 15.8ms. So this is definitely the problem!

Explicitly force right-associated binding order:

    sourceRandomNBind :: (MWC.Variate a, MonadIO m) => Int -> Source m a
    sourceRandomNBind cnt0 = lift (liftIO MWC.createSystemRandom) >>= \gen ->
        let loop 0 = return ()
            loop cnt = do
                lift (liftIO $ MWC.uniform gen) >>=
                    (\o -> yield o >> loop (cnt - 1))
         in loop cnt0

Or to zoom in on the important bit:

    lift (liftIO $ MWC.uniform gen) >>= (\o -> yield o >> loop (cnt - 1))

By the monad laws, this code is identical to the original. However, instead of standard left-associativity, we have right associativity of monadic bind. This code ran in 5.19ms, an approximate threefold speedup vs the left associative code!

This issue of associativity was something Roman Cheplyaka told me about back in April, so I wasn't surprised to see it here. Back then, I'd looked into using Codensity together with ConduitM, but didn't get immediate results, and therefore postponed further research until I had more time.

OK, so why exactly does left-associativity hurt us so much? There are two reasons actually:

- Generally speaking, many monads perform better when they are right associated. This is especially true for free monads, of which conduit is just a special case. Janis Voigtländer's paper Asymptotic Improvement of Computations over Free Monads and Edward Kmett's blog post series free monads for less do a far better job of explaining the issue than I could.
- In the case of conduit, left associativity prevented the lift and yield rewrite rules from firing, which introduced extra, unnecessary monadic bind operations. Forcing right associativity allows these rules to fire, avoiding a lot of unnecessary data constructor allocation and analysis.
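
To make the first point concrete, here is a toy sketch. It assumes a drastically simplified source type (Src, with no input, finalizers or base monad) rather than conduit's real Pipe, and the helper names leftNested, rightNested and outputs are made up for illustration.

```haskell
-- Toy source: a chain of outputs ending in a result.
data Src o r
  = Done r
  | HaveOutput (Src o r) o

instance Functor (Src o) where
  fmap f (Done r)            = Done (f r)
  fmap f (HaveOutput next o) = HaveOutput (fmap f next) o

instance Applicative (Src o) where
  pure = Done
  sf <*> sx = sf >>= \f -> fmap f sx

instance Monad (Src o) where
  -- Bind has to walk the whole left-hand side to find its Done.
  Done r            >>= f = f r
  HaveOutput next o >>= f = HaveOutput (next >>= f) o

yield :: o -> Src o ()
yield o = HaveOutput (Done ()) o

-- ((yield 1 >> yield 2) >> yield 3) >> ...
-- Every bind re-walks everything built so far: quadratic work overall.
leftNested :: Int -> Src Int ()
leftNested n = foldl (\acc i -> acc >> yield i) (return ()) [1 .. n]

-- yield 1 >> (yield 2 >> (yield 3 >> ...))
-- Every bind only walks a single yield: linear work overall.
rightNested :: Int -> Src Int ()
rightNested n = foldr (\i acc -> yield i >> acc) (return ()) [1 .. n]

-- Collect the outputs so the two shapes can be compared.
outputs :: Src o r -> [o]
outputs (Done _)            = []
outputs (HaveOutput next o) = o : outputs next
```

Both functions describe the same stream, as the monad laws promise; only the amount of traversal needed to build it differs.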

At this point, it became obvious that the main slowdown I was seeing was driven by this problem. The question is: how should we solve it?

#### Difference lists

To pave the way for the next step, I want to take a quick detour and talk about something simpler: difference lists. Consider the following code:

    (((w ++ x) ++ y) ++ z)

Most experienced Haskellers will cringe upon reading that. The append operation for a list needs to traverse every cons cell in its left value. When we left-associate append operations like this, we will need to traverse every cell in w, then every cell in w ++ x, then every cell in w ++ x ++ y. This is highly inefficient, and would clearly be better done in a right-associated style (sound familiar?).

But forcing programmers to ensure that their code is always right-associated isn't always practical. So instead, we have two common alternatives. The first is: use a better data structure. In particular, Data.Sequence has far cheaper append operations than lists.

The other approach is to use *difference lists*. Difference lists are
*functions* instead of actual list values. They are instructions for adding
values to the beginning of the list. In order to append, you use normal
function composition. And to convert them to a list, you apply the resulting
function to an empty list. As an example:
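
A minimal sketch of the idea, using a hypothetical DList newtype and helper names (toDList, fromDList, append) chosen for this illustration rather than taken from any particular library:

```haskell
-- A difference list is a function that prepends its contents to
-- whatever list it is eventually given.
newtype DList a = DList ([a] -> [a])

-- Wrap an ordinary list: "prepend xs to the rest".
toDList :: [a] -> DList a
toDList xs = DList (xs ++)

-- Appending is plain function composition; nothing is traversed yet.
append :: DList a -> DList a -> DList a
append (DList f) (DList g) = DList (f . g)

-- Converting back applies the accumulated function to the empty list.
fromDList :: DList a -> [a]
fromDList (DList f) = f []

-- Left-nested appends, the shape that is expensive for ordinary lists.
example :: [Int]
example =
  fromDList
    (((toDList [1, 2] `append` toDList [3]) `append` toDList [4, 5])
       `append` toDList [6])
```

Even though the appends are left-nested, the underlying (++) calls end up right-nested once the function is applied, so each original list is traversed only once.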

Both difference lists and sequences have advantages. Probably the simplest summary is:

- Difference lists have smaller constant factors for appending.
- Sequences allow you to analyze them directly, without having to convert them to a different data type first.

That second point is important. If you need to regularly analyze your list and then continue to append, the performance of a difference list will be abysmal. You will constantly be swapping representations, and converting from a list to a difference list is an O(n) operation. But if you will simply be constructing a list once without any analysis, odds are difference lists will be faster.

This situation is *almost* identical to our problems with conduit. Our monadic
composition operator, like list's append operator, needs to traverse the entire
left hand side. This connection is more clearly spelled out in Reflection
without Remorse by Atze van der Ploeg and Oleg Kiselyov (and for me, care of
Roman).

Alright, with that out of the way, let's finally fix conduit!

#### Continuation passing style, church-encoding, codensity

There are essentially two things we need to do with conduits:

- Monadically compose them to sequence two streams into a larger stream.
- Categorically compose them to connect one stream to the next in a pipeline.

The latter requires that we be able to case analyze our datatypes, while theoretically the former does not: something like difference lists for simple appending would be ideal. In the past, I've tried out a number of different alternative implementations of conduit, none of which worked well enough. The problem I always ran into was that either monadic bind became too expensive, or categorical composition became too expensive.

Roman, Mihaly, Edward and I discussed these issues a bit on Github, and based on Roman's advice, I went ahead with writing a benchmark of different conduit implementations. I currently have four implementations in this benchmark (and hope to add more):

- Standard, which looks very much like conduit 1.1, just a bit simplified (no rewrite rules, no finalizers, no leftovers).
- Free, which is conduit rewritten to explicitly use the free monad transformer.
- Church, which modifies Free to instead use the Church-encoded free monad transformer.
- Codensity, which is a Codensity-transform-inspired version of conduit.

You can see the benchmark results, which clearly show the codensity version to be the winner. Though it would be interesting, I think I'll avoid going into depth on the other three implementations for now (this blog post is long enough already).

#### What is Codensity?

Implementing Codensity in conduit just means changing the ConduitM newtype wrapper to look like this:

    newtype ConduitM i o m r = ConduitM
        { unConduitM :: forall b.
                        (r -> Pipe i i o () m b) -> Pipe i i o () m b
        }

What this says is "I'm going to provide an r value. If you give me a function that needs an r value, I'll give it that r value and then continue with the resulting Pipe." Notice how similar this looks to the type signature of monadic bind itself:

    (>>=) :: Pipe i i o () m r
          -> (r -> Pipe i i o () m b)
          -> Pipe i i o () m b

This isn't by chance, it's by construction. More information is available in the Haddocks of kan-extensions, or in the above-linked paper and blog posts by Janis and Edward. To see why this change is important, let's look at the new implementations of some of the core conduit functions and type classes:

    yield o = ConduitM $ \rest -> HaveOutput (rest ()) (return ()) o

    await = ConduitM $ \f -> NeedInput (f . Just) (const $ f Nothing)

    instance Monad (ConduitM i o m) where
        return x = ConduitM ($ x)
        ConduitM f >>= g = ConduitM $ \h -> f $ \a -> unConduitM (g a) h

    instance MonadTrans (ConduitM i o) where
        lift mr = ConduitM $ \rest -> PipeM (liftM rest mr)

Instead of having explicit Done constructors in yield, await, and lift,
we use the continuation rest. This is the *exact same transformation* we were
previously relying on rewrite rules to provide. However, our rewrite rules
couldn't fire properly in a left-associated monadic binding. Now we've avoided
the whole problem!

Our Monad instance also became much smaller. Notice that in order to monadically compose, there is no longer any need to case-analyze the left hand side, which avoids the high penalty of left association.

Another interesting quirk is that our Monad instance on ConduitM no longer requires that the base m type constructor itself be a Monad. This is a nice feature of Codensity.

So that's half the story. What about categorical composition? That certainly
*does* require analyzing both the left and right hand structures. So don't we
lose all of our speed gains of Codensity with this? Actually, I think not.
Let's look at the code for categorical composition:
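
What follows is a simplified sketch of the technique, not the actual conduit source. It assumes a cut-down Pipe with no leftovers, finalizers or upstream result, so the Done clause of goRight reduces to rest r2; the functions fuse, mapC and outputs are toy definitions written for this illustration.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity, runIdentity)

-- Cut-down Pipe: no leftovers, no finalizers, no upstream result.
data Pipe i o m r
  = HaveOutput (Pipe i o m r) o
  | NeedInput (i -> Pipe i o m r) (Pipe i o m r)
  | Done r
  | PipeM (m (Pipe i o m r))

-- Codensity-style wrapper over the cut-down Pipe.
newtype ConduitM i o m r = ConduitM
  { unConduitM :: forall b. (r -> Pipe i o m b) -> Pipe i o m b }

instance Functor (ConduitM i o m) where
  fmap f (ConduitM c) = ConduitM $ \rest -> c (rest . f)

instance Applicative (ConduitM i o m) where
  pure x = ConduitM ($ x)
  ConduitM cf <*> ConduitM cx = ConduitM $ \rest -> cf $ \f -> cx (rest . f)

instance Monad (ConduitM i o m) where
  ConduitM f >>= g = ConduitM $ \h -> f $ \a -> unConduitM (g a) h

yield :: o -> ConduitM i o m ()
yield o = ConduitM $ \rest -> HaveOutput (rest ()) o

await :: ConduitM i o m (Maybe i)
await = ConduitM $ \f -> NeedInput (f . Just) (f Nothing)

-- Categorical composition: case-analyze both sides, finish with rest.
fuse :: Functor m => ConduitM a b m () -> ConduitM b c m r -> ConduitM a c m r
fuse (ConduitM left0) (ConduitM right0) = ConduitM $ \rest ->
  let goRight left right = case right of
        HaveOutput next o           -> HaveOutput (goRight left next) o
        NeedInput withInput noInput -> goLeft withInput noInput left
        Done r2                     -> rest r2
        PipeM mp                    -> PipeM (fmap (goRight left) mp)
      goLeft withInput noInput left = case left of
        HaveOutput next o   -> goRight next (withInput o)
        NeedInput more done -> NeedInput (goLeft withInput noInput . more)
                                         (goLeft withInput noInput done)
        Done ()             -> goRight (Done ()) noInput
        PipeM mp            -> PipeM (fmap (goLeft withInput noInput) mp)
   in goRight (left0 Done) (right0 Done)

-- Toy transformer written in await/yield style.
mapC :: (a -> b) -> ConduitM a b m ()
mapC f = loop
  where
    loop = await >>= maybe (return ()) (\x -> yield (f x) >> loop)

-- Run over Identity, answering any input request with "no more input".
outputs :: ConduitM i o Identity r -> [o]
outputs (ConduitM c) = go (c Done)
  where
    go (HaveOutput next o)   = o : go next
    go (NeedInput _ noInput) = go noInput
    go (Done _)              = []
    go (PipeM m)             = go (runIdentity m)
```

For instance, outputs (fuse (mapM_ yield [1, 2, 3]) (mapC (* 2))) streams the three values through the doubling stage. Only a Functor constraint on m is needed in this cut-down version, since there are no finalizers to sequence.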

In the last line, we apply left0 and right0 to Done, which is how we convert our Codensity version into something we can actually analyze. (This is equivalent to applying a difference list to an empty list.) We then traverse these values in the same way that we did in conduit 1.1 and earlier.

The important difference is how we ultimately finish. The code in question is the Done clause of goRight's case analysis, namely:

    Done r2 -> PipeM (final >> return (rest r2))

Notice the usage of rest, instead of what we would have previously done: used the Done constructor. By doing this, we're immediately recreating a Codensity version of our resulting Pipe, which allows us to only traverse our incoming Pipe values once each, and not need to retraverse the outgoing Pipe for future monadic binding.

This trick doesn't just work for composition. There are a large number of functions in conduit that need to analyze a Pipe, such as addCleanup and catchC. All of them are now implemented in this same style.

After implementing this change, the resulting benchmarks look much better. The naive implementation of monte carlo is now quite close to the low-level version (5.28ms vs 3.44ms, as opposed to the original 15ms). Sliding vector is also much better: the unboxed, 1000-size window benchmark went from 7.96ms to 4.05ms, vs a low-level implementation at 1.87ms.

#### Type-indexed sequences

One approach that I haven't tried yet is the type-indexed sequence approach from Reflection without Remorse. I still intend to add it to my conduit benchmark, but I'm not optimistic about it beating out Codensity. My guess is that a sequence data type will have a higher constant factor overhead, and based on the way composition is implemented in conduit, we won't get any benefit from avoiding the need to transition between two representations.

Edward said he's hoping to get an implementation of such a data structure into the free package, at which point I'll update my benchmark to see how it performs.

#### To pursue next: streamProducer, streamConsumer, and more

While this round of benchmarking produced some very nice results, we're clearly not yet at the same level as low-level code. My goal is to focus on that next. I have some experiments going already relating to getting conduit to expose stream fusion rules. In simple cases, I've generated a conduit-compatible API with the same performance as vector.

The sticking point is getting something which is efficient not just for functions explicitly written in stream style, but also provides decent performance when composed with the await/yield approach. While the latter approach will almost certainly be slower than stream fusion, I'm hoping we can get it to degrade to current-conduit performance levels, and allow stream fusion to provide a significant speedup when categorically composing two Conduits written in that style.

The code discussed in this post is now available on the next-cps branch of conduit. conduit-extra, conduit-combinators, and a number of other packages either compile out-of-the-box with these changes, or require minor tweaks (already implemented), so I'm hoping that this API change does not affect too many people.

As I mentioned initially, I'd like to have some time for community discussion on this before I make this next release.

### Which docker image for haskell development

Hi, I would like to get started with haskell development and would like to know what kind of docker image would be suitable to have a well-working environment. What kind of images are you using?

submitted by wirrbel