News aggregator

Question about stacks in Haskell (and Rust)

Haskell on Reddit - Thu, 01/30/2014 - 9:46pm

A few months ago, dynamically sized stacks (that grow and shrink as needed) were removed from Rust. One of the reasons cited was performance, especially the cost of repeatedly allocating/deallocating segments.

As far as I know, the GHC runtime uses growable stacks as well – but I've never heard complaints about Haskell threads running too slow. Quite the opposite, actually.

So why are dynamically sized stacks suitable for Haskell, but not Rust?

submitted by lfairy
[link] [9 comments]
Categories: Incoming News

Mark Jason Dominus: Twingler, a generic data structure munger

Planet Haskell - Thu, 01/30/2014 - 9:12pm

(Like everything else in this section, these are notes for a project that was never completed.)

Introduction

These incomplete notes from 1997-2001 are grappling with the problem of transforming data structures in a language like Perl, Python, Java, Javascript, or even Haskell. A typical problem is to take an input of this type:

[ [Chi, Ill], [NY, NY], [Alb, NY], [Spr, Ill], [Tr, NJ], [Ev, Ill], ]

and to transform it to an output of this type:

{ Ill => [Chi, Ev, Spr], NY => [Alb, NY], NJ => [Tr], }

One frequently writes code of this sort, and it should be possible to specify the transformation with some sort of high-level declarative syntax that is easier to read and write than the following gibberish:

    my $out;
    for my $pair (@$in) {
        push @{$out->{$pair->[0]}}, $pair->[1];
    }
    for my $k (keys %$out) {
        @{$out->{$k}} = sort @{$out->{$k}};
    }

This is especially horrible in Perl, but it is bad in any language. Here it is in a hypothetical language with a much less crusty syntax:

    for pair (in.items) : out[pair[0]].append(pair[1])
    for list (out.values) : list.sort

You still can't see what is really going on without executing the code in your head. It is hard for a beginner to write, and hard for anyone to understand.
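For comparison, here is a rough sketch of the same transformation in Haskell using Data.Map; the function name and the tuple encoding of the input are invented for the example.

    import Data.List (sort)
    import qualified Data.Map as M

    -- Collect the cities under their state, then sort each city list.
    groupByState :: [(String, String)] -> M.Map String [String]
    groupByState pairs =
      fmap sort (M.fromListWith (++) [ (state, [city]) | (city, state) <- pairs ])

    -- groupByState [("Chi","Ill"),("NY","NY"),("Alb","NY"),("Spr","Ill"),("Tr","NJ"),("Ev","Ill")]
    --   == M.fromList [("Ill",["Chi","Ev","Spr"]),("NJ",["Tr"]),("NY",["Alb","NY"])]

The declarative core is the single fromListWith call, which is roughly the shape the notes below keep reaching for.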

Original undated notes from around 1997–1998

Consider this data structure DS1:

    [ [Chi, Ill], [NY, NY], [Alb, NY], [Spr, Ill], [Tr, NJ], [Ev, Ill], ]      (DS1)

This could be transformed several ways:

    { Chi => Ill, NY => NY, Alb => NY, Spr => Ill, Tr => NJ, Ev => Ill, }      (DS2)

    { Ill => [Chi, Spr, Ev], NY => [NY, Alb], NJ => Tr, }                      (DS3)

    { Ill => 3, NY => 2, NJ => 1, }

    [ Chi, Ill, NY, NY, Alb, NY, Spr, Ill, Tr, NJ, Ev, Ill ]                   (DS4)

Basic idea: Transform original structure of nesting depth N into an N-dimensional table. If Nth nest is a hash, index table ranks by hash keys; if an array, index by numbers. So for example, DS1 becomes

         1     2
    1   Chi   Ill
    2   NY    NY
    3   Alb   NY
    4   Spr   Ill
    5   Tr    NJ
    6   Ev    Ill

Or maybe hashes should be handled a little differently? The original basic idea was more about DS2 and transformed it into

          Ill   NY   NJ
    Chi    X
    NY           X
    Alb          X
    Spr    X
    Tr                X
    Ev     X

Maybe the rule is: For hashes, use a boolean table indexed by keys and values; for arrays, use a string table indexed by integers.

Notation idea: Assign names to the dimensions of the table, say X and Y. Then denote transformations by:

    [([X, Y])]    (DS1)
    {(X => Y)}    (DS2)
    {X => [Y]}    (DS3)
    [(X, Y)]      (DS4)

The (...) are supposed to indicate a chaining of elements within the larger structure. But maybe this isn't right.

At the bottom: How do we say whether

X=>Y, X=>Z

turns into

[ X => Y, X => Z ] (chaining)

or [ X => [Y, Z] ] (accumulation)

Consider

         A   B   C
    D    .   .   .
    E    .   .
    F    .       .

<...> means ITERATE over the thing inside and make a list of the results. It's basically `map'.

Note that:

    <X,Y>      |= (D,A,D,B,D,C,E,A,E,B,F,A,F,C)
    <X,[<Y>]>  |= (D,[A,B,C],E,[A,B],F,[A,C])

Brackets and braces just mean brackets and braces. Variables at the same level of nesting imply a loop over the cartesian join. Variables subnested imply a nested loop. So:

<X,Y> means

    for x in X
      for y in Y
        push @result, (x,y) if present(x,y);

But

<X,<Y>> means

    for x in X
      for y in Y
        push @yresult, (y) if present(x,y);
      push @result, @yresult

Hmmm. Maybe there's a better syntax for this.
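As a rough Haskell rendering of those two shapes (invented names, with list comprehensions standing in for the <...> iteration):

    -- The relation from the dot table above: which of A, B, C each of D, E, F is related to.
    rel :: [(Char, Char)]
    rel = [ ('D','A'), ('D','B'), ('D','C')
          , ('E','A'), ('E','B')
          , ('F','A'), ('F','C') ]

    -- <X,Y>: one flat iteration over all the pairs that are present.
    flatPairs :: [(Char, Char)]
    flatPairs = [ (x, y) | (x, y) <- rel ]

    -- <X,[<Y>]>: iterate over the Xs, building the inner list of Ys for each one.
    grouped :: [(Char, [Char])]
    grouped = [ (x, [ y | (x', y) <- rel, x' == x ]) | x <- ['D', 'E', 'F'] ]
    -- grouped == [('D',"ABC"), ('E',"AB"), ('F',"AC")]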

Well, with this plan:

    DS1: [ <[X,Y]> ]
    DS2: { <X=>Y> }
    DS3: { <X => [<Y>]> }
    DS4: [ <X, Y> ]

It seems pretty flexible. You could just as easily write

{ <X => max(<Y>) }

and you'd get

{ D => C, E => B, F => C }

If there's a `count' function, you can get

{ D => 3, E => 2, F => 2 }

or maybe we'll just overload `scalar' to mean `count'.
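In Haskell terms (a sketch with Data.Map and invented names, reusing the same D/E/F relation), max and count are just different combining functions:

    import qualified Data.Map as M

    rel :: [(Char, Char)]
    rel = [ ('D','A'), ('D','B'), ('D','C'), ('E','A'), ('E','B'), ('F','A'), ('F','C') ]

    -- the max(<Y>) version
    maxes :: M.Map Char Char
    maxes = M.fromListWith max rel
    -- fromList [('D','C'),('E','B'),('F','C')]    i.e. { D => C, E => B, F => C }

    -- the `count' version
    counts :: M.Map Char Int
    counts = M.fromListWith (+) [ (x, 1) | (x, _) <- rel ]
    -- fromList [('D',3),('E',2),('F',2)]          i.e. { D => 3, E => 2, F => 2 }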

Question: How to invert this process? That's important so that you can ask it to convert one data structure to another. Also, then you could write something like

[ <city, state> ] |= { <state => [<city>] > }

and omit the X's and Y's.

Real example: From proddir. Given

ID / NAME / SHADE / PALETTE / DESC

For example:

    A / AAA / red    / pink   / Aaa
    B / BBB / yellow / tawny  / Bbb
    A / AAA / green  / nude   / Aaa
    B / BBB / blue   / violet / Bbb
    C / CCC / black  / nude   / Ccc

Turn this into

    { A => [ AAA, [ [red, pink], [green, nude] ], Aaa],
      B => [ BBB, [ [yellow, tawny], [blue, violet] ], Bbb],
      C => [ CCC, [ [black, nude] ], CCC],
    }

    { < ID => [ name, [ <[shade, palette]> ] desc ]> }

Something interesting happened here. Suppose we have

[ [A, B] [A, B] ]

And we ask for <A, B>. Do we get (A, B, A, B), or just (A, B)? Does it remove duplicate items for us or not? We might want either.

In the example above, why didn't we get

    { A => [ AAA, [ [red, pink], [green, nude] ], Aaa],
      A => [ AAA, [ [red, pink], [green, nude] ], Aaa],
      B => [ BBB, [ [yellow, tawny], [blue, violet] ], Bbb],
      B => [ BBB, [ [yellow, tawny], [blue, violet] ], Bbb],
      C => [ CCC, [ [black, nude] ], CCC],
    }

If the outer iteration was supposed to be over all id-name-desc triples? Maybe we need

    <...>      all triples
    <!...!>    unique triples only

Then you could say

<X> |= <!X!>

to indicate that you want to uniq a list.

But maybe the old notation already allowed this:

<X> |= keys %{< X => 1 >}

It's still unclear how to write the example above, which has unique key-triples. But it's in a hash, so it gets uniqed on ID anyway; maybe that's all we need.

1999-10-23

Rather than defining some bizarre metalanguage to describe the transformation, it might be easier all around if the user just enters a sample input, a sample desired output, and lets the twingler figure out what to do. Certainly the parser and internal representation will be simpler.

For example:

    [ [ A, B ],
      [ C, B ],
      [ D, E ],
    ]
    --------------
    { B => [A, C],
      E => [D],
    }

should be enough for it to figure out that the code is:

    for my $a1 (@$input) {
        my ($e1, $e2) = @$a1;
        push @{$output{$e2}}, $e1;
    }

Advantage: After generating the code, it can run it on the sample input to make sure that the output is correct; otherwise it has a bug.

Input grammar:

    %token ELEMENT

    expr:  array | hash ;
    array: '[' csl ']' ;
    csl:   ELEMENT | ELEMENT ',' csl | /* empty */ ;
    hash:  '{' cspl '}' ;
    cspl:  pair | pair ',' cspl | /* empty */ ;
    pair:  ELEMENT '=>' ELEMENT ;

Simple enough. Note that (...) lines are not allowed. They are only useful at the top level. A later version can allow them. It can replace the outer (...) with [...] or {...} as appropriate when it sees the first top-level separator. (If there is a => at the top level, it is a hash, otherwise an array.)
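As a rough illustration only, the grammar might be written in Haskell with Parsec along these lines. ELEMENT is generalised to any expression so that the nested samples in these notes would parse, and every name below is invented.

    import Text.Parsec
    import Text.Parsec.String (Parser)

    data Expr = Element String | Array [Expr] | Hash [(Expr, Expr)]
      deriving Show

    lexeme :: Parser a -> Parser a
    lexeme p = p <* spaces

    expr :: Parser Expr
    expr = array <|> hash <|> element

    element :: Parser Expr
    element = Element <$> lexeme (many1 (noneOf "[]{},=> \t\n"))

    -- sepEndBy allows the trailing comma (the /* empty */ alternative above).
    array :: Parser Expr
    array = Array <$> between (lexeme (char '[')) (lexeme (char ']'))
                              (expr `sepEndBy` lexeme (char ','))

    hash :: Parser Expr
    hash = Hash <$> between (lexeme (char '{')) (lexeme (char '}'))
                            (pair `sepEndBy` lexeme (char ','))
      where pair = (,) <$> element <* lexeme (string "=>") <*> expr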

Idea for code generation: Generate pseudocode first. Then translate to Perl. Then you can insert a peephole optimizer later. For example

foreachkey k (somehash) { push somearray, $somehash{k} }

could be optimized to

somearray = values somehash;

    add into hash:  as key, add into value, replace value
    add into array: at end only

How do we analyze something like:

    [ [ A, B ],
      [ C, B ],
      [ D, E ],
    ]
    --------------
    { B => [A, C],
      E => [D],
    }

Idea: Analyze structure of input. Analyze structure of output and figure out an expression to deposit each kind of output item. Iterate over input items. Collect all input items into variables. Deposit items into output in appropriate places.

For an input array, tag the items with index numbers. See where the indices go in the output. Try to discern a pattern. The above example:

    Try #1:
      A: 1    B: 2
      C: 1    B: 2    -- consistent with B above
      D: 1    E: 2

    Output:
      2 => [1, 1]
      2 => [1]

OK—2s are keys, 1s are array elements.

A different try fails:

    A: 1    B: 1
    C: 2    B: 2    -- inconsistent, give up on this.

Now consider:

    [ [ A, B ],
      [ C, B ],
      [ D, E ],
    ]
    --------------
    { A => B,
      C => B,
      D => E,
    }

A,C,D get 1; B,E get 2. This works again. 1s are keys, 2s are values.

I need a way of describing an element of a nested data structure as a simple descriptor so that I can figure out the mappings between descriptors. For arrays and nested arrays, it's pretty easy: Use the sequence of numeric indices. What about hashes? Just K/V? Or does V need to be qualified with the key perhaps?

Example above:

    IN:   A: 11     B: 12, 22    C: 21    D: 31    E: 32
    OUT:  A: K      B: V         C: K     D: K     E: V

Now try to find a mapping from the top set of labels to the bottom. x1 => K, x2 => V works.

Problem with this:

    [ [ A, B ],
      [ B, C ],
    ]
    ------------
    { A => B,
      B => C,
    }

is unresolvable. Still, maybe this works well enough in most common cases.

Let's consider:

    [ [ A , AAA , red    , pink   , Aaa ],
      [ B , BBB , yellow , tawny  , Bbb ],
      [ A , AAA , green  , nude   , Aaa ],
      [ B , BBB , blue   , violet , Bbb ],
      [ C , CCC , black  , nude   , Ccc ],
    ]
    -------------------------------------------------------------
    { A => [ AAA, [ [red, pink], [green, nude] ], Aaa],
      B => [ BBB, [ [yellow, tawny], [blue, violet] ], Bbb],
      C => [ CCC, [ [black, nude] ], CCC],
    }

    A:    00, 20  => K
    AAA:  01, 21  => V0
    red:  02      => V100
    pink: 03      => V101
    Aaa:  04      => V2
    B:    10, 30  => K
    C:    40      => K

etc.

Conclusion: x0 => K; x1 => V0; x2 => V100; x3 => V101; x4 => V2

How to reverse?

Simpler reverse example:

    { A => [ B, C ],
      E => [ D ],
    }
    ---------------------
    [ [ A, B ],
      [ A, C ],
      [ E, D ],
    ]

    A: K   => 00, 10
    B: V0  => 01
    C: V1  => 11
    D: V0  => 21
    E: K   => 20

Conclusion: K => x0; V => x1. This isn't enough information. Really, V => k1, where k is whatever the key was!

What if V items have the associated key too?

    A: K      => 00, 10
    B: V{A}0  => 01
    C: V{A}1  => 11
    D: V{E}0  => 21
    E: K      => 20

Now there's enough information to realize that B and C stay with the A, if we're smart enough to figure out how to use it.
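A rough Haskell sketch of that descriptor idea (types and names invented), with the value position qualified by its key as suggested:

    -- Values are nested arrays and hashes with string leaves.
    data Value = Leaf String
               | Arr  [Value]
               | Hash [(String, Value)]

    -- One step of a descriptor: a numeric index, a key position,
    -- or a value position qualified with the key it sits under.
    data Step = Idx Int | Key | Val String
      deriving (Eq, Show)

    -- Every leaf, paired with the descriptor (path) that reaches it.
    describe :: Value -> [(String, [Step])]
    describe = go []
      where
        go path (Leaf s)   = [(s, reverse path)]
        go path (Arr xs)   = concat [ go (Idx i : path) v | (i, v) <- zip [0 ..] xs ]
        go path (Hash kvs) = concat [ (k, reverse (Key : path)) : go (Val k : path) v
                                    | (k, v) <- kvs ]

Matching input descriptors against output descriptors, as in the tables above, then becomes a question of finding a consistent mapping between the two lists.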

2001-07-28

Sent to Nyk Cowham

2001-08-24

Sent to Timur Shtatland

2001-10-28

Here's a great example. The output from HTML::LinkExtor is a list like

([img, src, URL1, losrc, URL2], [a, href, URL3], ... )

we want to transform this into

(URL1 => undef, URL2 => undef, URL3 => undef, ... )
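A sketch of that last step in Haskell, assuming an invented representation of the LinkExtor output (tag name plus attribute/URL pairs):

    import qualified Data.Map as M

    -- e.g. ("img", [("src", url1), ("losrc", url2)]) or ("a", [("href", url3)])
    type Link = (String, [(String, String)])

    -- Keep only the URLs, as the keys of a map (the undef values become units).
    urlSet :: [Link] -> M.Map String ()
    urlSet links = M.fromList [ (url, ()) | (_tag, attrs) <- links, (_attr, url) <- attrs ]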
Categories: Offsite Blogs

Deprecating bytestring version 0.10.2.0

haskell-cafe - Thu, 01/30/2014 - 8:01pm
Hi everyone, Following some problems I experienced with Network.HTTP.Client.TLS, I found out the problem was specific to bytestring library version 0.10.2.0 and most likely fixed with this commit https://github.com/haskell/bytestring/commit/86df1f17b2332940df69f484182c5c2cdd4c5bec . The default version shipped with GHC 7.6.3 (0.10.0.2) works fine, and the version after 0.10.2.0 (that is, version 0.10.4.0) also works. It was suggested that I mention here that I think we should deprecate the 0.10.2.0 version of bytestring specifically, since that is the one that introduces the problem, and there can be problems with having to specify specific versions of bytestring as a dependency when compiling across GHC versions.
Categories: Offsite Discussion

Naming scheme for partial functions

haskell-cafe - Thu, 01/30/2014 - 7:33pm
Greg Weber and I have been discussing some changes to mono-traversable[1]. One of the modules we provide is Data.NonNull, which provides total versions of functions like `last`. A change we're looking at would require having a partial version of `last` defined in a separate typeclass (IsSequence), which would allow for more optimized implementations of the total `last` function for datatypes which support it (e.g., strict ByteStrings). But what should we name it? I'm sure everyone's familiar with the `unsafe` naming convention, but that's not appropriate here: standard usage shows `unsafe` meaning a function which can cause a segfault. I initially named it `partialLast`, but partial can also imply partial function application. Greg brought up the idea of suffixing the function with something like `Throws` or `Errors`, which I think I'm a bit partial to myself[2]. So my questions are:
* Is there some already used naming scheme out there for partial functions which I've missed?
* Do people have any idea
Categories: Offsite Discussion

Unmaintained packages and hackage upload rights

haskell-cafe - Thu, 01/30/2014 - 6:30pm
In the recent past I took over two unmaintained packages: bert and ansi-terminal. I don't mind spending a bit of time to keep our ecosystem from bitrotting. However, both times I had to go through an irritating procedure of contacting hackage admins, asking them to grant me upload rights, explaining why the maintainers can't do that themselves and why I think the packages are abandoned. Instead of a feeling that I'm doing something good and useful, I have a feeling that I'm bothering people with my own problems. It also adds unnecessary latency to my work. So from now on I'll simply fork the packages I need to fix. Others are of course welcome to use my forks. (This email was prompted by regex-tdfa which doesn't build on GHC 7.8, and whose maintainer hasn't responded. My fork is at http://hackage.haskell.org/package/regex-tdfa-rc .) Roman
Categories: Offsite Discussion

blog.fogus.me

del.icio.us/haskell - Thu, 01/30/2014 - 5:35pm
Categories: Offsite Blogs

www.serpentine.com

del.icio.us/haskell - Thu, 01/30/2014 - 5:35pm
Categories: Offsite Blogs

Reading about Haskell "design patterns"?

Haskell on Reddit - Thu, 01/30/2014 - 4:54pm

Reading an article about using monoids to model a fizzbuzz solution, I was impressed by the clever solution presented at the end. I would never come up with that off the top of my head. Recently, I also had reason to look for a function Maybe a -> (a -> IO b) -> IO (Maybe b), which seemed to me like a thing that's rather common. Someone in #haskell helpfully pointed out that's just `for` from Data.Traversable.
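For the record, a tiny sketch of that suggestion (the function name and types here are made up, but the specialisation is real):

    import Data.Traversable (for)

    -- for :: (Traversable t, Applicative f) => t a -> (a -> f b) -> f (t b)
    -- so at Maybe and IO it is exactly  Maybe a -> (a -> IO b) -> IO (Maybe b).
    fetchIfPresent :: Maybe String -> (String -> IO Int) -> IO (Maybe Int)
    fetchIfPresent murl fetch = for murl fetch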

This made me realise there are probably a bunch of design patterns that can be modeled with the common type classes and used to handle errors and make Haskell code really beautiful, and I don't know the first thing about that.

I would really like to level up my Haskell knowledge by reading about how you can use the common type classes* to capture common patterns in code. Is there anything like that anywhere around?

Please note that I'm not interested in explanations on how the type classes work, or any of the theoretical underpinnings. I consider myself relatively familiar with how they work. I'm just curious in which ways they can be used to make code better.

A lot of the error handling I do in Haskell code is still pattern matching repeatedly (and sometimes more deeply nested than I care to admit) and it feels like I'm missing some integral understanding to make my programs more declarative, and to make the common type classes take care of my errors better for me.
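One small, invented illustration of the kind of pattern being asked about: nested Maybe pattern matches collapsing into the Maybe monad.

    import qualified Data.Map as M

    data User = User { userName :: String, userManagerId :: Maybe Int }

    -- The "repeated pattern matching" style:
    managerNameExplicit :: M.Map Int User -> Int -> Maybe String
    managerNameExplicit users uid =
      case M.lookup uid users of
        Nothing -> Nothing
        Just u  -> case userManagerId u of
          Nothing  -> Nothing
          Just mid -> case M.lookup mid users of
            Nothing -> Nothing
            Just m  -> Just (userName m)

    -- The same logic in the Maybe monad; each failing lookup short-circuits the rest.
    managerName :: M.Map Int User -> Int -> Maybe String
    managerName users uid = do
      u   <- M.lookup uid users
      mid <- userManagerId u
      m   <- M.lookup mid users
      Just (userName m)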

* Defined somewhat loosely as functors, applicatives, monads, monoids, traversables, foldables, arrows, and whatnot.

submitted by kqr
[link] [14 comments]
Categories: Incoming News

Newtype for custom instance, how to reduce boilerplate

Haskell on Reddit - Thu, 01/30/2014 - 3:11pm

I would like to have a custom TokenParsing instance, so I created a newtype and rewrote all the instances that I needed.

It is pretty easy to see this is mostly boilerplate. Is there a nicer way to achieve this result?

submitted by bartavelle
[link] [10 comments]
Categories: Incoming News

8 Funded PhD Positions at St Andrews

General haskell list - Thu, 01/30/2014 - 2:45pm
[Please forward to suitable candidates. Thanks! Kevin.] We have eight funded research scholarships available under the University of St Andrews Seventh Century scheme. These are open to students from any country, and pay fees as well as maintenance. The deadline for applications is March 31st 2014, but it's good to apply earlier. We have an active group of about ten academics, postdoctoral research fellows and postgraduate students working on a variety of topics. We'd welcome applicants who are interested in any aspect of functional programming and related areas, including: Parallel Functional programming; Heterogeneous multicores (including CPU/GPU combinations); Refactoring; Patterns of computation; Machine-Learning; Compilation; Real-time functional programming (e.g. in Hume); Semantics of Programming Languages; Functional cloud computing; Functional Programming and Security; Dependent Types; Effects and other extra-functional properties; Relaxed memory consistency; Multicore programming;
Categories: Incoming News

Qualification regions

Haskell on Reddit - Thu, 01/30/2014 - 2:45pm

Do you think something like this would be useful/good? If an identifier baz is ambiguous in an assume foo.bar block, and one of the possibilities is foo.bar.baz, assume it's that one.

Example:

    module MyNum where

    import Data.Vector
    import Data.ByteString

    assume Data.Vector where
      doStuffWithVectors = map (+2)           -- map could be from Prelude, Vector, or ByteString, assume it's the Vector one

    assume Data.ByteString where
      doStuffWithByteStrings = foldr (+) 0    -- assume it's ByteString's foldr

    (+) :: (ExpLike e, ExpLike e') => e t -> e' t -> Exp t
    (-) :: (ExpLike e, ExpLike e') => e t -> e' t -> Exp t

    assume MyNum where
      baz = exp1 + exp3

submitted by vahokif
[link] [11 comments]
Categories: Incoming News

language-puppet: Version 0.11.0 - now with a full serving of unsafeCoerce

Planet Haskell - Thu, 01/30/2014 - 2:15pm

A new version is out, with a focus on the bleeding edge! This release was about updating important dependencies, so we now have:

  • lens 4
  • attoparsec 0.11
  • aeson 0.7
  • text up to 1.1
  • parsers 0.10
  • and new versions of filecache and hruby

The goal here is to reap the benefits of the speed improvements in attoparsec, text and aeson, and be ready for the GHC 7.8 release.

There were a few hurdles to get there. The move from Data.Attoparsec.Number to Data.Scientific for number representation was not a big problem (even though code in hruby could be polished), but the lens and parsers upgrade proved more problematic.

Lens related breakage

The main problem with lens 4 was that it broke strict-base-types. I don’t think this will last long, but here is a temporary workaround for the impatient. Other than that, several instances of x ^. contains y were replaced by x & has (ix y) for the map-like types. This was a painless upgrade.
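A tiny, made-up illustration of that change, assuming lens 4 and containers:

    import Control.Lens (contains, has, ix, (&), (^.))
    import qualified Data.Map.Strict as M
    import qualified Data.Set as S

    main :: IO ()
    main = do
      let classes = S.fromList ["apache", "percona"] :: S.Set String
          params  = M.fromList [("port", "8080")]    :: M.Map String String
      print (classes ^. contains "apache")   -- sets still have a Contains instance in lens 4
      print (params & has (ix "port"))       -- membership for map-like types goes through has/ix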

The trouble with parsers

I like this package a lot, because it exposes a nice API and supports several underlying parsers. But it comes with a couple of problems.

The first one is related to the TokenParsing typeclass. This lets you define someSpace, the function that skips spaces. Unfortunately, the parsers library comes with an instance for parsec that will skip the characters satisfying isSpace. While this certainly is a sane choice, it is a problem for people who would like to use parsec as the underlying parser, but with a different implementation of someSpace. In my case, I also wanted to skip single-line comments (starting with #) and multi-line comments (/* these */). A solution is to create a newtype, and redefine all instances. For those wondering how this is done, here is the relevant part. Please let me know if there is a way to do this that does not require so much boilerplate.
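For the curious, here is a rough, hypothetical sketch of the newtype approach (not the actual language-puppet code); it assumes the parsers and parsec packages and leans on GeneralizedNewtypeDeriving so that only the TokenParsing instance is written by hand.

    {-# LANGUAGE GeneralizedNewtypeDeriving #-}
    module PuppetTokens where

    import Control.Applicative (Alternative, (<|>), (*>))
    import Data.Char (isSpace)
    import Data.Functor (void)
    import Text.Parser.Char (CharParsing, anyChar, char, satisfy, string)
    import Text.Parser.Combinators (Parsing, manyTill, skipMany)
    import Text.Parser.Token (TokenParsing (..))
    import qualified Text.Parsec as P

    -- Everything except TokenParsing is borrowed from the underlying parsec parser,
    -- using the instances provided by the parsers package.
    newtype PuppetParser a = PuppetParser { runPuppetParser :: P.Parsec String () a }
      deriving (Functor, Applicative, Alternative, Monad, Parsing, CharParsing)

    -- The one hand-written instance: "space" also swallows '#' line comments
    -- and /* ... */ block comments (a line comment here must end with a newline).
    instance TokenParsing PuppetParser where
      someSpace = skipMany (whitespace <|> lineComment <|> blockComment)
        where
          whitespace   = void (satisfy isSpace)
          lineComment  = void (char '#' *> manyTill anyChar (char '\n'))
          blockComment = void (string "/*" *> manyTill anyChar (string "*/"))

Whether newtype deriving covers every class needed in practice is exactly the open question above; anything it cannot derive still has to be spelled out by hand.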

The second problem is that the expression parser builder (in Text.Parser.Expression) is much slower than the one in parsec. Switching to it from Text.Parsec.Expr resulted in a 25% slowdown, so I switched back to parsec. Unfortunately, I didn’t immediately realize this was the culprit, and instead believed it to be a case of newtypes lowering performance. My code is now littered with unsafeCoerces, which I will remove in the next version (provided this does not result in a performance hit).

New features of previous versions I did not blog about
  • Several bugs were solved thanks to Pi3r.
  • Several new facts were added.
  • A parsing bug was fixed (it was also fixed in the official documentation).
  • An ‘all nodes’ testing mode was added for puppetresources. It can be used like this:
    $ puppetresource -p . -o allnodes +RTS -N4
    Problem with workstation.site : template error for userconfig/signature.erb : undefined method `[]' for nil:NilClass
    (erb):5:in `get_binding'
    /home/bartavelle/.cabal/share/x86_64-linux-ghc-7.6.3/language-puppet-0.11.0/ruby/hrubyerb.rb:46:in `get_binding'
    /home/bartavelle/.cabal/share/x86_64-linux-ghc-7.6.3/language-puppet-0.11.0/ruby/hrubyerb.rb:68:in `runFromContent'
    /home/bartavelle/.cabal/share/x86_64-linux-ghc-7.6.3/language-puppet-0.11.0/ruby/hrubyerb.rb:63:in `runFromFile'
    in ./modules/userconfig/templates/signature.erb at # "./modules/userconfig/manifests/init.pp" (line 33, column 9)
    Problem with db.dev.site : The following parameters are unknown: (use_ramdisk) when including class percona at # "./manifests/site.pp" (line 956, column 10)
    Problem with db2.dev.site : The following parameters are unknown: (use_ramdisk) when including class percona at # "./manifests/site.pp" (line 969, column 9)
    Tested 54 nodes.

This should provide a nice overview of the current state of your manifests. And it tested those nodes about 50 times faster than Puppet can compile the corresponding catalogs.

Behind the scene changes

The Puppet.Daemon machinery has been simplified a lot. It previously worked by spawning a pre-determined number of threads specialized for parsing or compiling catalogs. The threads communicated with each other using shared Chans. The reason was that I wanted to control the memory usage, and didn’t want to have too many concurrent threads at the same time. It turns out that most memory is used by the parsed AST, which is shared using the filecache module (http://hackage.haskell.org/package/filecache), so this is not a real concern.

I ripped all that out, and now the only threads that are spawned are an OS thread for the embedded Ruby interpreter and an IO thread for the filecache. Users of the library can then spawn as many parallel threads as they want. As a result, concurrency is a bit better, even though there are still contention points:

  • The parsed file cache is held by the filecache thread, and communicates with a Chan. I will eventually replace this with an MVar, or some other primitive that doesn’t require a dedicated thread.
  • The Lua interpreter requires a LuaState, which should not be used by several threads at once. It is stored in a shared MVar (a minimal sketch of this pattern follows this list).
  • The Ruby interpreter is the main performance bottleneck. It is single threaded, and very slow. The only way to speed it up would be to parse more of the ruby language (there is a parser for common Erb patterns included !), or to switch to another interpreter that would support multithreading. Both are major endeavors.
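A minimal sketch of the MVar pattern from the second point above, with a placeholder type standing in for the real LuaState:

    import Control.Concurrent.MVar (MVar, newMVar, withMVar)

    -- Placeholder for the real interpreter state; invented for the sketch.
    data LuaState = LuaState

    -- withMVar takes the state out, runs the action, and puts it back,
    -- so only one thread can be inside the interpreter at a time.
    withLua :: MVar LuaState -> (LuaState -> IO a) -> IO a
    withLua var act = withMVar var act

    main :: IO ()
    main = do
      lua <- newMVar LuaState
      _   <- withLua lua (\_st -> putStrLn "calling into the interpreter")
      return ()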
Waiting around the corner

The next releases will probably be Ruby 2.1 support for hruby, and some performance work on filecache.

Categories: Offsite Blogs

xmonad/osxmonad

del.icio.us/haskell - Thu, 01/30/2014 - 9:54am
Categories: Offsite Blogs