News aggregator

Info about List with terminal element

haskell-cafe - 0 sec ago
Hello, I'm interested in learning more about a list-like structure with a terminal element, i.e.

data List t e = Cons e (List t e) | Null t

or maybe

data List t e = Cons e (List t e) | Null | Terminal t

A practical application could be lazy reading from a file:

readFile :: String -> List Error String

Obviously there is a Functor instance. There is even a Monad instance; however, it is not defined as obviously, because it's not clear what the terminal element should be. There is a straightforward fold-like structure. My questions are: Are there more interesting generalizations of List? What rules do they follow? How should the Monad work? Why? I know this is related to the whole Iteratee discussion (I'm not completely familiar with it). Cheers, Silvio
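A minimal sketch of the first variant, its Functor instance, and the fold being alluded to (assuming nothing beyond what the post states; the Monad question is left open, since the post itself notes the terminal element makes it non-obvious):

data List t e = Cons e (List t e) | Null t

instance Functor (List t) where
  fmap f (Cons e rest) = Cons (f e) (fmap f rest)
  fmap _ (Null t)      = Null t

-- fold that consumes the elements and finishes with the terminal value
foldList :: (e -> r -> r) -> (t -> r) -> List t e -> r
foldList cons nil (Cons e rest) = cons e (foldList cons nil rest)
foldList _    nil (Null t)      = nil t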
Categories: Offsite Discussion

Incremental unification and program manipulation

Haskell on Reddit - 1 hour 38 min ago

Hi everyone, I would like to discuss the options for this concept - incremental program creation, typechecking, and modification.

For starters, I would like to direct attention to prenex polymorphic / Hindley-Milner type systems like Haskell and PureScript, for brevity. The existing methodology is to create your program with text, then invoke your compiler to attempt type checking from the empty state - unifying inference and type signature claims throughout the whole program, but starting from a blank slate. I believe that an initiative for incremental development may be a strong gain (or at least a fun toy) for systems like Haskell. The idea came to me when I started thinking about social networks, and the feasibility for users to claim something abstract, but structurally precise - like code or logic, for instance. The issue you immediately run into is code correctness - if you have a swarm of users appending and editing a shared codebase, without locks, then the result will diverge into a garbled mess very quickly.

This got me thinking - mostly about totality and atomic substitutions across a codebase - if I had a data type Foo, and wanted to add a new sum-type data constructor to it, we all know how we would traditionally code this - we would make the change for Foo (in our editor), then make the change for every function involving Foo as an input, to retain totality of functions (pattern matching gets interesting here, with the use of wildcards _).

But, say we instead decided to build a substitution for our code (or really, a lens into our codebase where we modify). Now, we could declare the fields to be added or removed from Foo, and with some magic, we could also induce other required substitutions for functions that depend on Foo, to retain totality.
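To make the kind of induced edit concrete, a toy illustration (hypothetical names, not a proposed design):

-- suppose Foo originally had constructors A and B, with handle total over them;
-- adding C to Foo forces a new clause in every function that matches on Foo
data Foo = A | B | C

handle :: Foo -> String
handle A = "a"
handle B = "b"
handle C = "c"  -- the clause a substitution-aware tool would have to induce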

The semantics for a system like this is fairly interesting, because it's not what is normally common when looking at denotational semantics - usually we address the expression's grammar, or the grammar of a type (like in pure type systems). But in this case, we would need to address the semantics for declaring expressions and types, and show that our substitutions / modifications retain totality.

I think this could be a new design for a Haskell interpreter - the main issue is that our type checking can't be garbage collected in an obvious way. I'm checking out OutsideIn(X) right now, but I'm fairly new to constraint solvers so I can't promise that I'll have a truly incremental version soon. But it would still be interesting to see how we could make a runtime system that's editable, in an atomic way, that is still efficient in typechecking (you don't need to recheck terms that are outside the reach of the modified subject matter), but may not be very efficient operationally (but would be efficient in terms of adjusting the operation at runtime).

I argue that a system like this would be very useful for collaborative logic programming, like a financial trading network, where you would not want to ever take the running system down, but would like to tweak it at any given time, by many different people at the same time. Or really, any network where people could be collaborating in the same codebase. It would be interesting to see groups of people translate something complicated and abstract, like a legal system, into a program that all the experts contribute to.

Does anyone have any ideas for a research project like this? Maybe something that already exists, papers to add to my stockpile, or counter arguments to show where I might be being silly? Any help with my understanding would be greatly appreciated.

submitted by Spewface
[link] [4 comments]
Categories: Incoming News

Run time module definition file location information?

haskell-cafe - 2 hours 4 min ago
Hi all, How can I get a running Haskell program to tell me where it’s looking, in order to satisfy “import …” statements? Thanks, -db
Categories: Offsite Discussion

Slurping Arguments to Constructor in Pattern?

Haskell on Reddit - 3 hours 21 min ago

Would it be consistent with the rest of Haskell to allow a pattern to match against multiple arguments of a constructor and pack them into a tuple, as in this little example?

data Foo = Bar Int String String | Baz

The pattern match:

Bar a b c = Bar 1 "string1" "string2"

works fine. (Warning: strawman syntax.) But you can't write something to the effect of

Bar a (* b) = Bar 1 "string1" "string2"

and have b be bound to the tuple ("string1", "string2").

I am wondering if this capability exists in a GHC extension somewhere or if functionality like this would conflict with the rest of the language semantics.

My sense is that an explicit constructor like Bar in a pattern already nails down the type of the value we are matching against enough that we can infer the type of b in all cases, but I am not convinced that this wouldn't cause other problems. Thoughts?
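As far as I know no extension gives exactly this, but a rough approximation of the effect (a sketch; barRest is a hypothetical helper) is a view pattern that packs the trailing fields into a tuple:

{-# LANGUAGE ViewPatterns #-}

data Foo = Bar Int String String | Baz

-- hypothetical helper: expose Bar's trailing fields as a tuple
barRest :: Foo -> Maybe (Int, (String, String))
barRest (Bar n s1 s2) = Just (n, (s1, s2))
barRest _             = Nothing

describe :: Foo -> String
describe (barRest -> Just (n, rest)) = show n ++ " with " ++ show rest
describe _                           = "no Bar here"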

submitted by small-wolf
[link] [7 comments]
Categories: Incoming News

A simple ncurses "Ultimate Tic Tac Toe" game

Haskell on Reddit - 5 hours 23 min ago


I made a game using UI.NCurses. The game is "Ultimate Tic Tac Toe". I also wrote a blog post about it.

If anyone feels like giving me some feedback, I'd be glad!

Also, just out of curiosity, how would you go about making it work online? Pass the state over TCP with cereal?
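For the serialization half, a minimal sketch of what the cereal side might look like with Generic deriving (GameState is a stand-in for the game's real state type; the actual TCP transport is left out):

{-# LANGUAGE DeriveGeneric #-}

import Data.ByteString (ByteString)
import Data.Serialize (Serialize, decode, encode)
import GHC.Generics (Generic)

-- stand-in for the real game state
data GameState = GameState [Int] Bool deriving (Show, Generic)

instance Serialize GameState

toWire :: GameState -> ByteString
toWire = encode

fromWire :: ByteString -> Either String GameState
fromWire = decode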

submitted by clrnd
[link] [comment]
Categories: Incoming News

Ensuring low memory computation of two functions from one lazy list?

Haskell on Reddit - 8 hours 15 min ago

I have been playing around with computations on lists and seeing how they get optimized so that the whole list never has to be created.

Started with this code:

print $ getSum . foldl' (<>) mempty $ Sum <$> [1..10000000]

That uses the Sum type to sum up 10000000 numbers. I chose to use Sum and not just the simple sum function to make it close to using any other arbitrary map-then-fold functions.

This code works quickly and in O(1) memory usage.

I then made the function a bit more complex. I initially tried to use Product, but that turned out to be too time- and memory-consuming by itself, so I ended up using two sums:

print $ (getSum *** getSum) . foldl' (<>) mempty $ (Sum &&& Sum) <$> [1..10000000]

I had initially assumed, because it's one function here and one function there, that it would also be quick and run in O(1) memory. Unfortunately, the memory usage explodes. This is likely because the fst element of the tuple is computed, creating the whole set of thunks for the snd rather than doing the computation in parallel.

How can I perform this calculation in O(1) memory like in the original Sum case? I have a feeling strictness and bang patterns could help me.

EDIT: Using

print $ (getSum *** getSum) . foldl' (\a@(!a1, !a2) b -> a <> b) mempty $ (Sum &&& Sum) <$> [1..10000000]

seems to have solved the issue for this case. I wonder, however, if there's a better approach.
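One alternative that also runs in O(1) memory (a sketch, not necessarily the best approach) is to fold with a custom pair type that is strict in both fields, so neither sum can accumulate thunks:

import Data.List (foldl')

-- a pair that is strict in both components
data SP = SP !Integer !Integer

sumBoth :: [Integer] -> (Integer, Integer)
sumBoth = finish . foldl' step (SP 0 0)
  where
    step (SP a b) x = SP (a + x) (b + x)
    finish (SP a b) = (a, b)

main :: IO ()
main = print (sumBoth [1 .. 10000000])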

submitted by Soul-Burn
[link] [11 comments]
Categories: Incoming News

Haskell on Reddit - Tue, 11/24/2015 - 9:04pm

I've been messing around with ways to do type safe routing with dependent types. I thought I'd put up an example of what I'd come up with in case anyone else was interested.

submitted by andrewthad
[link] [5 comments]
Categories: Incoming News

Comparison between fields of each record

haskell-cafe - Tue, 11/24/2015 - 8:20pm
Dear All, I'd like to compare fields of each record. Here is my record:

data Person = Person { name :: String, age :: Int } deriving (Show)

data Relations = Friend | Older | Younger

class Comparison a where
  compare :: a -> a -> Relations

instance Comparison Person where
  compare Person a b Person a b
    | b1 == b2 = Friend
    | b1 > b2 = Older
    | b1 < b2 = Younger

How can I fix it? Sincerely, Jeon-Young Kang
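One way the instance could be repaired (a sketch only; the class method is renamed compare' here to avoid clashing with Prelude's compare, and the patterns bind the age fields explicitly):

data Person = Person { name :: String, age :: Int } deriving Show

data Relations = Friend | Older | Younger

class Comparison a where
  compare' :: a -> a -> Relations

instance Comparison Person where
  compare' (Person _ a1) (Person _ a2)
    | a1 == a2  = Friend
    | a1 >  a2  = Older
    | otherwise = Younger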
Categories: Offsite Discussion

Fast way to serialize Haskell objects

Haskell on Reddit - Tue, 11/24/2015 - 7:58pm

I am using the binary package to write a large Haskell object into and read it from the redis cache. I compress the data using Snappy before writing the bytes into the cache. Snappy compresses the data about 4 to 5 times. It takes about 35 seconds in my case to write the whole thing. I wonder if there is any other serialization package that is faster than binary. Since I am reading the object using the same program, I don't need any extra feature such as interoperability between Haskell and other languages.

I also used protocol buffers to serialize data with no compression. The uncompressed size of data in ProtoBuf is almost the same as the compressed binary. The running time is about the same as well.
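For reference, the combination being described presumably boils down to something like this (a sketch assuming the binary and snappy packages; serialize/deserialize are hypothetical helper names):

import Data.Binary (Binary, encode, decode)
import qualified Codec.Compression.Snappy as Snappy
import qualified Data.ByteString as B
import Data.ByteString.Lazy (fromStrict, toStrict)

-- Snappy-compress the binary encoding before it goes to the cache
serialize :: Binary a => a -> B.ByteString
serialize = Snappy.compress . toStrict . encode

-- decompress and decode on the way back out
deserialize :: Binary a => B.ByteString -> a
deserialize = decode . fromStrict . Snappy.decompress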

submitted by gtab62
[link] [20 comments]
Categories: Incoming News

[ANN] Linode package

haskell-cafe - Tue, 11/24/2015 - 7:51pm
Hello everybody, I'm happy to announce the initial release of the Linode package. It contains bindings to the Linode API and a few user-friendly routines to set up and destroy VPS instances. The next major release will cover private IPs and load balancing. Hackage: Github: A working example is provided in the docs. It creates one Linode instance and performs a dummy install: Use cases include:
- Running some daily batch, with one or several instances
- Dynamically adding some resources to a long-running process
- Anything you would normally do with a VPS
Ideas, contributions and feedback are welcome! -- Sebastian de Bellefon
Categories: Offsite Discussion

Douglas M. Auclair (geophf): (Really) Big Data: from the trenches

Planet Haskell - Tue, 11/24/2015 - 7:23pm
Okay, people throw around 'big data' experience, but what is it really like? What does it feel like to manage a Petabyte of data? How do you get your hands around it? What is the magic formula that makes it all work seamlessly without Bridge Lines opened on Easter Sunday with three Vice Presidents on the line asking for status updates by the minute during an application outage?

Are you getting the feel for big data yet?


Big data is not terabytes; 'normal'/SQL databases like Oracle or DB2 or GreenPlum or whatever can manage those, and big data vendors don't have a qualm about handling your 'big data' of two terabytes, even as they scoff into your purchase order.

"I've got a huge data problem of x terabytes."

No, you don't. You think you do, but you can manage your data just fine and not even make Hadoop hiccough.

Now let's talk about big data.

1.7 petabytes
2.5 billion transactions per day.
Oh, and growing to SIX BILLION transactions per day.

This is my experience. When the vendor has to write a new version of HBase because their version that could handle 'any size of data, no matter how big' crashed when we hit 600 TB?

Yeah. Big data.

So, what's it like?

Storage Requirements/Cluster Sizing
1. Your data is bigger than you think it is/bigger than the server farm you planned for it.

Oh, and 0. first.

0. You have a USD million budget ... per month.

Are you still here? Because that's the kind of money you have to lay out for the transactional requirements and storage requirements you're going to need.

Get that lettuce out.

So, back to 1.

You have this formula, right? The one from the vendor that says: elastic replication is at 2.4, so for 600 TB you need 1.2 Petabytes of space.


Wrong. Wrong. WRONG.

First: throw out the vendors' formulae. They work GREAT for small data in the lab. They suck for big data IRL.

Here's what happens in industry.

You need a backup. You make a backup. A backup is the exact same size as your active HTables, because the HTables are in bz2-format already compressed.

Double the size of your cluster for that backup-operation.

Not a problem. You shunt that TWO PETABYTE BACKUP TO AWS S3?!?!?

Do you know how long that takes?

26 hours.

Do you know how long it takes to do a restore from backup?

Well, boss, we have to load the backup from S3. That will take 26 hours, then we ...

Boss: No.

me: What?

Boss: No. DR ('disaster recovery') requires an immediate switch-over.

Me: well, the only way to do that is to keep the backup local.

Boss: Okay.

Double the size of your cluster, right?


What happens if the most recent backup is corrupted, that is, today's backup, because you're backing up every day just before the ETL-run, then right after the ETL-run, because you CANNOT have data corruption here, people, you just can't.

You have to go to the previous backup.

So now you have two FULL HTable backups locally on your 60-node cluster!

And all the other backups are shunted, month-by-month, to AWS S3.

Do you know how much 2 petabytes, then 4 petabytes, then 6 petabytes in AWS S3 costs ... per month?

So, what to do then?

You shunt the 'old' backups, older than x years old, every month, to Glacier.

Yeah, baby.

That's the first thing: your cluster is 3 times the size of what it needs to be, or else you're dead, in one month. Personal experience bears this out: first, you need the wiggle room or else you stress out the poor nodes of your poor cluster, and you start getting HBase warnings and then critical error messages about space utilization; second, you need that extra space when the ETL job loads in a billion-row chunk of the 2.5 billion transactions you're loading that day.

Been there. Done that.

Disaster Recovery
Okay, what about that DR, that Disaster Recovery?

Your 60 node cluster goes down, because, first, you're not an idiot and didn't build a data center and put all those computers in there yourself, but shunted all that to Amazon and let them handle that maintenance nightmare.

Then the VP of AWS Oregon region contacts you and tells you everything's going down in that region: security patch. No exceptions.

You had a 24/7 contract with 99.999% availability with them.

Sorry, Charlie: you're going down. A hard shutdown. On Thursday.

What are you going to do?

First, you're lucky if Amazon tells you: they usually just do it and let you figure that out on your own. So that means you have to be ready, at any time, for the cluster to go down with no reason given.

We had two separate teams monitoring our cluster: 24/7. And they opened that Bridge Line the second a critical warning fired.

And if a user called in and said the application was non-responsive?

Ooh, ouch. God help you. You have not seen panic in ops until you see it when one user calls in and you come to find it's because the cluster is down and no warning caught it.

Set up monitoring systems on your cluster. No joke.

With big data, your life? Over.

Not an issue. Or, it becomes an issue when you're shunting your backup to S3 and the cluster gets really slow. We had 1600 users that we rolled out to, we stress-tested it, you know. Nobody had problems during normal operations, it's just that when you ask the cluster to do something, like ETL or backup-transfer, that engages all disks of all nodes in reads and writes.

A user request hits all your region servers, too.

Do your backups at 2 am or on the weekends. Do your ETL after 10 pm. We learned to do that.

Amazon is perfect; Amazon is wonderful; you'll never have to maintain nor monitor your cluster again! It's all push-of-the-button.

I will give Amazon this: we had in-house clusters with in-house teams monitoring our clusters, 'round the clock. Amazon made maintenance this: "Please replace this node."

Amazon: "Done."

But you can't ask anything other than that. Your data on that node? Gone. That's it, no negotiations. But Hadoop/HBase takes care of that for you, right? So you're good, right?

Just make sure you have your backup/backout/DR plans in place and tested with real, honest-to-God we're-restarting-the-cluster-from-this-backup data or else you'll never know until you're in hot water.

Every vendor will promise you the Moon ... and 'we can do that.' Every vendor believes it.

Then you find out what's what. We did. Multiple times, multiple vendors. Most couldn't handle our big data when push came to shove, even though they promised they could handle data of any size. They couldn't. Or they couldn't handle it in a manageable way: if the ETL process takes 26 hours and it's daily, you're screwed. Our ETL process got down to 1.5 hours, but that was after some tuning on their part and on ours: we had four consultants from the vendor in-house every day for a year running. Part of our contract-agreement. If you are blazing the big data trail, your vendor is, too: we were inventing stuff on the fly just to manage the data coming in, and to ensure the data came out in quick, responsive ways.

You're going to have to do that, too, with real big data, and that costs money. Lots.

And it also costs cutting through what vendors are saying to you versus what their product can actually handle. Their sales people have their sales pitch, but what really happened is we had to go through three revisions of their product just so it could be a Hadoop HBase-compliant database that could handle 1.7 petabytes of data.

That's all.

Oh, and grow by 2.5 billion rows per day.

Which leads to ...

Backout/Aging Data
Look, you have big data. Some of it's relevant today, some of it isn't. You have to separate the two, clearly and daily; if you don't, then a month, two months, two years down the road you're screwed, because you're now dealing with a full-to-the-gills cluster AND having to disambiguate data you've entangled, haven't you? with the promise of looking at aging data gracefully ... 'later.'

Well, later is right now, and your cluster is full and in one month it's going critical.

What are you going to do?

Have a plan to age data. Have a plan to version data. Have a data-correction plan.

These things can't keep being pushed off to be considered 'later,' because 'later' will be far too late, and you'll end up crashing your cluster (bad) or, come to find, corrupting your data when you slice and dice it the wrong way (much, much worse). Oh, and version your backups, tying them to the application version, because when you upgrade your application your old data gets all screwy, or your new data format is all screwy on the old application when somebody pulls up a special request to view three-year-old data.

Have a very clear picture of what your users need, the vast majority of the time, and deliver that and no more.

We turned a 4+ hour query on GreenPlum that terminated when it couldn't deliver a 200k+ row result...

Get that? 4+hours to learn your query failed.

No soup for you.

To a 10 second query against Hadoop HBase that returns 1M+ rows.

Got that?

We changed peoples' lives. What was impossible before for our 1600 users was now in hand in 10 seconds.

But why?

Because we studied all their queries.

One particular query was issued 85% of the time.

We built our Hadoop/HBase application around that, and shunted the other 15% of the queries to other tools that could manage that load.

Also, we studied our users: all their queries were against transactions from within the last month.

We kept two years of data on-hand.


And that two years grew to more, month by month.


We had no graceful data aging/versioning/correcting plans, so, 18 months into production we were faced with a growing problem.

Growing daily.

The users do queries up to a month? No problem: here's your data in less than 10 seconds, guaranteed. You want to do research, you put in a request.

Your management has to put their foot down. They have to be very clear what this new-fangled application is delivering and the boundaries on what data they get.

Our management did, for the queries, and our users loved us. You go from putting in a query that takes four hours, with only 16 queries allowed to run against the system at any one time, to: anyone, anywhere can submit a query and it returns right away?

Life-changing, and we did psychological studies as well as user-experience studies, too, so I'm not exaggerating.

What our management did not do is put bounds on how far back you could go into the data set. The old application had a 5 year history, so we thought two years was good. It wasn't. Everybody only queried on today, or yesterday, or, rarely: last week or two weeks ago. We should have said: one month of data. You want more, submit a request to defrost that old stuff. We didn't, and we paid for it in long, long meetings around the problem of how to separate old data from new and what to do to restore old data, if ever (never?) a request for old data came. If we had had a monthly shunt to S3 and then to Glacier, that would have been a well-understood and automatic right-sizing from the get-go.

You do that for your big data set.

Last Words
Look. There's no cookbook or "Big Data for Dummies" that is going to give you all the right answers. We had to crawl through three vendors to get to one whose product didn't work out of the box but who could at least work with us, night and day, to get to a solution that could eventually work with our data set. So you don't have to do that. We did that for you.

You're welcome.

But you may have to do that because you're using Brand Y not our Brand X or you're using Graph databases, not Hadoop, or you're using HIVE or you're using ... whatever. Vendors think they've seen it all, and then they encounter your data-set with its own particular quirks.

Maybe, or maybe it all will magically just work for you.

And let's say it does all magically work, and let's say you've got your ETL tuned, and your HTables properly structured for fast in-and-out operations.

Then there's the day-to-day daily grind of keeping a cluster up and running. If your cluster is in-house ... good luck with that. Have your will made out and ready for when you die from stress and lack of sleep. If your cluster is from an external vendor, just be ready for the ... eh ... quarterly, at least, ... times they pull the rug out from under you, sometimes without telling you and sometimes without reasonably fair warning time, so it's nights and weekends for you to prep with all hands on deck and everybody looking at you for answers.

Then, ... what next?

Well: you have big data? It's because you have Big Bureaucracy. The two go together, invariably. That means your Big Data team is telling you they're upgrading from HBase 0.94 to HBase whatever, and that means all your data can go bye-bye. What's your transition plan? We're phasing in that change next month.

And then somebody inserts a row in the transaction, and it's ... wrong.

How do you tease a transaction out of an HTable and correct it?

An UPDATE SQL statement?

Hahaha! Good joke! You so funny!

Tweep: "I wish twitter had an edit function."

Me: Hahaha! You so funny!

And, ooh! Parallelism! We had, count'm, three thousand region servers for our MapReduce jobs. You got your hands around parallelism? Optimizing MapReduce? Monitoring the cluster as the next 2.5 billion rows are processed by your ETL-job?

And then a disk goes bad, at least once a week? Stop the job? Of course not. Replace the disk (which means replacing the entire node because it's AWS) during the op? What are the impacts of that? Do you know? What if two disks go down during an op?

Do you know what that means?

At replication of 2.4, two bad disks means one more disk going bad will get you a real possibility of data corruption.

How's your backups doing? Are they doing okay? Because if they're on the cluster now your backups are corrupted. Have you thought of that?

Think about that.

And, I think I've given enough experience-from-the-trenches for you to think on when spec'ing out your own big data cluster. Go do that and (re)discover these problems and come up with a whole host of fires you have to put out on your own, too.

Hope this helped. Share and enjoy.

cheers, geophf
Categories: Offsite Blogs

Enable Profiling in Stack?

Haskell on Reddit - Tue, 11/24/2015 - 5:43pm

We're finally upgrading our dev environment. Stack seems pretty useful, but I can't figure out how to build so that +RTS -prof works. Using --executable-profiling does something (at least, it takes a very long time recompiling every library), but trying -prof still complains, so I'm not really sure what it's doing. Anyone got this working?
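For what it's worth, a hedged guess at the usual incantation (the executable name my-exe is a placeholder): build with both library and executable profiling enabled, then pass the RTS flags through stack exec:

stack build --library-profiling --executable-profiling
stack exec -- my-exe +RTS -p -RTS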

submitted by dogodel
[link] [4 comments]
Categories: Incoming News