I'm thinking of building a system that essentially computes the values of certain complicated functions for certain input values. (I'm using the term 'function' in the general mathematical sense.) I'm new to Haskell but the strong typing, type inference and functional nature of Haskell prompted me to think it might be the right tool. I'm seeking advice on which aspects of Haskell or which libraries might be useful for building this.
The features of the function computation that I want to perform are:
I'm computing values of some 'ultimate' function for certain input values. The output value of the function is generally structured numeric data, e.g. a 2D matrix with labeled columns together with a few scalars, but it could be as simple as a single scalar.
The 'ultimate' function is defined in terms of other functions, which in turn can be defined in terms of other functions. There is no direct or indirect recursion in the function definitions. Each of these functions, including the 'ultimate' function, can be realized either by running some executable binary with certain inputs or by using a matrix library to perform some matrix operations. Functions that are not defined in terms of other functions are computed by running an executable binary with the corresponding input parameters over some pre-existing, fixed, large dataset.
For example, with ultimate function F_0 defined as F_0(x, y, z) = E(F_1(x, y), F_2(y, z)), the computation of F_0(x, y, z) will involve computing F_1(x, y) and F_2(y, z), and then E using their outputs as input. E is computed using an external program that is given the output of F_1 and F_2. F_1 and F_2, in turn, can be defined in terms of other functions.
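A definition like F_0(x, y, z) = E(F_1(x, y), F_2(y, z)) maps fairly directly onto ordinary Haskell code. A minimal sketch with stand-in bodies (all names and types here are hypothetical; the real bodies would shell out to the external programs):

    -- Each function is an IO action because the real computations
    -- run external binaries.
    type Matrix = [[Double]]

    f1 :: Double -> Double -> IO Matrix
    f1 x y = pure [[x, y]]   -- stand-in; really runs the binary for F_1

    f2 :: Double -> Double -> IO Matrix
    f2 y z = pure [[y, z]]   -- stand-in; really runs the binary for F_2

    e :: Matrix -> Matrix -> IO Matrix
    e a b = pure (a ++ b)    -- stand-in; really feeds both outputs to E's program

    f0 :: Double -> Double -> Double -> IO Matrix
    f0 x y z = do
      a <- f1 x y
      b <- f2 y z
      e a b

The dependency structure of the definitions is then just ordinary data flow, which helps with the scheduling requirement below.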
The outputs of some non-ultimate function computations may be used multiple times in the overall computation, so for performance they need to be memoized. Due to the data volume, they don't fit in memory and have to be stored on secondary storage, as files or in a database. Even the values of the ultimate function can be memoized, so that after an initial computation, future queries are just lookups.
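Disk-backed memoization can be wrapped up as a higher-order function. A sketch (hypothetical names; uses the hashable package, whose hash is not cryptographic, and Show/Read as a stand-in serialization, so this is only illustrative):

    import System.Directory (doesFileExist)
    import Data.Hashable (hash)  -- from the hashable package

    -- Key each call by a hash of its arguments; cache the result as a file.
    memoToDisk :: (Show a, Show b, Read b) => String -> (a -> IO b) -> a -> IO b
    memoToDisk name f x = do
      let path = name ++ "-" ++ show (hash (show x)) ++ ".cache"
      hit <- doesFileExist path
      if hit
        then read <$> readFile path
        else do
          v <- f x
          writeFile path (show v)
          pure v

A real version would use a collision-resistant hash and a proper binary serialization rather than Show/Read.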
I want to compute the function values in a distributed fashion, maximizing utilization of the processor cores. I might even want to distribute the computation across machines, but that's not as important for now.
For certain functions, it is cheaper to compute the function values for multiple similar inputs together. E.g. if X is represented by many bytes of data, it may be cheaper to compute E(0, X), E(1, X), ..., E(n, X) together instead of separately because reading X is expensive.
If F_i(x, y) is defined in terms of F_j(x), F_i(x, y) cannot be computed until the value of F_j(x) has been computed and memoized. Hence some sort of dependency-aware scheduling of the function computations is necessary, and for acceptable performance it may also have to account for the batching opportunity described in the previous point.
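Once the dependencies are expressed as plain data flow, independent subcomputations can be run in parallel without an explicit scheduler. A sketch using the async package's concurrently (stand-in bodies, hypothetical names):

    import Control.Concurrent.Async (concurrently)

    f1, f2 :: Double -> Double -> IO Double
    f1 x y = pure (x + y)   -- stand-in for the real F_1
    f2 y z = pure (y * z)   -- stand-in for the real F_2

    e :: Double -> Double -> IO Double
    e a b = pure (a - b)    -- stand-in for the real E

    -- F_1 and F_2 don't depend on each other, so run them concurrently;
    -- E necessarily waits for both results.
    f0 :: Double -> Double -> Double -> IO Double
    f0 x y z = do
      (a, b) <- concurrently (f1 x y) (f2 y z)
      e a b

Batching of similar inputs would still need extra machinery on top of this, but the dependency ordering comes for free.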
Changing the definition of the functions should be relatively easy and not involve big code changes that take a long time.
What could be some useful ways to think about the design of this system? I already have a way to do this in Python, but every time I change the definitions I need to spend a lot of time making Python code changes. In the existing system, there is already a way to store and retrieve the results of function computations: essentially, I compute hash keys from the function input arguments and store the function values in the filesystem or in a database under those keys.

submitted by Syncopat3d
AFAIK, updating the package list with cabal just downloads the whole 00-index.tar.gz file, which usually takes minutes here. Is it wise for cabal to re-download the entire index on every update?

submitted by eccstartup
There's an interface I'm interested in supporting, which seems to me to sit in some weird middle ground between pure and impure interfaces.
I don't know of a better name, so I'll call it a "consistent map", which stores key/value pairs. (The space of values is potentially much larger than the space of keys.) The consistent map has a pretty simple interface:

    consistentMapStore :: Value -> Key
    consistentMapGet   :: Key -> Value
It has a guarantee that consistentMapStore v always returns the same Key for identical values of v (as pure functions should).
It also has the guarantee that if k = consistentMapStore v then consistentMapGet k == v; however, this only holds for k that are obtained from consistentMapStore first. To call consistentMapGet otherwise results in undefined behavior.
One possible implementation of this would be to have:

- consistentMapStore v generate the key as a cryptographically secure hash of v and store (k, v) in a dictionary.
- consistentMapGet k simply look up k in the dictionary.
Another possible implementation:
- consistentMapStore v first checks if v is in the dictionary and, if so, returns its key; otherwise, it generates a previously-unused key k by incrementing a counter and then stores (k, v) in the dictionary.
Anyway, it's pretty clear, I think, that you can't implement this without unsafePerformIO or friends, since it has to store the table. That said, it is still pure in the sense that (over the course of a single execution of your program) each function will always return the same value for the same input, provided the interface is used correctly.
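For concreteness, the counter-based variant can be written with unsafePerformIO along these lines (a sketch with stand-in types for Key and Value; the NOINLINE pragmas matter so GHC doesn't duplicate or inline the hidden state):

    import Data.IORef
    import qualified Data.Map.Strict as M
    import System.IO.Unsafe (unsafePerformIO)

    type Key = Int
    type Value = String  -- stand-in; the real Value type is up to you

    -- The hidden table: forward map, backward map, and a fresh-key counter.
    {-# NOINLINE table #-}
    table :: IORef (M.Map Value Key, M.Map Key Value, Key)
    table = unsafePerformIO (newIORef (M.empty, M.empty, 0))

    -- Reuse the key if v was seen before, otherwise allocate a fresh one.
    {-# NOINLINE consistentMapStore #-}
    consistentMapStore :: Value -> Key
    consistentMapStore v = unsafePerformIO $
      atomicModifyIORef' table $ \(fwd, bwd, n) ->
        case M.lookup v fwd of
          Just k  -> ((fwd, bwd, n), k)
          Nothing -> ((M.insert v n fwd, M.insert n v bwd, n + 1), n)

    {-# NOINLINE consistentMapGet #-}
    consistentMapGet :: Key -> Value
    consistentMapGet k = unsafePerformIO $ do
      (_, bwd, _) <- readIORef table
      pure (bwd M.! k)  -- undefined for keys never returned by store

Whether this is safe in the presence of GHC's optimizations is exactly the kind of thing I'd like opinions on.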
So... I'm not really sure what my question here is, but I'm generally wondering about:
- Does this concept have a name?
- Is it possible to use unsafePerformIO to do this without my program exploding?
- Are there any library implementations of this already?
- From a theoretical perspective, would you call this "pure" or "impure"?
- Is there anything else interesting to say about this?
At FP Complete, we’re constantly striving to improve the quality of the Haskell ecosystem, with a strong emphasis on making Haskell a viable tool for commercial users. Over the past few years we’ve spoken with many companies either currently using Haskell or considering doing so, worked with a number of customers in making Haskell a reality for their software projects, and released tooling and libraries to the community.
We’re also aware that we’re not the only company trying to make Haskell a success, and that others are working on similar projects to our own. We believe that there’s quite a lot of room to collaborate on identifying problems, discussing options, and creating solutions.
If you're interested in using Haskell in a commercial context, please join the mailing list. I know that we have some projects we think are worth immediate collaboration, and we'll kick off discussions on those after people have time to join the mailing list. And I'm sure many others have ideas too. I look forward to hearing them!