As readers of this blog or the mailing lists are likely already aware: package security is important to both FP Complete and other members of the Commercial Haskell community. While there was quite a bit of public discussion around this during the planning phase, I was reminded in a conversation on Friday that we never announced the outcome of these plans.
tl;dr: Secure package distribution is fully implemented in stack, with some options to harden the default. We're still implementing an easy author signing story, and that will be announced soon.
The implementation we have in stack follows the plan in the above-linked proposal pretty directly. Let me just flesh it out fully here:
- The all-cabal-hashes repository is used by default by stack for getting the collection of cabal files (known as the package index). This is downloaded over https. In addition to the raw .cabal files, this repository also contains hashes and download sizes for all tarballs available.
- When downloading tarballs, the file size and content hash will be verified against the information provided in the index, if available. If more bytes are provided than indicated, the download is aborted. Only after verification is complete is the file moved into its final destination and available for future operations.
- For added security (which I'd recommend), you can also turn on GPG verification and requiring hashes for this index (see the stack.yaml configuration settings).
- GPG verification will use Git's built-in GPG support to verify the signature on the all-cabal-hashes tag before accepting the new content, and will refuse to update the index if the GPG verification fails. (You'll need to add our GPG key to your keychain.)
- Requiring hashes means that the package index will not be accepted unless every package listed also has package hash/download size information. This is disabled by default for those who download the package index without Git support.
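As a sketch of the hardened settings mentioned above (the exact field names have varied across stack versions, so treat this as illustrative and check the stack documentation for your release), the configuration lives in stack.yaml:

```yaml
package-indices:
- name: Hackage
  download-prefix: https://s3.amazonaws.com/hackage.fpcomplete.com/package/
  git: https://github.com/commercialhaskell/all-cabal-hashes.git
  gpg-verify: true     # verify the GPG signature on the index's Git tag
  require-hashes: true # reject an index lacking hash/size info for any package
```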
The story still isn't complete: we have no way to verify that the package author really is the person who uploaded the package. Stay tuned for the author signing work we're doing, which will hopefully be available Real Soon Now(tm).
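The tarball check described above boils down to comparing the observed size and content hash against the index metadata before accepting a file. Here is a minimal sketch of that logic; `toyHash`, `verifyDownload`, and the error strings are all illustrative inventions (stack itself uses real cryptographic hashes from the all-cabal-hashes index, not this toy checksum):

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word32)

-- Illustrative stand-in for a real cryptographic hash; the actual
-- index records proper content hashes for every tarball.
toyHash :: BS.ByteString -> Word32
toyHash = BS.foldl' (\acc b -> acc * 33 + fromIntegral b) 5381

-- Accept a download only if the observed size and hash match the
-- index metadata; extra bytes abort the download outright.
verifyDownload :: Int -> Word32 -> BS.ByteString -> Either String BS.ByteString
verifyDownload expectedSize expectedHash bytes
  | BS.length bytes > expectedSize = Left "aborted: more bytes than indicated"
  | BS.length bytes < expectedSize = Left "rejected: truncated download"
  | toyHash bytes /= expectedHash  = Left "rejected: hash mismatch"
  | otherwise                      = Right bytes

main :: IO ()
main = do
  let pkg = BS.pack [104, 105]  -- pretend tarball contents ("hi")
  print (verifyDownload 2 (toyHash pkg) pkg)
  print (verifyDownload 1 (toyHash pkg) pkg)
```

Only after a `Right` result would the file be moved into its final destination, mirroring the behaviour described above.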
I have a few newtypes representing C pointers, and corresponding wrapped C functions (using inline-c) that return in IO. Some of these functions mutate their operands in-place, without a warning.
E.g. in test below, kspSolve x v first uses v and then updates it with the solution (the ksp object is changed as well, by those setup calls and by the solve itself).
How can I improve this, in order to achieve purely-functional signatures?

```haskell
test :: KspType_ -> IO ()
test km =
  withMatSeqAIJPipeline cs m m nz nnz (`matSetDiagonal` 200.0) $ \mat ->
    withVecMPIPipeline cs m (`vecSet` 1.0) $ \v ->
      withVecMPIPipeline cs m (`vecSet` 1.0) $ \x ->
        withKsp cs $ \ksp -> do
          kspSetOperators ksp mat mat
          kspSetType ksp km
          kspSetInitialGuessNonzero ksp True
          kspSetUp ksp
          kspSolve ksp x v
          soln <- kspGetSolution ksp
          vecViewStdout soln
  where
    cs = commSelf
    m = 5
    nz = sum nnz
    nnz = replicate (fromIntegral m) 1
```
Also, the bracket construct from Control.Exception is great for hiding the memory management details and for building those pretty with- functions.
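The bracket pattern mentioned above can be sketched in isolation. `Vec`, `vecCreate`, and `vecDestroy` here are hypothetical stand-ins (an `IORef` simulating the C allocate/free pair), not the poster's actual inline-c bindings:

```haskell
import Control.Exception (bracket)
import Data.IORef

-- Hypothetical stand-in for a C pointer newtype and its wrapped calls.
newtype Vec = Vec (IORef [Double])

vecCreate :: Int -> IO Vec
vecCreate n = Vec <$> newIORef (replicate n 0)

vecDestroy :: Vec -> IO ()
vecDestroy (Vec r) = writeIORef r []  -- release the underlying resource

-- bracket guarantees the destroy step runs even if the body throws,
-- which is what makes the with- style safe to expose to callers.
withVec :: Int -> (Vec -> IO a) -> IO a
withVec n = bracket (vecCreate n) vecDestroy

main :: IO ()
main = withVec 3 $ \(Vec r) -> readIORef r >>= print
```

Each of the with- functions in the snippet above can be built this way, so callers never see the raw create/destroy pair.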
All feedback welcome, thanks in advance!

submitted by ocramz
Ahrefs Research is the San Francisco branch of Ahrefs Pte Ltd (Singapore), which runs an internet-scale bot that crawls the whole Web 24/7, storing huge volumes of information to be indexed and structured in a timely fashion. On top of that, Ahrefs is building analytical services for end users.
Ahrefs Research develops custom petabyte-scale distributed storage to accommodate all that data coming in at high speed, focusing on performance, robustness, and ease of use. The performance-critical low-level part is implemented in C++ on top of a distributed filesystem, while all the coordination logic and the communication layer, along with the API library exposed to developers, are in OCaml.
We are a small team and strongly believe that better technology leads to better solutions for real-world problems. We worship functional languages and static typing, extensively employ code generation and meta-programming, value code clarity and predictability, and constantly seek to automate repetitive tasks and eliminate boilerplate, guided by DRY and KISS. If any new technology will make our lives easier, no doubt we'll give it a try. We rely heavily on open-source code (as the only viable way to build a maintainable system) and contribute back; see e.g. https://github.com/ahrefs. It goes without saying that our team is all passionate and experienced OCaml programmers, ready to lend a hand or explain that intricate ocamlbuild rule.
Our motto is "first do it, then do it right, then do it better".

What we need
Ahrefs Research is looking for a backend developer with a deep understanding of operating systems and networks and a taste for simple, efficient architectural designs. Our backend is implemented mostly in OCaml with some C++, so proficiency in OCaml is very much appreciated; otherwise, a strong inclination to learn OCaml intensively in a short time will be required. An understanding of functional programming in general and/or experience with other FP languages (F#, Haskell, Scala, Scheme, etc.) will help a lot. Knowledge of C++ is a plus.
The candidate will have to deal with the following technologies on a daily basis:
- networks & distributed systems
- 4+ petabyte of live data
The ideal candidate is expected to:
- Independently deal with and investigate bugs, schedule tasks and dig code
- Make well-argued technical choices and take responsibility for them
- Understand the whole technology stack at all levels : from network and userspace code to OS internals and hardware
- Handle full development cycle of a single component, i.e. formalize task, write code and tests, setup and support production (devops)
- Approach problems with practical mindset and suppress perfectionism when time is a priority
These requirements stem naturally from our approach to development, with a fast feedback cycle, highly focused personal areas of responsibility, and a strong tendency towards vertical component splitting.

What you get
- Competitive salary
- Modern office in San Francisco SOMA (Embarcadero)
- Informal and thriving atmosphere
- First-class workplace equipment (hardware, tools)
- No dress code
Get information on how to apply for this position.
At Medium, Fredrik (@ique) describes using Haskell in anger.
At the start of the product's life we mostly analyzed small retirement plans with 100 to 500 plan participants. As time went on we started seeing plans with 2,000 participants and even 10,000 participants. At these numbers the execution time for the processing started going outside of acceptable bounds. Luckily, every participant could be analyzed in isolation from the others and so the problem was embarrassingly parallel. I changed one line of code from

```haskell
map outputParticipant parts
```

to

```haskell
map outputParticipant parts `using` parListChunk 10 rdeepseq
```

and execution times were now about 3.7x faster on our 4-core server. That was enough to entirely fix the issue for another 6 months, during which time we could focus on further enhancing the product instead of worrying about performance. Later, we did do extensive profiling and optimization of the code to further reduce execution times significantly, but we didn't want to prematurely optimize anything.

Spotted via Manuel Chakravarty @TacticalGrace.
The purely functional, high-performance array language Accelerate has gained an LLVM-based backend targeting both multicore CPUs and GPUs that uses Haskell's advanced type system to statically assure a range of safety properties. In the final version of our paper at the forthcoming 2015 Haskell Symposium, we (1) describe how we ensure these safety properties using GADTs and type families, (2) outline the architecture of the LLVM backend (including a richly typed interface to LLVM), and (3) provide benchmarks supporting our claim that we are able to produce competitive code.
Shoutout to the awesome Trevor McDonell who did all the hard work.
A benefit of pure functional programming is that it encourages equational reasoning. However, the Haskell language has lacked direct tool support for such reasoning. Consequently, reasoning about Haskell programs is either performed manually, or in another language that does provide tool support (e.g. Agda or Coq). HERMIT is a Haskell-specific toolkit designed to support equational reasoning and user-guided program transformation, and to do so as part of the GHC compilation pipeline. This paper describes HERMIT’s recently developed support for equational reasoning, and presents two case studies of HERMIT usage: checking that type-class laws hold for specific instance declarations, and mechanising textbook equational reasoning.
The Remote Monad Design Pattern(by A. Gill, N. Sculthorpe, J. Dawson, A. Eskilson, A. Farmer, M. Grebe, J. Rosenbluth, R. Scott, and J. Stanton)
Remote Procedure Calls are expensive. This paper demonstrates how to reduce the cost of calling remote procedures from Haskell by using the remote monad design pattern, which amortizes the cost of remote calls. This gives the Haskell community access to remote capabilities that are not directly supported, at a surprisingly inexpensive cost.
Summary: I'm growing increasingly fond of the Conduit library. Here I give my intuitions and some hints I'd have found useful.
Recently I've been working on converting the Hoogle database generation to use the Conduit abstraction, in an effort to reduce the memory usage and improve the speed. It worked - database generation has gone from 2Gb of RAM to 320Mb, and the time has dropped from several minutes (or > 15 mins on memory-constrained machines) to 45s. These changes are all in the context of Hoogle 5, which should hopefully be out in a month or so.
The bit that I've converted to Conduit is something that takes in a tarball with one text file per Hackage package, namely the Haddock output with one definition per line (this 22Mb file). It processes each definition, saves it to a single binary file (with compression and some processing), and returns some compact information about the definition for later processing. I don't expect the process to run in constant space as it is accumulating some return information, but it is important that most of the memory used by one definition is released before the next definition. I originally tried lazy IO, and while it somewhat worked, it was hard to abstract properly and very prone to space leaks. Converting to Conduit was relatively easy, and the result is simpler and more robust.

The Conduit model
My mental model for a conduit Conduit a m b is roughly a function [a] -> m [b] - a values go in and b values come out (but interleaved with the monadic actions). More concretely you ask for an a with await and give back a b with yield, doing stuff in the middle in the m Monad.
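That `[a] -> m [b]` intuition can be made concrete with a base-only toy. Everything here (`ToyConduit`, `toyMap`, `toyFuse`) is invented for illustration; a real Conduit interleaves the monadic effects with consumption and supports early termination, which this list model deliberately ignores:

```haskell
-- Toy model of the intuition: a conduit from a to b over monad m
-- is roughly a function [a] -> m [b].
type ToyConduit a m b = [a] -> m [b]

-- The analogue of mapC: transform every value flowing through.
toyMap :: Monad m => (a -> b) -> ToyConduit a m b
toyMap f = return . map f

-- Composition corresponds to Conduit's =$= (fusion) operator.
toyFuse :: Monad m => ToyConduit a m b -> ToyConduit b m c -> ToyConduit a m c
toyFuse up down xs = up xs >>= down

main :: IO ()
main = toyFuse (toyMap (+1)) (toyMap (*2)) [1, 2, 3 :: Int] >>= print
```

The real library replaces the list with an explicit await/yield protocol, which is exactly what makes the demand order visible.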
A piece of conduit is always either running (doing its actual work), waiting after a yield for the downstream consumer to ask for more results (with await), or waiting after an await for the upstream producer to give it a value (with yield). You can think of a conduit as making explicit the demand order inherent in lazy evaluation.

Things to know about Conduit
I think it's fair to say Conduit shows its history - this is good for people who have been using it for a while (your code doesn't keep breaking), but bad for people learning it (there are lots of things you don't care about). Here are some notes I made:
- The Data.Conduit module in the conduit library is not the right module to use - it seems generally accepted to use the Conduit module from the conduit-combinators package. However, I decided to build my own conduit-combinators style replacement in the Hoogle tree, see General.Conduit - the standard Conduit module has a lot of dependencies, and a lot of generalisations.
- Don't use Source or Sink - use Producer and Consumer - the former are just a convenient way to get confusing error messages.
- Don't use =$ or $=, always use =$= between conduits. The =$= operator is essentially flip (.).
- Given a Conduit you can run it with runConduit. Alternatively, given a =$= b =$= c you can replace any of the =$= with $$ to run the Conduit as well. I find that a bit ugly, and have stuck to runConduit.
- Conduit and ConduitM have their type arguments in different orders, which is just confusing. However, generally I use either Conduit (a connector) or Producer (something with a result). You rarely need something with a result and a return value.
- You call await to see if a value is available to process. The most common bug I've had with conduits is forgetting to make the function processing items recursive - usually you want awaitForever, not just await.
- The ByteString lines Conduit function was accidentally O(n^2) - I spotted and fixed that. Using difference lists does not necessarily make your code O(n)!
When using Conduit I found a number of functions seemingly missing, so defined them myself.
First up is countC, which counts the number of items that are consumed. Just a simple definition on top of sumC:

```haskell
countC :: (Monad m, Num c) => Consumer a m c
countC = sumC <| mapC (const 1)
```
While I recommend awaitForever in preference to await, it's occasionally useful to have awaitJust as the single-step awaitForever, if you are doing your own recursion:

```haskell
awaitJust :: Monad m => (i -> Conduit i m o) -> Conduit i m o
awaitJust act = do
    x <- await
    whenJust x act
```
I regularly find zipFrom i = zip [i..] very useful in strict languages, and since Conduit can be viewed as a strict version of lazy lists (through very foggy glasses) it's no surprise a Conduit version is also useful:

```haskell
zipFromC :: (Monad m, Enum c) => c -> Conduit a m (c, a)
zipFromC !i = awaitJust $ \a -> do
    yield (i, a)
    zipFromC (succ i)
```
Finally, it's useful to zip two conduits. I was surprised how fiddly that was with the standard operators (you have to use newtype wrappers and an Applicative instance), but a simple |$| definition hides that complexity away:

```haskell
(|$|) :: Monad m => ConduitM i o m r1 -> ConduitM i o m r2 -> ConduitM i o m (r1, r2)
(|$|) a b = getZipConduit $ (,) <$> ZipConduit a <*> ZipConduit b
```
Why not Pipes?
I am curious if all Pipes users get asked "Why not use Conduit?", or if this FAQ is asymmetrical?
I realise pipes are billed as the more "principled" choice for this type of programming, but I've yet to see anywhere Conduit seems fundamentally unprincipled. I use WAI/Warp and http-conduit, so learning Conduit gives me some help there.
About 15 years ago, I was hanging out at the MIT AI Lab, and there was an ongoing seminar on the Coq proof assistant. The idea was that you wouldn't have to guess whether your programs were correct; you could prove that they worked correctly.
There were just two little problems:
- It looked ridiculously intimidating.
- Rumor said that it took a grad student all summer to implement and prove the greatest common divisor algorithm, which sounded rather impractical.
So I decided to stick to Lispy languages, which is what I was officially supposed to be hacking on anyway, and I never did try to sit in on the seminar.

Taking another look
I should have taken a look much sooner. This stuff provides even more twisted fun than Haskell! Also, projects like the CompCert C compiler are impressive: Imagine a C compiler where every optimization has been proven correct.
Even better, we can write code in Coq, prove it correct, then export it to Haskell or several other functional languages.
Here's an example Coq proof. Let's start with a basic theorem that says "If we know A is true, and we know B is true, then we know A /\ B (both A and B) is true."

```coq
Theorem basic_conj : forall (A B : Prop), A -> B -> A /\ B.
Proof.
  (* Give names to our inputs. *)
  intros A B H_A_True H_B_True.
  (* Specify that we want to prove each half of /\ separately. *)
  split.
  - apply H_A_True. (* Prove the left half. *)
  - apply H_B_True. (* Prove the right half. *)
Qed.
```

At this point, the right-hand pane will show the theorem that we're trying to prove:

```
1 subgoals, subgoal 1 (ID 1)

  ============================
   forall A B : Prop, A -> B -> A /\ B
```
I would like to modify my library -- a set of bindings to a REST library -- to use Vinyl, to get over the "Haskell records' problem." That is, the basic problem I'm trying to get around is that of overloaded record fields (having a bunch of JSON record types with an id field, for instance).
Could someone please provide a minimal-as-heck example of how to do this using Vinyl?
The Vinyl tutorial over at https://github.com/VinylRecords/Vinyl/blob/master/tests/Intro.lhs is simply not basic enough. Yes, I understand that row polymorphism is important (and I wouldn't mind if the minimal example I am asking for touches upon it slightly), but much of the setup there looks like fluff. For example, why is he defining the =:: operator himself? What's all this Singletons stuff?
How can I get started in Vinyl -- defining two records which share field names and provide lenses? That's literally all I need to use Vinyl for REST bindings with overloaded record fields.

submitted by BoteboTsebo
I'm in the process of "porting" (i.e. providing similar functionality with a similar interface, but hopefully more powerful and elegant) a library from another language to Haskell, and I've hit a point where it's not clear to me what I should do.
The library I'm working from represents dates with an Int32 with the fields bit encoded. This makes sense on some level because I don't need a whole 32 bits (or even 8!) to store e.g. the month. But on the other hand, doing it this way means trading space for computation time (accessing a field is bit operations instead of a memory access) and it strikes me that, thanks to non-strictness, Haskell may already be effectively doing something like this anyway. After all, at some point I'll be pulling the numbers out of this bit encoding and I would have to store the answer in some wider field at that point.
So has anyone analyzed a situation like this before to see if there is a practical difference? The more I think about it, the more I think the "efficient" version is probably slower and less memory efficient.
If you think about it, the other language is, in effect, rolling its own laziness to an extent: you can store a huge list of dates and they won't take up much space. One will only be "expanded" if you actually access it (of course they don't do thunks, so you'll pay every time you expand a field). But as I understand the Haskell execution model, it's already doing this for me in a superior way: the list itself is a thunk until evaluated, then it becomes a list of thunks until the underlying structure (e.g. a field of it) is accessed, at which point the field is stored in this too-wide field.
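The bit-encoded representation under discussion can be sketched with Data.Bits. The field layout here (year in the high bits, 4 bits for the month, 5 for the day) is my own illustrative choice, not necessarily what the original library uses:

```haskell
import Data.Bits ((.|.), (.&.), shiftL, shiftR)
import Data.Int (Int32)

-- Pack year/month/day into one Int32: accessing a field costs a few
-- bit operations instead of a memory access.
packDate :: Int -> Int -> Int -> Int32
packDate y m d = fromIntegral (y `shiftL` 9 .|. m `shiftL` 5 .|. d)

unpackDate :: Int32 -> (Int, Int, Int)
unpackDate w = ( i `shiftR` 9            -- year
               , (i `shiftR` 5) .&. 0xF  -- month (4 bits)
               , i .&. 0x1F )            -- day (5 bits)
  where i = fromIntegral w :: Int

main :: IO ()
main = print (unpackDate (packDate 2015 7 31))
```

A `.Strict` variant like the one suggested below would expose `packDate`/`unpackDate`-style accessors around an unboxed `Int32`, while the lazy default could simply store the three fields.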
Perhaps it could make sense to provide a .Strict variant that does the bit encoding?

submitted by nicheComicsProject
[ Notice: I originally published this report at the wrong URL. I moved it so that I could publish the June 2015 report at that URL instead. If you're seeing this for the second time, you might want to read the June article instead. ]
A lot of the stuff I've written in the past couple of years has been on Mathematics StackExchange. Some of it is pretty mundane, but some is interesting. I thought I might have a little meta-discussion in the blog and see how that goes. These are the noteworthy posts I made in April 2015.
Languages and their relation : help is pretty mundane, but interesting for one reason: OP was confused about a statement in a textbook, and provided a reference, which OPs don't always do. The text used the symbol . OP had interpreted it as meaning , but I think what was meant was .
I dug up a copy of the text and groveled over it looking for the explanation of , which is not standard. There was none that I could find. The book even had a section with a glossary of notation, which didn't mention . Math professors can be assholes sometimes.
Is there an operation that takes and , and returns is more interesting. First off, why is this even a reasonable question? Why should there be such an operation? But note that there is an operation that takes and and returns , namely, multiplication, so it's plausible that the operation that OP wants might also exist.
But it's easy to see that there is no operation that takes and and returns : just observe that although , the putative operation (call it ) should take and yield , but it should also take and yield . So the operation is not well-defined. And you can take this even further: can be written as , so should also take and yield .
The key point is that the representation of a number, or even an integer, in the form is not unique. (Jargon: "exponentiation is not injective".) You can raise , but having done so you cannot look at the result and know what and were, which is what needs to do.
But if can't do it, how can multiplication do it when it multiplies and and gets ? Does it somehow know what is? No, it turns out that it doesn't need in this case. There is something magical going on there, ultimately related to the fact that if some quantity is increasing by a factor of every units of time, then there is some for which it is exactly doubling every units of time. Because of this there is a marvelous group homomorphism $$\log : \langle \Bbb R^+, \times\rangle \to \langle \Bbb R ,+\rangle$$ which can change multiplication into addition without knowing what the base numbers are.
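The homomorphism law being invoked, log(x · y) = log x + log y, is easy to check numerically; the concrete values 2 and 3 here are my own illustration:

```haskell
-- Numerically verify the group homomorphism property of log:
-- multiplication on the left becomes addition on the right.
main :: IO ()
main = do
  let lhs = log (2 * 3) :: Double
      rhs = log 2 + log 3
  print (abs (lhs - rhs) < 1e-12)
```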
In that thread I had a brief argument with someone who thinks that operators apply to expressions rather than to numbers. Well, you can say this, but it makes the question trivial: you can certainly have an "operator" that takes expressions and and yields the expression . You just can't expect to apply it to numbers, such as and , because those numbers are not expressions in the form . I remembered the argument going on longer than it did; I originally ended this paragraph with a lament that I wasted more than two comments on this guy, but looking at the record, it seems that I didn't. Good work, Mr. Dominus.
how 1/0.5 is equal to 2? wants a simple explanation. Very likely OP is a primary school student. The question reminds me of a similar question, asking why the long division algorithm is the way it is. Each of these is a failure of education to explain what division is actually doing. The long division answer is that long division is an optimization for repeated subtraction; to divide you want to know how many shares of three cookies each you can get from cookies. Long division is simply a notation for keeping track of removing shares, leaving cookies, then further shares, leaving none.
In this question there was a similar answer. is because if you have one cookie, and want to give each kid a share of cookies, you can get out two shares. Simple enough.
There is a general pedagogical principle that an ounce of examples is worth a pound of theory. My answer here is a good example of that. When you explain the theory, you're telling the student how to understand it. When you give an example, though, if it's the right example, the student can't help but understand it, and when they do they'll understand it in their own way, which is better than if you told them how.
How to read a cycle graph? is interesting because hapless OP is asking for an explanation of a particularly strange diagram from Wikipedia. I'm familiar with the eccentric Wikipedian who drew this, and I was glad that I was around to say "The other stuff in this diagram is nonstandard stuff that the somewhat eccentric author made up. Don't worry if it's not clear; this author is notorious for that."
In Expected number of die tosses to get something less than 5, OP calculated as follows: The first die roll is a winner of the time. The second roll is the first winner of the time. The third roll is the first winner of the time. Summing the series we eventually obtain the answer, . The accepted answer does it this way also.
But there's a much easier way to solve this problem. What we really want to know is: how many rolls before we expect to have seen one good one? And the answer is: the expected number of winners per die roll is , expectations are additive, so the expected number of winners per die rolls is , and so we need rolls to expect one winner. Problem solved!
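For a standard six-sided die, "less than 5" succeeds on 4 of the 6 faces, so p = 2/3 and the expected number of rolls is 1/p = 3/2. A quick check that the series approach from the original solution agrees with the additivity shortcut (the truncation at 200 terms is my own choice; the tail is negligible):

```haskell
-- E[rolls] = sum over n of n * q^(n-1) * p (the series method),
-- which should equal 1/p (the additivity-of-expectations method).
main :: IO ()
main = do
  let p = 2 / 3 :: Double
      q = 1 - p
      series = sum [fromIntegral n * q ^ (n - 1) * p | n <- [1 :: Int .. 200]]
  print (abs (series - 1 / p) < 1e-9)
```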
I first discovered this when I was around fifteen, and wrote about it here a few years ago.
As I've mentioned before, this is one of the best things about mathematics: not that it works, but that you can do it by whatever method occurs to you and you get the same answer. This is where mathematics pedagogy goes wrong most often: it prescribes that you must get the answer by method X, rather than that you must get the answer by hook or by crook. If the student uses method Y, and it works (and is correct), that should be worth full credit.
Bad instructors always say "Well, we need to test to see if the student knows method X." No, we should be testing to see if the student can solve problem P. If we are testing for method X, that is a failure of the test or of the curriculum. Because if method X is useful, it is useful because for some problems, it is the only method that works. It is the instructor's job to find one of these problems and put it on the test. If there is no such problem, then X is useless and it is the instructor's job to omit it from the curriculum. If Y always works, but X is faster, it is the instructor's job to explain this, and then to assign a problem for the test where Y would take more time than is available.
I see now I wrote the same thing in 2006. It bears repeating. I also said it again a couple of years ago on math.se itself in reply to a similar comment by Brian Scott:
If the goal is to teach students how to write proofs by induction, the instructor should damned well come up with problems for which induction is the best approach. And if even then a student comes up with a different approach, the instructor should be pleased. ... The directions should not begin [with "prove by induction"]. I consider it a failure on the part of the instructor if he or she has to specify a technique in order to give students practice in applying it.