Set-Theoretic Types for Polymorphic Variants by Giuseppe Castagna, Tommaso Petrucciani, and Kim Nguyễn:
Polymorphic variants are a useful feature of the OCaml language whose current definition and implementation rely on kinding constraints to simulate a subtyping relation via unification. This yields an awkward formalization and results in a type system whose behaviour is in some cases unintuitive and/or unduly restrictive.
In this work, we present an alternative formalization of polymorphic variants, based on set-theoretic types and subtyping, that yields a cleaner and more streamlined system. Our formalization is more expressive than the current one (it types more programs while preserving type safety), it can internalize some meta-theoretic properties, and it removes some pathological cases of the current implementation resulting in a more intuitive and, thus, predictable type system. More generally, this work shows how to add full-fledged union types to functional languages of the ML family that usually rely on the Hindley-Milner type system. As an aside, our system also improves the theory of semantic subtyping, notably by proving completeness for the type reconstruction algorithm.
Looks like a nice result. They integrate union types and restricted intersection types for complete type inference, which prior work on CDuce could not do. The disadvantage is that it does not admit principal types, and so inference is non-deterministic (see section 5.3.2).
I have been pretty quiet on the blog in the past couple of months. One of the reasons for this is that I have spent most of my time learning Coq. I had my first contact with Coq well over a year ago when I started reading CPDT. Back then I only wanted to learn the basics of Coq to see how it works and what it has to offer compared to other languages with dependent types. This time I wanted to apply Coq to some ideas I had at work, so I was determined to be much more thorough in my learning. Coq is far from being a mainstream language but nevertheless it has some really good learning resources. Today I would like to present a brief overview of what I believe are the three most important books on Coq: “Interactive Theorem Proving and Program Development. Coq’Art: The Calculus of Inductive Constructions” (which I will briefly refer to as Coq’Art) by Yves Bertot and Pierre Castéran, “Certified Programming with Dependent Types” (CPDT) by Adam Chlipala and “Software Foundations” (SF for short) by Benjamin Pierce and over a dozen over contributors. All three books significantly differ in their scope and focus. CPDT and Coq’Art are standard, printed books. CPDT is also available online for free. Software Foundations is only available as an online book. Interestingly, there is also a version of SF that seems to be in the process of being revised.
I believe Coq’Art was the first book published on Coq. There are two editions – 2004 hardcover version and a 2010 paperback version – but as far as I know there are no differences between them. Too bad the 2010 edition was not updated for the newest versions of Coq – some of the code examples don’t work in the newest compiler. Coq’Art takes a theoretical approach, ie. it teaches Coq largely by explaining how the rules of Calculus of Constructions work. There are also practical elements like case studies and exercises but they do not dominate the book. Personally I found Coq’Art to be a very hard read. Not because it dives too much in theory – it doesn’t – but because the presentation seems to be chaotic. For example, description of a single tactic can be spread throughout deveral places in the book. In principle, I don’t object to extending earlier presentation with new details once the reader gets a hold of some new concepts, but I feel that Coq’Art definitely goes too far. Coq’Art also presents material in a very unusual order. Almost every introduction to Coq or any other functional language begins with defining data types. Coq’Art introduces them in chapter 6. On the other hand sorts and universes – something I would consider an advanced concept for anyone who is not familiar with type-level programming – are presented in the second chapter. (Note that first chapter is a very brief overview of the language.) By contrast, CPDT goes into detailed discussion of universes in chapter 12 and SF does not seem to cover them at all. Overall, Coq’Art is of limited usefulness to me. To tell the truth this is not because of its focus on theory rather than practice, but because of language style, which I find rather inaccessible. Many times I had problems understanding passages I was reading, forcing me to re-read them again and again, trying to figure out what is the message that the authors are trying to convey. I did not have such problems with CPDT, SF, nor any other book I have read in the past few years. At the moment I have given up on the idea of reading the book from cover to cover. Nevertheless I find Coq’Art a good supplementary reading for SF. Most importantly because of the sections that explain in detail the inner workings of various tactics.
As mentioned at the beginning, I already wrote a first impressions post about CPDT. Back then I said the book “is a great demonstration of what can be done in Coq but not a good explanation of how it can be done”. Having read all of it I sustain my claim. CPDT does not provide a thorough and systematic coverage of basics, but instead focuses on advanced topics. As such, it is not the best place to start for beginners but it is a priceless resource for Coq practitioners. The main focus of the book is proof automation with Ltac, Coq’s language for writing automated proof procedures. Reader is exposed to Ltac early on in the book, but detailed treatment of Ltac is delayed until chapter 14. Quite surprisingly, given that it is hard to understand earlier chapters without knowing Ltac. Luckily, the chapters are fairly independent of each other and can be read in any order the reader wishes. Definitely it is worth to dive into chapter 14 and fragments of apter 13 as early as possible – it makes understanding the book a whole lot easier. So far I have already read chapter 14 three times. As I learn Coq more and more I discover new bits of knowledge with each read. In fact, I expect to be going back regularly to CPDT.
Coq’Art and CPDT approach teaching Coq in totally different ways. It might then be surprising that Software Foundations uses yet another approach. Unlike Coq’Art it is focused on practice and unlike CPDT it places a very strong emphasis on learning the basics. I feel that SF makes Coq learning curve as flat as possible. The main focus of SF is applying Coq to formalizing programming languages semantics, especially their type systems. This should not come as a big surprise given that Benjamin Pierce, the author of SF, authored also “Types and Programming Languages” (TAPL), the best book on the topic of type systems and programming language semantics I have seen. It should not also be surprising that a huge chunk of material overlaps between TAPL and SF. I find this to be amongst the best things about SF. All the proofs that I read in TAPL make a lot more sense to me when I can convert them to a piece of code. This gives me a much deeper insight into the meaning of lemmas and theorems. Also, when I get stuck on an exercise I can take a look at TAPL to see what is the general idea behind the proof I am implementing.
SF is packed with material and thus it is a very long read. Three months after beginning the book and spending with it about two days a week I am halfway through. The main strength of SF is a plethora of exercises. (Coq’Art has some exercises, but not too many. CPDT has none). They can take a lot of time – and I really mean a lot – but I think this is the only way to learn a programming language. Besides, the exercises are very rewarding. One downside of the exercises is that the book provides no solutions, which is bad for self-studying. Moreover, the authors ask people not to publish the solutions on the internet, since “having solutions easily available makes [SF] much less useful for courses, which typically have graded homework assignments”. That being said, there are plenty of github repositories that contain the solved exercises (I also pledge guilty!). Although it goes against the authors’ will I consider it a really good thing for self-study: many times I have been stuck on exercises and was able to make progress only by peeking at someone else’s solution. This doesn’t mean I copied the solutions. I just used them to overcome difficulties and in some cases ended up with proofs more elegant than the ones I have found. As a side note I’ll add that I do not share the belief that publishing solutions on the web makes SF less useful for courses. Students who want to cheat will get the solutions from other students anyway. At least that has been my experience as an academic teacher.
To sum up, each of the books presents a different approach. Coq’Art focuses on learning Coq by understanding its theoretical foundations. SF focuses on learning Coq through practice. CPDT focuses on advanced techniques for proof automation. Personally, I feel I’ve learned the most from SF, with CPDT closely on the second place. YMMV
I recently came accross the streaming library. This library defines a type Stream (Of a) m b for computations that produce zero or more values of type a in a monad m, and eventually produce a value of type b. This stream type can be used for efficient IO without having to load whole files into memory. The streaming library touts bechmark results showing superior performance compared to other libraries like conduit, pipes and machines.
Looking at the datatype definition,data Stream f m r = Step !(f (Stream f m r)) | Effect (m (Stream f m r)) | Return r
it struck me how similar this type is to what is used in the stream fusion framework. The main difference being that the streaming library allows for interleaved monadic actions, and of course the lack of decoupling of the state from the stream to allow for fusion. But the vector library actually also uses such a monadic stream fusion framework, to allow for writing into buffers and such. This is type is defined in the module Data.Vector.Fusion.Stream.Monadic.data Stream m a = forall s. Stream (s -> m (Step s a)) s data Step s a where Yield :: a -> s -> Step s a Skip :: s -> Step s a Done :: Step s a
So, why not try to use vector's stream type directly as a representation of streams? I added this type as an extra alternative to the benchmark, and without writing any more code, the results are pretty impressive:
The only function that could be improved is scanL. In vector this function is implemented in terms of prescan (scanL without the first element) and cons, which makes it pretty inefficient. So I made a specialized implementation.
And that's all. A simple streaming 'library' with state of the art performance, while writing hardly any new code. Now to be fair, there are some reasons why you wouldn't always want to use these fusing streams. In particular, the resulting code could get quite large, and without fusion they may not be the most efficient.
I am building a web page devoted to the 2nd edition of Practical Foundations for Programming Languages, recently published by Cambridge University Press. Besides an errata, the web site features a commentary on the text explaining major design decisions and suggesting alternatives. I also plan to include additional exercises and to make sample solutions available to faculty teaching from the book.
The purpose of the commentary is to provide the “back story” for the development, which is often only hinted at, or is written between the lines, in PFPL itself. To emphasize enduring principles over passing fads, I have refrained from discussing particular languages in the book. But this makes it difficult for many readers to see the relevance. One purpose of the commentary is to clarify these connections by explaining why I said what I said.
As a starting point, I explain why I ignore the familiar concept of a “paradigm” in my account of languages. The idea seems to have been inspired by Kuhn’s (in)famous book The Structure of Scientific Revolutions, and was perhaps a useful device at one time. But by now the idea of a paradigm is just too vague to be useful, and there are many better ways to explain and systematize language structure. And so I have avoided it.
I plan for the commentary to be a living document that I will revise and expand as the need arises. I hope for it to provide some useful background for readers in general, and teachers in particular. I wish for the standard undergraduate PL course to evolve from a superficial taxonomy of the weird animals in the language zoo to a systematic study of the general theory of computation. Perhaps PFPL can contribute to effecting that change.
Filed under: Research, Teaching
A type signature to a function creates the requirement that an argument passed to the function have a specified type. However, operationally, the function only needs for the argument to have certain operations (other functions) defined on it.
Type classes in Haskell provide a means for a function to be polymorphic, accepting many different types, with the type class defining the set of fundamental functions (methods) available. However, even type classes may be too much: a function might require only a subset of the fundamental functions.
Duck typing is a much "looser" way of addressing this same problem. Previous thoughts on this problem.
Sometimes, even though all the operations a function needs might be defined on a type, we want to use the type signature of the function to restrict what it can be called on. The "explode" method might be defined for both firecrackers and nuclear weapons, but we might never want to accidentally pass a nuclear weapon to a certain function.
There is also a software engineering, and somewhat social, aspect to this problem. Even though today's implementation of a function might not require some method of a type class, tomorrow's improved implementation might. Declaring the type signature to be more than minimal is like making a reservation for future demands of functionality in the arguments. Making such a reservation forces all other software calling that function to provide that functionality, even though it isn't currently used. A type signature therefore represents a battle over a property line: who "owns" the right to decide what functionality an object should have?