Somatix Systems (www.somatix.ai) is re-imagining the way customers engage with the brands they love. We've got the DNA of a stealth startup and the deep resources of an established company. Our flagship product creates new interactions between customers and brands, and we're already reaching millions of customers with one of the world's most recognizable companies. We're building a team to sling the latest advancements in IoT, cloud infrastructure, mobility, big data analytics, and machine learning / AI to make this magic happen.
We work in small, self-sufficient, teams with the shared goal of delivering excellent software, anchored in an agile culture of quality, delivery, and innovation. We encourage our engineers to share responsibility for writing tested, high performance, maintainable code. We deliver working software in short sprints via iterative and incremental development practices. Individuals on our self-organizing teams are motivated, trusted and continuously focused on excellence. We share a common love for functional programming and using data to solve complex problems and create value.About You
- You've developed an appreciation for functional programming and revel at the opportunity to build production systems in Haskell
- You welcome the responsibility and thrill that comes with being a member of a founding team
- You court perfection, but you're grounded and practical
- You understand that legacy code is the work you did yesterday
- Significant experience using Haskell, Scala, Erlang, Clojure (or other functional languages) to build production systems, complex personal projects, or make major OSS contributions
- Real-world experience designing, developing, testing, and deploying systems based on SOA or micro-services
- Skilled at designing and implementing SQL and NOSQL data persistence stores and caches
- Experience building REST API services using functional languages and design principles
- Ability to build infrastructure for real-time analytics and real-time predictive intelligence based on large, diverse, and dynamic data sets
- Hardware hacking and pro-typing experience is welcome (e.g. Arduino, Raspberry Pi) !
- Let us know why you think that you're the kind of person we're looking for
- We value demonstrated expertise over a resume so show us what you've built
Get information on how to apply for this position.
A special-interest group meeting during the ACM CHI 2016 conference in San Jose, CA on the topic of the usability of programming languages. People are invited to attend!
To attend you must be registered for the CHI'2016 conference, and early registration ends March 14:
For more information about the SIG, see:
Programming languages form the interface between programmers (the users) and the computation that they desire the computer to
execute. Although studies exist for some aspects of programming language design (such as conditionals), other aspects have received little or no human factors evaluations. Designers thus have little they can rely on if they want to make new languages highly usable, and users cannot easily chose a language based on usability criteria. This SIG will bring together researchers and practitioners interested in increasing the depth and breadth of studies on the usability of programming languages, and ultimately in improving the usability of future languages.
- Introduction to High Performance Spark
- How Spark Works
- DataFrames, Datasets & Spark SQL
- Joins (SQL & Core)
You can buy it now from O'Reilly :)
Our planned future chapters are*:
- Effective Transformations
- Working with Key/Value data
- Data Structure Design
- Spark Tuning and Cluster Sizing
- Getting Outside of the JVM
- Debugging Techniques
- Spark Components
We'd love your feedback to firstname.lastname@example.org so we can make a super awesome finished book for you all. If you are going to Strata San Jose next week, I'll also be giving a talk on testing Spark & hosting office hours I'd love to see some of you there.
If your up in Seattle area Rachel and I are coming up for Data Day Seattle - hope to see some of you there too!
For our friends across the pond I'll also be speaking at Strata London and hopefully we will have an update with a few more chapters by then (but we might also need to take a quick break from writing to do our day jobs. Don't tell our editors :p).
*Subject to change depending on feedback and how much fun we have writing each chapter
tl;dr: I know how to generate random instances of data types in a generic way, and even have some old code that already does all the hard work, but won’t have time to polish and package it until this summer. If you’re interested in helping, let me know!
This morning Kenny Foner pointed out to me this tweet by Gabriel Gonzales, asking why there isn’t a default Arbitrary instance for types implementing Generic. It reminded me that I’ve been meaning for a while now (years, in fact!) to get around to packaging up some code that does this.
As several pointed out on Twitter, this seems obvious, but it isn’t. It’s easy to write a generic Arbitrary instance, but hard to write one that generates a good distribution of values. The basic idea is clear: randomly pick a constructor, and then recursively generate random subtrees. The problem is that this is very likely to either blow up and generate gigantic (even infinite) trees, or to generate almost all tiny trees, or both. I wrote a post about this three years ago which illustrates the problem. It also explains half of the solution: generate random trees with a target size in mind, and throw out any which are not within some epsilon of the target size (crucially, stopping the generation early as soon as the tree being generated gets too big).
However, I never got around to explaining the other half of the solution: it’s crucially important to use the right probabilities when picking a constructor. With the wrong probabilities, you will spend too much time generating trees that are either too small or too big. The surprising thing is that with exactly the right probabilities, you can expect to wait only time before generating a tree of size (approximately1) .2
So, how does one pick the right probabilities? Essentially, you turn the generic description of your data type into a mutually recursive system of generating functions, and (numerically) find their radii of convergence, when thought of as functions in the complex plane. Using these values it is straightforward to compute the right probabilities to use. For the intrepid, this is explained in Duchon et. al3.
I have some old Haskell code from Alexis Darrasse which already does a bunch of the work. It would have to be updated a bit to work with modern libraries and with GHC.Generics, and packaged up to go on Hackage. I won’t really have time to work on this until the summer—but if anyone else is interested in working on this, let me know! I’d be happy to send you the code and provide some guidance in figuring it out.
The constant factor depends on how approximate you are willing to be.↩
I wanted to put an exclamation point at the end of that sentence, because this is really surprising. But it looked like factorial. So, here is the exclamation point: !↩
Duchon, Philippe, et al. “Boltzmann samplers for the random generation of combinatorial structures.” Combinatorics Probability and Computing 13.4-5 (2004): 577-625.↩
The Strats team at Standard Chartered has another three open positions for typed functional programming developers, based in London or Singapore. Strats are a specialized software engineering and quantitative analysis team who build a broad range of software for financial markets users at Standard Chartered.
You will work on the trading floor, directly with traders, sales and risk managers, building software to automate their work and improve their efficiency. The focus of this role will be to build new tools and prototypes for the market risk team.
You will use Haskell for almost all tasks: data analysis, market data publishing, database access, web services, desktop GUIs, large parallel tasks, quantitative models, solvers, everything. This is a fast paced role – code you write today will be deployed within hours to hundreds of users and has to work.
These are permanent, associate director and director positions, in London and Singapore as part of the Strats global team. Demonstrated experience in typed FP (Haskell, OCaml, F# etc) is required. We have around 3 million lines of Haskell, and our own Haskell compiler. In this context we look for skill and taste in typed functional programming to capture and abstract over complex, messy systems. You would join a growing team of around 15 experienced Haskell developers.
Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we definitely want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability. A PhD in computer science is a strong advantage.
The role requires physical presence on the trading floor in Singapore or London. Remote work is not an option. You will have some project and client management skills — you will talk to users, understand their problems and then implement and deliver what they really need. No financial background is required. These positions have attractive remuneration for the right candidates. Relocation support will also be provided. Contracting-based positions are also possible if desired.
Applicants who don’t necessarily meet all criteria but have an interest in working in Singapore in particular, and have an FP background, are encouraged to apply.
If this sounds exciting to you, please send your resume to me – donald.stewart <at> sc.com
This blog post came out of some discussions I had with my coworkers at FP Complete. I realized I have a number of reasons I prefer working with a typeclass based libraries (such as classy-prelude) that I've never put down in writing anywhere. I decided to write up this blog post to collect all of my thoughts in one place.
This same FP Complete discussion is also what fueled my Tweet on the subject:
Would you like a Haskell typeclass that provided a Map-like interface compatible with Map, HashMap, and IntMap?— Michael Snoyman (@snoyberg) March 15, 2016 <script async="async" charset="utf-8" src="http://platform.twitter.com/widgets.js"></script>Common practice
This is hardly a strong argument, but I thought I'd get this out of the way first. It's common practice in many other languages to program to abstract interfaces instead of concrete types. Java is likely the best example of this, though Python's duck-typing, Go's interfaces, and C++ templates are good examples too. In most language ecosystems, a culture of programming against something abstract took over a while ago.
Based on the fact that I'm a Haskell developer, I obviously don't put too much faith in the best practices of Java, Python, Go, and C++. Nonetheless, the fact that this pattern is repeated in many places, it's worth taking the experiences of others into account when making choices for ourselves.
Also, choosing a programming paradigm already familiar to people from other languages can help with language adoption. This isn't just idle speculation: I have heard new Haskellers express confusion about the lack of abstract interfaces for some common datatypes.Uniform interfaces
One of the nicest things about writing code against well-designed libraries is that the ideas, intuitions, and names translate between them. For example, whether I'm working with a list, Vector, Seq, or ByteString, I know that I can use foldr over the contents of the container. I don't have to wonder if the author of the library decided to call the function reduce instead, or changed the order of the arguments to the foldr function. My gut instincts about this function carries through.
Unfortunately, that statement does not work universally. I chose foldr as an example of a good citizen, but there are many other functions which are not so uniform:
- The mapM_ function is not provided by the Data.ByteString module. (I actually opened a PR about it.) Sure, I can get the same functionality via foldr, but it's extra cognitive overhead for the author versus a simple mapM_ call, and extra overhead for the reader to understand the goal of the code.
- The Data.HashMap.Strict API is missing a number of functions that are available in Data.Map and Data.IntMap, and which cannot be trivially reimplemented.
By contrast, when we use a typeclass-based interface (like Foldable or Traversable), we get the following benefits:
- All the functions I know are available to me. I can use Data.Foldable.mapM_ for any instance of Foldable, even if the author of that data type never considered mapM_.
- Add-on packages can provide new combinators without ever knowing about the data types I care about. The best example of this is Monad: there are dozens (if not hundreds or thousands) of useful Monad-based combinators sprinkled across Hackage, and these will work for arbitrary monadic datatypes I write in my own application.
We can't accidentally end up with conflicting type signatures or semantics for a function name. Some real-world examples of this are:
- Data.Map.toList behaves very differently from Data.Foldable.toList
- Data.ByteString.split versus Data.Text.split (text's split is closer to Data.ByteString.splitWith)
By programming against interfaces, a developer learns an interface once, gets access to a large array of convenience functions, and implementors can get away with providing less functionality themselves.No qualified imports
Many people claim that programming to typeclasses is really just about laziness: the right way to do this is to import everything qualified. The previous section shows it's about more than just laziness, but let me address qualified imports head-on:
- "Just laziness" is still a valid argument: why do extra work if it's not necessary?
When I'm not using classy-prelude, I will often times in the middle of coding realize I need access to, for example, Data.Map. I now need to break my train-of-thought from my current code to:
- Check if Data.Map is currently imported
- If not, add it to the imports
- Check if containers is in the cabal file
- If not, add it to the build-depends in the cabal file
- The above steps are even more confusing for new users
- On multi-person projects, inconsistent qualified import names are a real problem.
To expand on the last point: I've worked with many different people, and seen plenty of disagreements on import naming. Consider all of the following:import qualified Data.Map as Map import qualified Data.Map as M import qualified Control.Concurrent.MVar as M import qualified Control.Concurrent.MVar as MVar import qualified Data.ByteString as B import qualified Data.ByteString as S import qualified Data.ByteString as BS import qualified Data.ByteString.Char8 as B import qualified Data.ByteString.Char8 as B8 import qualified Data.ByteString.Lazy as BS import qualified Data.ByteString.Lazy as L
This is a small subset of the confusion I've seen. We have the same modules with different qualified names, and the same qualified names being shared by multiple modules. The upshot of this is that, each time I'm working on a different module, I need to remember which qualified import name is being used here. Standardizing this would be a great solution, but the problem is that we already tried it and it failed: most of the modules I listed above give a recommended short name associated with them, but people (myself included) do not follow it.Gives us more information
Speaking in terms of type classes will often times tell us more than using a concrete type. My favorite example of this is IO. Consider:foo :: IO a -> IO a
We know virtually nothing about foo. However, if instead we had:foo :: Monad m => m a -> m a
We now know a lot more. There are no monadic side-effects being added by foo itself (though the provided action may perform some). We still don't know everything: how many times is the supplied action being called, for example? But it's still a great start.
What I find really interesting is, despite what you may initially think, the following is also more informative than a plain IO:foo :: MonadIO m => m a -> m a
In this case, foo can once again perform unbounded IO actions itself, e.g.:foo action = do liftIO fireTheMissiles res <- action liftIO fireTheMissilesAgainForGoodMeasure return res
However, we know that some control flows (such as exception handling) are not being used, since they are not compatible with MonadIO. (Reason: MonadIO requires that the IO be in positive, not negative, position.) This lets us know, for example, that foo is safe to use in a continuation-based monad like ContT or Conduit.Continue using the same abstractions
If you're familiar with the list API, you can pick up the majority of classy-prelude trivially. Functions like length, break, takeWhile, and sum are all present, work just like their list-specific cousins, and are generalized to many additional types. Many of those types provide some nice performance improvements, making it easy to speed up your code.
Another library which provides this kind of common interface to many datatypes is lens. In many cases, this is done via a special typeclass (such as At, Cons, or Each). However, these all require learning a different abstraction from what you typically start Haskell with. On the other hand, the lens approach allows for new ways of composing code that the more standard functions make more difficult, so it certainly has advantages. I'm merely pointing out here the much lower learning curve of picking up typeclasses based on the functions you already know.Performance
Since in many cases, we can implement our typeclass-based functions in terms of underlying efficient functions for a specific data (together with INLINE and REWRITE rules), we will usually pay no overhead for using these abstractions.Easily test out alternative implementations
I've mentioned this in passing, but I'll call it out explicitly: programming to a typeclass lets you easily switch between different concrete implementations, which can be great for performance comparisons. Curious if a Map or a HashMap is faster? In the qualified import world, you'll need to:
- Change your import
- Modify the datatype name in all function signatures
- Rewrite any code using a Data.Map function that's not present in Data.HashMap.Strict
In a typeclass-based approach, you can write your functions the first time in terms of the typeclass, and then likely get away with changing just one explicit Map signature to HashMap.Common arguments against
I've given you a pretty opinionated list of reasons why I like typeclass-based programming. That's not to say that strong arguments against this approach don't exist. I'm going to give a (non-exhaustive) list of some such arguments and address them.
Error messages are more confusing. This is definitely the case; an error of Map is not a Vector is more clear than Map is not an instance of IsSequence. My response to this is that, fairly quickly, you get used to the error messages GHC generates and can understand them easily. Not quite as easily as monomorphic error messages, but easily enough.
I've heard people claim things like "I don't know what the code is doing." The argument goes like this: if you see a lookup, you don't know if it's a lookup on a Map or a Hashmap. I consider this the weakest argument against typeclass programming, because it means you aren't following the paradigm at all. If you're coding to typeclasses, you don't care which underlying implementation you're using: swapping out a Map for a HashMap would be just fine, and you just care about the stated semantics of the lookup function itself.
There are times where you really do care about something that goes beyond the semantics of the typeclass. For example, if you need the contents of a set-like structure returned as a list in ascending order, toList on a HashSet will fail you. But in such a case: you're really not looking for the general toList function, you're looking for a specific function which guarantees ordering.
An annoyance with typeclass coding is that, sometimes, you need to write extra type signatures, since the compiler can't guess exactly which implementation you're trying to use. This is certainly true, and can be annoying, but it also goes hand-in-hand with being able to easily test out alternative data types.
Some people only want to use typeclasses based on well-established mathematical foundations. While I agree that having a mathematical basis to a typeclass is a great way to ensure you have a good abstraction, I disagree with it being a prerequisite for such a typeclass to be useful. And as my evidence for this, I call the venerable Foldable typeclass, which has no laws associated with it but is eminently useful. (Similar examples: Binary, Storable, ToJSON, and FromJSON.)
- That said, I will readily admit that some of my original work in classy-prelude was far too loose in its usage of typeclasses for getting simple name overloading. The first few releases of the library wanted to test what was possible, but since then the typeclasses are restricted to what I would consider to be very sensible abstractions. You may disagree, and I'd be interested in hearing concrete examples of typeclasses you think are bad abstractions. But I think it's possible to look at code written exclusively against typeclasses and understand exactly what the code does.
There's a greater learning curve with an abstract interface versus a monomorphic interface. This is true, but once you need to learn two monomorphic interfaces, that learning curve is quickly amortized.
Those of you patient enough to sit through to the end of this long monologue can get a sneak preview. I've put my Map abstraction classes on Github inside the Jump project. The Jump project also happens to be the topic of conversation I had with the FP Complete team that I mentioned in the first paragraph above. Expect more information on this in the coming months.
Technical jargon is its own thing, intended for easy communication between trained practitioners of some art, but not necessarily between anyone else.
Jargon can be somewhat transparent, like the chemical jargon term “alcohol”. “Alcohol” refers to a large class of related chemical compounds, of which the simplest examples are methyl alcohol (traditionally called “wood alcohol”) and ethyl alcohol (the kind that you get in your martini). The extension of “alcohol” to the larger class is suggestive and helpful. Someone who doesn't understand the chemical jargon usage of “alcohol” can pick it up by analogy, and even if they don't they will probably have something like the right idea. A similar example is “aldehyde”. An outsider who hears this for the first time might reasonably ask “does that have something to do with formaldehyde?” and the reasonable answer is “yes indeed, formaldehyde is the simplest example of an aldehyde compound.” Again the common term is adapted to refer to the members of a larger but related class.
An opposite sort of adaptation is found in the term “bug”. The common term is extremely broad, encompassing all sorts of terrestrial arthropods, including mosquitoes, ladybugs, flies, dragonflies, spiders, and even isopods (“pillbugs”) and centipedes and so forth. It should be clear that this category is too large and heterogeneous to be scientifically useful, and the technical use of “bug” is much more restricted. But it does include many creatures commonly referred to as bugs, such as bed bugs, waterbugs, various plant bugs, and many other flat-bodied crawling insects.
Mathematics jargon often wanders in different directions. Some mathematical terms are completely opaque. Nobody hearing the term “cobordism” or “simplicial complex” or “locally compact manifold” for the first time will think for an instant that they have any idea what it means, and this is perfect, because they will be perfectly correct. Other mathematical terms are paradoxically so transparent seeming that they reveal their opacity by being obviously too good to be true. If you hear a mathematician mention a “field” it will take no more than a moment to realize that it can have nothing to do with fields of grain or track-and-field sports. (A field is a collection of things that are number-like, in the sense of having addition, subtraction, multiplication, and division that behave pretty much the way one would expect those operations to behave.) And some mathematical jargon is fairly transparent. The non-mathematician's idea of “line”, “ball”, and “cube” is not in any way inconsistent with what the mathematician has in mind, although the full technical meaning of those terms is pregnant with ramifications and connotations that are invisible to non-mathematicians.
But mathematical jargon sometimes goes to some bad places. The term “group” is so generic that it could mean anything, and outsiders often imagine that it means something like what mathematicians call a “set”. (It actually means a family of objects that behave like the family of symmetries of some other object.)
This last is not too terrible, as jargon failures go. There is a worse kind of jargon failure I would like to contrast with “bug”. There the problem, if there is a problem, is that entomologists use the common term “bug” much more restrictively than one expects. An entomologist will well-actually you to explain that a millipede is not actually a bug, but we are used to technicians using technical terms in more restrictive ways than we expect. At least you can feel fairly confident that if you ask for examples of bugs (“true bugs”, in the jargon) that they will all be what you will consider bugs, and the entomologist will not proceed to rattle off a list that includes bats, lobsters, potatoes, or the Trans-Siberian Railroad. This is an acceptable state of affairs.
Unacceptable, however, is the botanical use of the term “berry”:
It is one thing to adopt a jargon term that is completely orthogonal to common usage, as with “fruit”, where the technical term simply has no relation at all to the common meaning. That is bad enough. But to adopt the term “berry” for a class of fruits that excludes nearly everything that is commonly called a ”berry” is an offense against common sense.
This has been on my mind a long time, but I am writing about it now because I think I have found, at last, an even more offensive example.
Stonehenge is so-called because it is a place of hanging stones: “henge” is cognate with “hang”.
In 1932 archaeologists adapted the name “Stonehenge” to create the word “henge” as a generic term for a family of ancient monuments that are similar to Stonehenge.
Therefore, if there were only one thing in the whole world that ought to be an example of a henge, it should be Stonehenge.
However, Stonehenge is not, itself, a henge.
Stonehenge is not a henge.
STONEHENGE IS NOT A HENGE.
Stonehenge is not a henge. … Technically, [henges] are earthwork enclosures in which a ditch was dug to make a bank, which was thrown up on the outside edge of the ditch.
— Michael Pitts, Hengeworld, pp. 26–28.
“Henge” may just be the most ineptly coined item of technical jargon in history.
When Katara was about four, she was very distressed by the idea that green bugs were going to invade our house, I think though the mail slot in the front door. The obvious thing to do here was to tell her that there are no green bugs coming through the mail slot and she should shut up and go to sleep, but it seems clear to me that this was never going to work.
(It surprises me how few adults understand that this won't work. When Marcie admits at a cocktail party that she is afraid that people are staring at her in disgust, wondering why her pores are so big, many adults—but by no means all—know that it will not help her to reply “nobody is looking at your pores, you lunatic,” however true that may be. But even most of these enlightened adults will not hesitate to say the equivalent thing to a four-year-old afraid of mysterious green bugs. Adults and children are not so different in their irrational fears; they are just afraid of different kinds of monsters.)
Anyway, I tried to think what to say instead, and I had a happy idea. I told Katara that we would cast a magic spell to keep out the bugs. Red, I observed, was the opposite of green, and the green bugs would be powerfully repelled if we placed a bright red object just inside the front door where they would be sure to see it. Unwilling to pass the red object, they would turn back and leave us alone. Katara found this theory convincing, and so we laid sheets of red construction paper in the entryway under the mail slot.
Every night before bed for several weeks we laid out the red paper, and took it up again in the morning. This was not very troublesome, and certainly it less troublesome than arguing about green bugs every night with a tired four-year-old. For the first few nights, she was still a little worried about the bugs, but I confidently reminded her that the red paper would prevent them from coming in, and she was satisfied. The first few nights we may also have put red paper inside the door of her bedroom, just to be sure. Some nights she would forget and I would remind her that we had to put out the red paper before bedtime; then she would know that I took the threat seriously. Other nights I would forget and I would thank her for reminding me. After a few months of this we both started to forget, and the phase passed. I suppose the green bugs gave up eventually and moved on to a less well-defended house.
Several years later, Katara's younger sister Toph had a similar concern: she was afraid the house would be attacked by zombies. This time I already knew what to do. We discussed zombies, and how zombies are created by voodoo magic; therefore they are susceptible to voodoo, and I told Toph we would use voodoo to repel the zombies. I had her draw a picture of the zombies attacking the house, as detailed and elaborate as possible. Then we took black paper and cut it into black bars, and pasted the bars over Toph's drawing, so that the zombies were in a cage. The cage on the picture would immobilize the real zombies, I explained, just as one can stick pins into a voodoo doll of one's enemy to harm the real enemy. We hung the picture in the entryway, and Toph proudly remarked on how we had stopped the zombies whenever we went in or out.
Rationality has its limits. It avails nothing against green bugs or murderous zombies. Magical enemies must be fought with magic.