News aggregator

FP Complete: Calculating the Minimum Variance Portfolio in R, Pandas and IAP

Planet Haskell - Fri, 04/04/2014 - 2:07pm
Introduction

As part of producing a demo for FP Complete's new IAP product, I wound up implementing the Minimum Variance Portfolio calculation for a stock portfolio in R, then in Haskell for the IAP, and finally in Python using the NumPy and SciPy extension libraries. In this article I want to look at the process of writing each of these and at the resulting code. I am not an expert in these things, so if you know a better way to do them, please let me know. In particular, the R example was borrowed from someone else.

The function

First, we need a version of the MVP formula that can be translated into these systems. I like the one used by Yue Kuen Kwok:

ω = Ω⁻¹1 / (1 ⋅ Ω⁻¹1)

where Ω is the covariance matrix of the returns for the stocks in question, and 1 is a vector of ones.
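As a quick sanity check (this example is mine, not from the original post): for two uncorrelated assets with return variances σ₁² and σ₂², Ω is diagonal, so

Ω⁻¹1 = (1/σ₁², 1/σ₂²)    1 ⋅ Ω⁻¹1 = 1/σ₁² + 1/σ₂²    ω = (σ₂², σ₁²) / (σ₁² + σ₂²)

and the lower-variance asset receives the larger weight, as expected.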

R version

The R version is fairly straightforward - we just need to put the pieces together:

minvar <- function (px) {
    Rt <- px[-1, ] / px[-nrow(px), ] - 1
    cov.Rt <- cov(Rt)
    one.vec <- rep(1, nrow(cov.Rt))
    num <- solve(cov.Rt) %*% one.vec
    den <- as.numeric(t(one.vec) %*% num)
    return(num / den)
}

Calculating returns can be done with standard matrix operations and slicing. The covariance function (cov) is built in, as is matrix inversion (solve). Since the numerator Ω⁻¹1 also appears in the denominator, I reuse its value there.

All the array operations were documented in the same place. That I only needed one unit vector was a bit of a surprise, but R sized it dynamically to make things work. That I had to transpose the unit vector and use the matrix multiplication operator (%*%) to get a dot product was a less pleasant surprise, but it is apparently a standard R idiom.

Python version

The Python version is almost a direct copy of the R version.

import numpy as np

def minvar(prices):
    cov = np.cov((prices[1:] / prices[:-1] - 1).transpose())
    vu = np.array(cov.shape[1] * [1], float)
    num = np.dot(np.linalg.inv(cov), vu)
    den = np.dot(vu, num)
    return num / den

In this case, I passed the returns matrix to the covariance function directly. The NumPy dot function performs both matrix products and dot products, and again the unit vector adapts its length dynamically.

Documentation was a bit more scattered. Since NumPy mixes traditional imperative and object-oriented styles, some functions live in the module and others are object methods. The biggest surprise was that the returns matrix needed to be transposed before the covariance was calculated.

Haskell version

Haskell is not quite as direct a translation of the R and Python versions, but it is a more straightforward translation of the original formula, once you notice that Ω⁻¹1 has been factored out into tv. It reads from top to bottom like the original, with the main formula at the top and the various terms defined below.

Returns again use the standard Haskell idiom for slicing the array. This is a bit more verbose than either R or Python, as the slicing operations are functions rather than special syntax.

minVariance prices = tv / scalar (tvu <.> tv)
  where
    rets = dropRows 1 prices / takeRows (rows prices - 1) prices
    (_, cov) = meanCov rets
    vu  = constant 1.0 (cols cov)
    tvu = constant 1.0 (rows cov)
    tv  = inv cov <> vu

The documentation was again straightforward, with everything being a function in the hmatrix package. In this case, both unit vectors were needed, as Haskell does not resize them automatically. It was the least surprising in at least one way: it uses a distinct dot product operator (<.>) for the two vectors rather than transposing the unit vector, whether implicitly like Python or explicitly like R, inside a matrix product.
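For illustration, here is a hypothetical way to call it (the price data is invented, and the types assume the hmatrix API used above, with one row per price observation and one column per stock):

-- Invented sample data: four price observations of two stocks.
testPrices :: Matrix Double
testPrices = fromLists
    [ [10.0, 20.0]
    , [10.5, 19.8]
    , [10.2, 20.4]
    , [10.8, 20.1] ]

main :: IO ()
main = print (minVariance testPrices)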

Performance

While performance comparisons with the IAP aren't very useful (it runs in the cloud, so comparisons on identical systems may be impossible), Haskell does have one interesting advantage.

All three of these systems have the same basic architecture: a high-level language running in an interactive environment, with bindings to low-level, fast implementations of the matrix manipulations. Haskell adds the ability to compile your program into native x86_64 code. Doing so reduced the wall-clock time of this short demo by roughly two orders of magnitude.
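For reference, a native GHC build of such a program looks something like this (the file name is invented; the post doesn't describe the exact build setup):

$ ghc -O2 MVP.hs -o mvp
$ ./mvp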

Summary

I found the IAP version a little easier to work with. Having custom operators and functions for everything (and this will only get better with time), along with being able to use the mathematical layout of the where clause, just made things a little easier. While not having unit vectors that automatically resize themselves, or transpose into matrices, is a little inconvenient, this also exposed a problem in the original R version: the unit vector's length was initially set wrong. I'm not sure that would have made any real difference, but the thought that the language can catch such errors for me is comforting.

Categories: Offsite Blogs

Nested nested lists [noob question]

Haskell on Reddit - Fri, 04/04/2014 - 11:36am

I'm just starting to learn Haskell. I'm a complete noob. I wanted to construct a function that would take something and wrap it in a list.

matryoshka :: a -> [a]
matryoshka x = [x]

> matryoshka 3
[3]
> matryoshka [3]
[[3]]
> (matryoshka . matryoshka) 3
[[3]]

All good. Now, I want a function that does this to arbitrary depth.

doll n x = iterate matryoshka n

This does not work, because matryoshka is of type (a -> [a]), and iterate wants (a -> a).

My question is, is there a way to write this sort of function with the standard Haskell list? I know that it should be possible to make my own type and do it that way, but it would be neater to do it using the built-in types.
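For what it's worth, here is a sketch of the "own type" approach the post alludes to (all names invented): since each application of matryoshka changes the type (a, then [a], then [[a]], and so on), a recursive wrapper type can erase the depth from the type and make iterate applicable.

-- A recursive type that hides the nesting depth, so that wrapping
-- has type Doll a -> Doll a and can be iterated.
data Doll a = Core a | Layer [Doll a]
    deriving Show

matryoshka' :: Doll a -> Doll a
matryoshka' d = Layer [d]

doll :: Int -> a -> Doll a
doll n x = iterate matryoshka' (Core x) !! n

-- e.g. doll 2 3 == Layer [Layer [Core 3]]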

submitted by the_noodle
[link] [12 comments]
Categories: Incoming News

Philip Wadler: FAQ: The snake fight portion of your thesis defense

Planet Haskell - Fri, 04/04/2014 - 11:14am
FAQ: The snake fight portion of your thesis defense.

Q: Do I have to kill the snake?
A: University guidelines state that you have to “defeat” the snake. There are many ways to accomplish this. Lots of students choose to wrestle the snake. Some construct decoys and elaborate traps to confuse and then ensnare the snake. One student brought a flute and played a song to lull the snake to sleep. Then he threw the snake out a window.
Q: Does everyone fight the same snake?
A: No. You will fight one of the many snakes that are kept on campus by the facilities department.
Spotted by Garrett Morris.
Categories: Offsite Blogs

Snap for Beginners

del.icio.us/haskell - Fri, 04/04/2014 - 10:00am
Categories: Offsite Blogs

Is there a function that converts Bool to Maybe?

Haskell on Reddit - Fri, 04/04/2014 - 6:37am

Hi there

Recently I had a Bool that told me whether to construct a value. That makes my value optional, so a Maybe. After doing this a few times, I extracted the following function out of it. I was surprised I couldn't find something similar on Hoogle. Any suggestions?

whenMaybe :: (Applicative m) => m a -> Bool -> m (Maybe a)
whenMaybe _ False = pure Nothing
whenMaybe ma True = Just <$> ma

(My specific application is parsing MQTT messages. A flag earlier in the message controls whether to expect another value later in the message. So my m is Data.Binary.Get. But as the type signature implies, surely the situation is more general than that.)
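A hypothetical sketch of that scenario (the field name and its width are invented; getWord16be is the real Data.Binary.Get primitive):

import Data.Binary.Get (Get, getWord16be)
import Data.Word (Word16)

-- Parse an optional 16-bit field whose presence was signalled
-- by a flag earlier in the message (invented example).
getOptionalField :: Bool -> Get (Maybe Word16)
getOptionalField flag = whenMaybe getWord16be flag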

Edit: to clarify, I only want to carry out my effect when the Bool is True. In my case, the effect is reading from the network, and the Bool tells whether there is data to read or not.

Thanks --Gareth

submitted by garethrowlands
[link] [20 comments]
Categories: Incoming News

Lennart Augustsson: Haskell error reporting with locations

Planet Haskell - Fri, 04/04/2014 - 1:28am
Error reporting in GHC is not always the nicest. For example, I often develop code by using undefined as a placeholder for code I've not written yet. Here's a simple example:
import System.Environment

main = do
    args <- getArgs
    if null args
        then undefined
        else undefined

Running this looks like this:

$ ./H
H: Prelude.undefined

Which undefined caused that? Looking at the error message we have no idea. Wouldn't it be nice to have some location information?

We can actually get location information by using Control.Exception.assert:

import Control.Exception (assert)
import System.Environment

main = do
    args <- getArgs
    if null args
        then assert False undefined
        else assert False undefined

Now running it is much more informative:

$ ./H
H: H.hs:7:9-14: Assertion failed

How is assert able to report the location? If we dig deep enough we discover that it's because the ghc compiler contains a special hack to recognize this function and give it location information.

A generalized hack

In a Haskell compiler that I've implemented I have taken this compiler hack and extended it so it can be used for any function. It comes in two parts: location information and location transparent definitions.

__LOCATION__

The __LOCATION__ identifier is always defined and utterly magical. Its value is a string that describes the location of that very identifier. This is the very opposite of a referentially transparent name. In fact its value varies with where it is placed in the code! So it's definitely not for purists. But I'm a practical man, so I sometimes resort to the ugliness of reality. And in reality we want to report locations in errors.

Enough philosophy, here's an example:

main = do
    print __LOCATION__
    print __LOCATION__

And running it prints:

"test/Test.hs:2:11"
"test/Test.hs:3:13"

And to illustrate the impurity:

main = do
    let loc = __LOCATION__
    print loc
    print loc

And running this:

"test/Test.mu:2:15"
"test/Test.mu:2:15"

Location transparency

The __LOCATION__ identifier gives the location of itself. This is of little use on its own. Imagine the definition we could give for undefined. Somewhere in the Prelude module it could say something like

undefined = error ("undefined: " ++ __LOCATION__)

But if we used this, all it would tell us is where the definition of undefined is, not where it was used.

To get the point of use instead of the definition I've introduced location transparent definitions. In a location transparent definition the __LOCATION__ identifier will not refer to its own position, but to the position of the reference to the definition. Location transparency is introduced with a pragma.

{-# LOCATIONTRANSPARENT undefined #-}
undefined = error ("undefined: " ++ __LOCATION__)

With this definition our initial example looks like this when we run it:

undefined: test/H.hs:6:9

In fact, the real definition of undefined doesn't look like that. The __LOCATION__ identifier is only used in the definition of error, so it looks something like this:

{-# LOCATIONTRANSPARENT error #-}
error :: String -> a
error s = throw (ErrorCall (__LOCATION__ ++ ": " ++ s))

{-# LOCATIONTRANSPARENT undefined #-}
undefined = error "undefined"

Since both error and undefined are transparent, any use of undefined will be reported with the location of the use.

Furthermore, we can make a few more functions location transparent, e.g.,

{-# LOCATIONTRANSPARENT head #-}
head :: [a] -> a
head [] = error "Empty list"
head (x:xs) = x

A simple example:

main = putStr (head [])

Which will print:

test/Head.hs:1:16: Empty list

which is the location where head was called.

Implementation

There are different ways to implement this feature, and I'm going to sketch two of them.

First: Every function that has the LOCATIONTRANSPARENT pragma will be inlined at the point of use, and the __LOCATION__ identifier in the inlined code will be updated to reflect the call site. The definitions must be processed in a bottom-up fashion for this to work. It's fairly simple to implement, but will cause some code bloat due to inlining.
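A sketch of what that could produce for the earlier definitions (the location string is invented):

-- before inlining:
--     main = undefined
-- after inlining undefined and error at the call site, conceptually:
--     main = throw (ErrorCall ("test/H.hs:1:8" ++ ": " ++ "undefined"))
-- where __LOCATION__ has been rewritten to the position of the
-- original reference to undefined.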

Second: Every function that has LOCATIONTRANSPARENT pragma will be rewritten (by the compiler) to have an extra location argument, and each use of this function will be rewritten to pass in the current location. For example (using $$f for the location version of f):

main = putStr ($$head __LOCATION__ [])

$$head __LOCATION__ [] = $$error __LOCATION__ "Empty list"
$$head __LOCATION__ (x:xs) = x

$$error __LOCATION__ s = throw (ErrorCall (__LOCATION__ ++ ": " ++ s))

This should be fairly straightforward to implement, but I've not tried it. (It's somewhat like dynamic binding, so maybe ghc could reuse that mechanism for locations.)

And, of course, the global __LOCATION__ identifier has to be recognized by the compiler and replaced by a string that is its location.

Conclusion

I implemented the __LOCATION__ hack quite a while ago, and I like the much improved reporting of error locations. I hope someone will add it to ghc as well.
Categories: Offsite Blogs

.NET Compiler Platform ("Roslyn")

Lambda the Ultimate - Fri, 04/04/2014 - 12:21am

The .NET Compiler Platform (Roslyn) provides open-source C# and Visual Basic compilers with rich code analysis APIs. You can build code analysis tools with the same APIs that Microsoft is using to implement Visual Studio!

In a nutshell: OPEN SOURCE C# COMPILER. Putting aside possible practical implications of this for the .NET ecosystem, I think it is good for programming language geeks to be able to peruse the source code for compilers and language tools.

For the debate about MS being evil, you can head directly to HN where you'll also find an explanation of what bootstrapping a compiler means.

Categories: Offsite Discussion

Future of Programming workshop

Lambda the Ultimate - Thu, 04/03/2014 - 5:36pm

The call for submissions is out. There will be two opportunities this first year to participate: at Strangeloop in September and at SPLASH in October. The call:

We are holding two events. First, in partnership with the Emerging Languages Camp, FPW×ELC will be at Strange Loop on September 17 in St. Louis MO. Second, at SPLASH in Portland OR around Oct. 19 (pending approval).

We want to build a community of researchers and practitioners exploring the frontiers of programming. We are looking for new ideas that could radically improve the practice of programming. Ideas too embryonic for an academic paper yet developed enough to demonstrate a prototype. Show us your stuff!

FPW×ELC will present live demonstrations before an audience. The SPLASH event will be an intense, private writer’s workshop. This process will be a chance to give and take both creative support and incisive criticism.

Submissions will be 15 minute demo screencasts. You can select either or both of the events in your submission. The submission deadline is June 8 and notifications will be sent June 27. After the events participants will have until December 1 to revise their screencasts for archival publication on our website. The submission site is now open. For questions please see the FAQ or ask info@future-programming.org.

Brought to you by Richard Gabriel, Alex Payne, and Jonathan Edwards.

This is a good idea for the more edgy language designers to exhibit their work and receive useful critique to improve presentation, which ultimately helps with our goal of world domination (or at least, pushing the community to take more risks).

Categories: Offsite Discussion

www.youtube.com

del.icio.us/haskell - Thu, 04/03/2014 - 5:25pm
Categories: Offsite Blogs

www.youtube.com

del.icio.us/haskell - Thu, 04/03/2014 - 5:25pm
Categories: Offsite Blogs

Restricted monads for free using Codensity + Constraint Kinds

Haskell on Reddit - Thu, 04/03/2014 - 1:47pm

Code here: http://lpaste.net/102201

I wanted to hack up a quick probability distribution monad, but using [(a,Rational)] bugged me because it didn't group identical events.
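To make the grouping problem concrete (my example, not the poster's): summing two fair coin flips under the list representation leaves duplicate outcomes that a Map would merge.

-- With [(a, Rational)]:   [(0, 1 % 4), (1, 1 % 4), (1, 1 % 4), (2, 1 % 4)]
-- With M.Map a Rational:  fromList [(0, 1 % 4), (1, 1 % 2), (2, 1 % 4)]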

So here is the data I want to be a monad:

import qualified Data.Map as M

newtype ProbData a = ProbD { runProbD :: M.Map a Rational }
    deriving (Show, Eq)

It admits a form of return and bind, but it's not a monad:

retProb :: Ord a => a -> ProbData a
retProb a = ProbD $ M.singleton a 1

bindProb :: Ord b => ProbData a -> (a -> ProbData b) -> ProbData b
bindProb m k = ProbD $ M.unionsWith (+)
             $ map (\(a, p) -> M.map (* p) (runProbD $ k a))
             $ M.toList $ runProbD m

However, we can get some magic: the Codensity monad transformer doesn't actually require a monad to transform!

newtype Codensity m a = Cod { runCod :: forall r. (a -> m r) -> m r }

-- same text as ContT's implementation
instance Monad (Codensity m) where
    return a = Cod (\k -> k a)
    m >>= f  = Cod $ \k -> runCod m $ \a -> runCod (f a) k

However, this doesn't allow us to implement liftProb, so we have no way to get our distributions in:

-- liftProb :: ProbData a -> Codensity ProbData a
-- liftProb pd = Cod $ \k -> bindProb pd k
-- ERROR: No instance for Ord b ...

On the other hand, ConstraintKinds gives us a restricted version of the codensity monad that is still a monad regardless of what data we want to put into it:

-- a slight tweak from above:
newtype RCodensity c m a = Cod { runCod :: forall r. c r => (a -> m r) -> m r }

-- copy and paste the same monad instance and it works just fine!

liftProb :: ProbData a -> RCodensity Ord ProbData a
liftProb pd = Cod $ \k -> bindProb pd k -- compiles fine!

runProb :: Ord a => RCodensity Ord ProbData a -> ProbData a
runProb p = runCod p retProb

So now we can get yummy monadic syntax out of it:

type Prob a = RCodensity Ord ProbData a

uniform :: Ord a => [a] -> Prob a
uniform xs = liftProb $ ProbD $ M.fromList $ map (\a -> (a, p)) xs
  where p = 1 / fromIntegral (length xs)

dice n d = liftM sum $ replicateM n (uniform [1..d])

However, this is still not ideal; dice 2 6 >>= whatever will call whatever 36 times, not 11. We can fix that, though:

optimize :: Ord a => Prob a -> Prob a
optimize = liftProb . runProb

twod6 = optimize (dice 2 6)

In twod6 >>= whatever, whatever will only get called 11 times, once for each value between 2 and 12!
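For illustration, inspecting the final distribution might look like this in ghci (the session is invented, but the weights are the standard 2d6 probabilities):

> runProbD (runProb twod6)
fromList [(2,1 % 36),(3,1 % 18),(4,1 % 12),(5,1 % 9),(6,5 % 36),(7,1 % 6),(8,5 % 36),(9,1 % 9),(10,1 % 12),(11,1 % 18),(12,1 % 36)]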

submitted by ryani
[link] [15 comments]
Categories: Incoming News