# News aggregator

### Incremental sequence sorting

### Proposed significant breaking changes to Win32

### Generation and parsing of English numerals (cardinal and ordinal)

I need to use the English numerals (American) in Haskell, so I looked for a library that did it. I have not found one, therefore I developed the program that I am presenting here. There are actually two versions of the program: one "analytical" and the other "synthetic".

The analytical version (presented here) aims to represent the deep complex structure of the numerals and the close relation between cardinal and ordinal. The synthetic version is a simplification of the analytic version.

Since I am neither a linguist nor a native English speaker, so I would first need an assessment of the soundness of the analysis and representation of numerals.

I believe that the program can be more concise, but I have no idea how to proceed.

module EnglishNumerals (toEngCard ,toEngCardOrd ,toEngOrd ,fromEngCardinal ,fromEngOrdinal) where import Text.Parsec import Text.Parsec.Char import Control.Monad (msum) import Data.List (delete,elemIndex,isInfixOf,isSuffixOf) import Data.Maybe (fromJust) ----- ENGLISH NUMERAL (American) ----- {- EXAMPLES toEngCard 703012832745 == "seven hundred three billion twelve million eight hundred thirty-two thousand seven hundred forty-five" fromEngCardinal "seven hundred three billion twelve million eight hundred thirty-two thousand seven hundred forty-five" == 703012832745 map toEngCardOrd [0 .. 24] == ["0th","1st","2nd","3rd","4th","5th","6th","7th","8th","9th","10th","11th","12th","13th","14th","15th","16th","17th","18th","19th","20th","21st","22nd","23rd","24th"] map fromEngOrdinal ["0th","1st","2nd","3rd","4th","5th","6th","7th","8th","9th","10th","11th","12th","13th","14th","15th","16th","17th","18th","19th","20th","21st","22nd","23rd","24th"] == [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24] toEngOrd 703012832745 == "seven hundred three billion twelve million eight hundred thirty-two thousand seven hundred forty-fifth" fromEngOrdinal "seven hundred three billion twelve million eight hundred thirty-two thousand seven hundred forty-fifth" == 703012832745 -} --------------- F R O M I N T E G E R T O C A R D I N A L N U M B E R ------------------ toEngCard :: Integer -> String toEngCard n | n < 0 = error "Negative number." | n < 100 = toEngCardTill99 n | otherwise = toEngCardFrom100To999Trillion n toEngCardTill99 :: Integer -> String toEngCardTill99 n | n < 10 = engCardUnit !! fromInteger n | n < 20 = engCardinalTeen n | n < 100 = let t = tens n ; d = mod n 10 in engCardTens t ++ if d == 0 then "" else "-" ++ engCardUnit !! fromInteger d toEngCardFrom100To999Trillion :: Integer -> String toEngCardFrom100To999Trillion n | n < 10^3 = f n 100 "hundred" | n < 10^6 = f n (10^3) "thousand" | n < 10^9 = f n (10^6) "million" | n < 10^12 = f n (10^9) "billion" | n < 10^15 = f n (10^12) "trillion" | otherwise = error "About " ++ show n ++ " .. work in progress :)" where f x y s = let (q,r) = divMod x y in toEngCard q ++ " " ++ s ++ if r == 0 then "" else " " ++ toEngCard r engCardUnit = ["zero","one","two","three","four","five","six","seven","eight","nine"] irregularRoot :: Integer -> String irregularRoot n = case n of 2 -> init (engCardUnit !! 2) ++ "e" -- "twe" 3 -> take 2 (engCardUnit !! 3) ++ "ir" -- "thir" 4 -> delete 'u' (engCardUnit !! 4) -- "for" 5 -> take 2 (engCardUnit !! 5) ++ "f" -- "fif" 8 -> init (engCardUnit !! 8) -- "eigh" 9 -> init (engCardUnit !! 9) -- "nin" 20 -> irregularRoot 2 ++ "n" -- "twen" _ -> error "Irregular root not defined" twe = irregularRoot 2 thir = irregularRoot 3 for = irregularRoot 4 fif = irregularRoot 5 eigh = irregularRoot 8 nin = irregularRoot 9 twen = irregularRoot 20 irregularRoots = [twe,thir,for,fif,eigh,nin,twen] engCardinalTeen :: Integer -> String engCardinalTeen n | n == 10 = "ten" | n == 11 = "eleven" | n == 12 = twe ++ "lve" | otherwise = case n of 13 -> thir 15 -> fif 18 -> eigh _ -> toEngCard (n - 10) ++ "teen" engCardTens :: Integer -> String engCardTens n = [twen,thir,for,fif,toEngCard 6,toEngCard 7,eigh,toEngCard 9] !! fromInteger (n-2) ++ "ty" tens :: Integer -> Integer tens m = mod (div m 10) 10 --------------- F R O M I N T E G E R T O O R D I N A L N U M B E R ------------------- toEngCardOrd :: Integer -> String -- Concise Ordinal toEngCardOrd n | n < 0 = error "Negative number." | otherwise = show n ++ if n >= 11 && n <= 13 then "th" else suff where suff = case mod n 10 of 1 -> "st" 2 -> "nd" 3 -> "rd" _ -> "th" toEngOrd :: Integer -> String -- Verbose Ordinal toEngOrd n | n < 0 = error "Negative number." | n < 100 = engVerbOrdTill99 n | otherwise = engVerbOrdFrom1000Up n engVerbOrdTill99 n | elem n [0,4,6,7] = toEngCard n ++ "th" | n == 1 = "first" | n == 2 = "second" | n == 3 = thir ++ "d" | n < 10 = irregularRoot n ++ "th" | n == 12 = twe ++ "lf" ++ "th" | n < 20 = toEngCard n ++ "th" | n < 100 = let t = tens n ; u = mod n 10 in if u == 0 then init (engCardTens t) ++ "ieth" else (engCardTens t) ++ "-" ++ toEngOrd u | otherwise = error "Number not between 0 and 99: " ++ show n engVerbOrdFrom1000Up n = toEngCard h ++ if r == 0 then "th" else " " ++ toEngOrd r where r = rem n 100 h = 100 * div n 100 -- hundreds ---------------- P A R S I N G C A R D I N A L N U M B E R S --------------- fromEngCardinal :: String -> Integer fromEngCardinal s = case parse parseCardinalNumber "" s of Left xs -> error $ show xs Right n -> n parseCardinalNumber :: Parsec String u Integer parseCardinalNumber = do many space do eof; return 0 <|> do n1 <- parseFrom0To999 try (do eof; return n1) <|> do spaces n2 <- parseMultiplier let n3 = n1 * n2 try (do eof; return n3) <|> do n4 <- parseCardinalNumber; return (n3 + n4) parseFrom0To999 = try parseFrom100To999 <|> parseUpTo99 parseFrom100To999 = do n1 <- parseHundreds try (do spaces; n2 <- parseUpTo99; return $ n1 + n2) <|> return n1 parseHundreds = do n <- parseDigit; spaces; string "hundred"; return $ n * 100 parseUpTo99 = do n <- try parseTensHyphenDigit <|> try parseTens <|> try parseTeen <|> parseDigit return n parseMultiplier = try thousand <|> million <|> billion <|> trillion thousand = string "thousand" >> return (10^3) million = string "million" >> return (10^6) billion = string "billion" >> return (10^9) trillion = string "trillion" >> return (10^12) parseDigit = do s <- tryStrings engCardUnit; return $ index s engCardUnit parseTeen = try parseTeenIrregular1 <|> parseTeenIrregular2 <|> parseTeenRegular parseTeenRegular = do n <- parseDigit; string "teen"; return $ 10 + n parseTeenIrregular1 = do d <- tryStrings ectn; return $ 10 + index d ectn where ectn = ["ten","eleven",twe ++ "lve"] parseTeenIrregular2 = do d <- tryStrings ectn; string "teen"; return $ v d where ectn = [thir, fif, eigh] v x = fromJust $ lookup x [(thir,13),(fif,15),(eigh,18)] parseTens = try parseTensIrregular <|> parseTensRegular parseTensIrregular = do s <- tryStrings prefTens; string "ty"; return $ v s where prefTens = [twe ++ "n",thir,for,fif,eigh] v x = fromJust $ lookup x [(twe ++ "n",20),(thir,30),(for,40),(fif,50),(eigh,80)] parseTensRegular = do n <- parseDigit; string "ty"; return $ n * 10 parseTensHyphenDigit = do n1 <- parseTens; char '-'; n2 <- parseDigit; return $ n1 + n2 bigCardinals = ["hundred","thousand","million","billion","trillion"] -- ------------- P A R S I N G O R D I N A L N U M B E R S --------------- fromEngOrdinal :: String -> Integer fromEngOrdinal s = case parse parseOrdinalNumber "" s of Left xs -> error $ show xs Right n -> n parseOrdinalNumber :: Parsec String () Integer parseOrdinalNumber = parseConciseOrdinalNumber <|> parseVerboseOrdinalNumber parseConciseOrdinalNumber = do ds <- many1 digit suf <- tryStrings ["st","nd","rd","th"] eof if agreement ds suf then return (read ds :: Integer) else error "There is no agreement between digits and suffix" agreement ds suf | or (zipWith isSuffixOf ["11","12","13"] (repeat ds)) = suf == "th" | isSuffixOf "1" ds = suf == "st" | isSuffixOf "2" ds = suf == "nd" | isSuffixOf "3" ds = suf == "rd" | otherwise = suf == "th" parseVerboseOrdinalNumber = do many space do eof; return 0 <|> try parseOrdinalDigit <|> try parseOrdinalTeenRegular <|> try parseOrdinal12 <|> try parseOrdinal20 <|> try parseOrdinalTens <|> try parseOrdinalTensWithCardinalPrefix <|> try parseOrdinalHundreds <|> do s <- getInput if isLastWordHypenate s then parseOrdinalWithCardinalPrefixAndLastNumberHyphenate s else parseRemainingOrdinals parseOrdinalDigit = do s <- tryStrings eod; return $ index s eod where eod = map toEngOrd [0..9] parseOrdinalTeenRegular = do n <- parseTeen; string "th"; return n parseOrdinal12 = do string (twe ++ "lf" ++ "th"); return 12 parseOrdinal20 = do string (twe ++ "n" ++ "tieth"); return 20 parseOrdinalTens = do s <- tryStrings eots; string "tieth"; return $ 10 * (2 + index s eots) where eots = [twen,thir,for,fif,toEngCard 6,toEngCard 7,eigh,toEngCard 9] parseOrdinalTensWithCardinalPrefix = do n <- parseTens; char '-'; n2 <- parseOrdinalDigit; return (n + n2) parseOrdinalHundreds = do n <- parseDigit; space; string ("hundred" ++ "th"); return (n * 100) parseOrdinalWithCardinalPrefixAndLastNumberHyphenate s = do let (s1,_:s2) = span (/= '-'). reverse $ s let eoc = case parse parseCardinalNumber "" (reverse s2) of Left xs -> error $ show xs Right n -> n let eon = case parse parseOrdinalDigit "" (reverse s1) of Left xs -> error $ show xs Right n -> n return (eoc + eon) parseRemainingOrdinals = do inp <- getInput let ws = words inp let eon = last ws let ecn = unwords . init $ ws if elem eon bigOrdinals then do let inp2 = take (length inp - 2) inp let rn = case parse parseCardinalNumber "" inp2 of Left xs -> error $ show xs Right n -> n return rn else do let rn1 = case parse parseCardinalNumber "" ecn of Left xs -> error $ show xs Right n -> n let rn2 = case parse parseVerboseOrdinalNumber "" eon of Left xs -> error $ show xs Right n -> n return (rn1 + rn2) bigOrdinals = map (++ "th") bigCardinals -- -------------- P A R S I N G S T U F F -------------- -- UTILITY functions index :: Eq a => a -> [a] -> Integer index x xs = fromIntegral . fromJust $ elemIndex x xs isLastWordHypenate :: String -> Bool isLastWordHypenate = isInfixOf "-" . last . words -- UTILITY parsers tryStrings :: [String] -> Parsec String u String tryStrings = msum . fmap (try . string) submitted by acapi[link] [14 comments]

### Mark Jason Dominus: An ounce of theory is worth a pound of search

The computer is really awesome at doing quick searches for numbers with weird properties, and people with an amateur interest in recreational mathematics would do well to learn some simple programming. People appear on math.stackexchange quite often with questions about tic-tac-toe, but there are only 5,478 total positions, so any question you want to ask can be instantaneously answered by an exhaustive search. An amateur showed up last fall asking “Is it true that no prime larger than 241 can be made by either adding or subtracting 2 coprime numbers made up out of the prime factors 2,3, and 5?” and, once you dig through the jargon, the question is easily answered by the computer, which quickly finds many counterexamples, such as and .

But sometimes the search appears too large to be practical, and then you need to apply theory. Sometimes you can deploy a lot of theory and solve the problem completely, avoiding the search. But theory is expensive, and not always available. A hybrid approach often works, which uses a tiny amount of theory to restrict the search space to the point where the search is easy.

One of these I wrote up on this blog back in 2006:

A number is excellent if it has an even number of digits, and if when you chop it into a front half and a back half , you have . For example, is excellent, because , and is excellent, because .

The programmer who gave me thie problem had tried a brute-force search over all numbers, but to find all 10-digit excellent numbers, this required an infeasible search of 9,000,000,000 candidates. With the application of a tiny amount of algebra, one finds that and it's not hard to quickly test candidates for to see if has this form and if so to find the corresponding value of . (Details are in the other post.) This reduces the search space for 10-digit excellent numbers from 9,000,000,000 candidates to 90,000, which could be done in under a minute even with last-century technology, and is pretty nearly instantaneous on modern equipment.

But anyway, the real point of this note is to discuss a different problem entirely. A recreational mathematician on stackexchange wanted to find distinct integers for which and were all perfect squares. You can search over all possible quadruples of numbers, but this takes a long time. The querent indicated later that he had tried such a search but lost patience before it yielded anything.

Instead, observe that if is a perfect square then and are the legs of a right triangle with integer sides; they are terms in what is known as a Pythagorean triple. The prototypical example is , and is the Pythagorean triple. (The querent was quite aware that he was asking for Pythagorean triples, and mentioned them specifically.)

Here's the key point: It has been known since ancient times that if is a Pythagorean triple, then there exist integers and such that: $$\begin{align} \require{align} a & = n^2-m^2 \\ b & = 2mn \\ c & = n^2 + m^2 \end{align}$$

So you don't have to search for Pythagorean triples; you can just generate them with no searching:

for my $m (1 .. 200) { for my $n ($m+1 .. 200) { my $a = $n*$n-$m*$m; my $b = 2 * $n * $m; $trip{$a}{$b} = 1; $trip{$b}{$a} = 1; } }This builds a hash table, %trip, with two important properties:

$trip{$a} is a sub-table whose keys are all the numbers that can form a triple with . For example, $trip{20} is a hash with three keys: 21, 48, and 99, because and , but 20 is not a participant in any other triples.

$trip{$a}{$b} is true if and only if is a perfect square, and false otherwise.

The table has only around 40,000 entries. Having constructed it, we now search it:

for my $a (keys %trip) { for my $b (keys %{$trip{$a}}) { for my $c (keys %{$trip{$b}}) { next if $c == $a; for my $d (keys %{$trip{$c}}) { next if $d == $b; print "$a $b $c $d\n" if $trip{$d}{$a}; } } } }The outer loop runs over each that is known to be a member of a
Pythagorean triple. (Actually the formulas show that
every number bigger than 2 is a member of *some*
triple, but we may as well skip the ones that are only in triples we
didn't tabulate.) Then the next loop runs over every that can
possibly form a triple with ; that is, every for which
is a perfect square. We don't have to search for them; we
have them tabulated ahead of time. Then for each such (and
there aren't very many) we run over every that forms a triple
with , and again there is no searching and very few candidates.
Then then similarly , and if the we try forms a triple with
, we have a winner.

The next if $c == $a and next if $d == $b tests are to rule out trivial solutions like , which the querent wasn't interested in anyway. We don't have to test for equality of any of ther other pairs because no number can form a Pythagorean triple with itself.

This runs in less than a second on so-so hardware and produces 11 solutions:

3472 7296 10400 2175 4312 23520 12008 465 6512 9984 800 6375 12312 666 1288 8415 14592 6944 4350 20800 16830 2576 1332 24624 19968 13024 12750 1600 25500 26048 39936 3200 30192 6175 2400 9856 41600 29184 13888 8700 47040 8624 930 24016Only five of these are really different. For example, the last one is the same as the second, with every element multiplied by 2; the third, seventh, and eighth are similarly the same. In general if is a solution, so is for any . A slightly improved version would require that the four numbers not have any common factor greater than 1; there are few enough solutions that the cost of this test would be completely negligible.

The only other thing wrong with the program is that it produces each solution 8 times; if is a solution, then so are and so on. This is easily fixed with a little post-filtering; pipe the output through

perl -nle '$k = join " ", sort { $a <=> $b } split; print unless $seen{$k}++ 'or something of that sort.

The corresponding run with and up to 2,000 instead of only 200 takes 5 minutes and finds 445 solutions, of which 101 are distinct, including . It would take a very long time to find this with a naïve search.

[ For a much larger and more complex example of the same sort of thing, see When do and have the same digits?. I took a seemingly-intractable problem and analyzed it mathematically. I used considerably more than an ounce of theory in this case, and while the theory was not enough to solve the problem, it was enough to reduce the pool of candates to the point that a computer search was feasible. ]

### Keegan McAllister: html5ever project update: one year!

I started the html5ever project just a bit over one year ago. We adopted the current name last July.

<kmc> maybe the whole project needs a better name, idk<Ms2ger> htmlparser, perhaps

<jdm> tagsoup

<Ms2ger> UglySoup

<Ms2ger> Since BeautifulSoup is already taken

<jdm> html5ever

<Ms2ger> No

<jdm> you just hate good ideas

<pcwalton> kmc: if you don't call it html5ever that will be a massive missed opportunity

By that point we already had a few contributors. Now we have 469 commits from 18 people, which is just amazing. Thank you to everyone who helped with the project. Over the past year we've upgraded Rust almost 50 times; I'm extremely grateful to the community members who had a turn at this Sisyphean task.

Several people have also contributed major enhancements. For example:

Clark Gaebel implemented zero-copy parsing. I'm in the process of reviewing this code and will be landing pieces of it in the next few weeks.

Josh Matthews made it possible to suspend and resume parsing from the tree sink. Servo needs this to do async resource fetching for external <script>s of the old-school (non-async/defer) variety.

Chris Paris implemented fragment parsing and improved serialization. This means Servo can use html5ever not only for parsing whole documents, but also for the innerHTML/outerHTML getters and setters within the DOM.

Adam Roben brought us dramatically closer to spec conformance. Aside from foreign (XML) content and <template>, we pass 99.6% of the html5lib tokenizer and tree builder tests! Adam also improved the build and test infrastructure in a number of ways.

I'd also like to thank Simon Sapin for doing the initial review of my code, and finding a few bugs in the process.

html5ever makes heavy use of Rust's metaprogramming features. It's been something of a wild ride, and we've collaborated with the Rust team in a number of ways. Felix Klock came through in a big way when a Rust upgrade broke the entire tree builder. Lately, I've been working on improvements to Rust's macro system ahead of the 1.0 release, based in part on my experience with html5ever.

Even with the early-adopter pains, the use of metaprogramming was absolutely worth it. Most of the spec-conformance patches were only a few lines, because our encoding of parser rules is so close to what's written in the spec. This is especially valuable with a "living standard" like HTML.

The futureTwo upcoming enhancements are a high priority for Web compatibility in Servo:

Character encoding detection and conversion. This will build on the zero-copy UTF-8 parsing mentioned above. Non-UTF-8 content (~15% of the Web) will have "one-copy parsing" after a conversion to UTF-8. This keeps the parser itself lean and mean.

document.write support. This API can insert arbitrary UTF-16 code units (which might not even be valid Unicode) in the middle of the UTF-8 stream. To handle this, we might switch to WTF-8. Along with document.write we'll start to do speculative parsing.

It's likely that I'll work on one or both of these in the next quarter.

Servo may get SVG support in the near future, thanks to canvg. SVG nodes can be embedded in HTML or loaded from an external XML file. To support the first case, html5ever needs to implement WHATWG's rules for parsing foreign content in HTML. To handle external SVG we could use a proper XML parser, or we could extend html5ever to support "XML5", an error-tolerant XML syntax similar to WHATWG HTML. Ygg01 made some progress towards implementing XML5. Servo would most likely use it for XHTML as well.

Improved performance is always a goal. html5ever describes itself as "high-performance" but does not have specific comparisons to other HTML parsers. I'd like to fix that in the near future. Zero-copy parsing will be a substantial improvement, once some performance issues in Rust get fixed. I'd like to revisit SSE-accelerated parsing as well.

I'd also like to support html5ever on some stable Rust 1.*x*version, although it probably won't happen for 1.0.0. The main obstacle here is procedural macros. Erick Tryzelaar has done some great work recently with syntex, aster, and quasi. Switching to this ecosystem will get us close to 1.*x* compatibility *and* will clean up the macro code quite a bit. I'll be working with Erick to use html5ever as an early validation of his approach.

Simon has extracted Servo's CSS selector matching engine as a stand-alone library. Combined with html5ever this provides most of the groundwork for a full-featured HTML manipulation library.

The C API for html5ever still builds, thanks to continuous integration. But it's not complete or well-tested. With the removal of Rust's runtime, maintaining the C API does not restrict the kind of code we can write in other parts of the parser. All we need now is to complete the C API and write tests. This would be a great thing for a community member to work on. Then we can write bindings for every language under the sun and bring fast, correct, memory-safe HTML parsing to the masses :)

### can't nterrupt winGHCi or ghci

### [TFP'15] final call for papers - deadline extendedmarch 31 -

### [TFP'15] final call for papers - deadline extended march31 -

### Only 4 more likes required for Haskell Keycap Interest Checks!

If you haven't seen it before, I've put a Cherry MX keycap Signature Plastics group-buy here: http://www.pimpmykeyboard.com/deals/haskell-logo-keycaps/ for Haskell Thompson-Wheeler Keycaps.

If we get to 50 likes, we can convert this to a Group-Buy, making them available for sale! So, if you haven't already, a like would go a long way!

submitted by NFATracker[link] [2 comments]

### Adapting to GHC 7.10

I have a project that is fairly large for my PhD that I'm writing with Haskell. I'm excited about the changes in the new version of GHC and I would like to adapt as soon as possible to it, as I take lots of time to make things work. So I have a few questions:

1) Is there any reason for a code that works now just stop working with the new Prelude? Isn't it just more general? 2) If I adapt to the generality of the new prelude, do I lose something? My code is heavily dependant on Data.Vector.Storable, are these affected? What about the regular Data.Vector? 3) Is it possible to use a preview of the new Prelude to test it?

Thanks!

submitted by guaraqe[link] [6 comments]

### FP Complete: Stackage and GHC 7.10 update

As many of you saw, GHC 7.10 RC3 is now available. As has become a pretty long-standing tradition, the Stackage project has been tracking GHC 7.10 library support for a few months now. Given how close we are to a release, now seems like a good time to make a more public status report.

Firstly: restrictive upper bounds. I added a
comment two
hours ago listing all of the restrictive upper bounds, that is, upper bounds on
libraries which prevent them from working with GHC 7.10. (Note: --allow-newer
can override that.) In some cases, these are *unnecessary* upper bounds, in
that simply relaxing the bounds in the cabal file will allow them to build. In
others, they are in fact necessary, and the code needs to change. To help
identify which packages fall into which categories, I then did a full build
without bounds checking. I've uploaded three different sets of results:

- Summary of packages failures, ignoring dependency failures. These are packages that configured correctly, and then either failed on the build or test step. The good news is that
**only 64 packages failed**! - Summary of packages failures, including dependency failures. Same as above, plus packages that failed to configure because one of their dependencies didn't build. The bad news is that this list has 480 packages.
- A tarball containing all of the detailed build logs. I strongly encourage any authors with a package in the first list list, and even those with a package in the second list, to download this tarball and review the logs.

The Haskell community excels in getting libraries to quickly be compatible with the newest GHC version. Let's strive to do it again with this release!