Run-length Encoding

26/05/2009

Just a quick implementation of a RLE algorithm for lists in haskell. We compress a list by converting “runs” of consecutive elements into a tuple of the form: (run_length, element).


import Data.List (group)
import Control.Arrow


runLengthEnc :: Eq a => [a] -> [(Int,a)]
runLengthEnc = map (length &&& head. group

decode :: [(Int, a)] -> [a]
decode = concatMap (uncurry replicate)

If the &&& combinator looks foreign to you, check out David R. Maciver’s very enlightening blog about Arrow functions.

I’m always curious to see how naive-looking functions like the above compare in performance to a from-scratch implementation with explicit recursion, so I came up with the following:


runLengthEnc' :: Eq a => [a] -> [(Int,a)]
runLengthEnc' [] = []
runLengthEnc' (a:as) = run 1 a as
    where run n x [] = [(n,x)]
          run n x (x2:xss) | x == x2   = run (n+1) x xss
                           | otherwise = (n,x) : run 1 x2 xss

I tested both functions on a list of 100,000 random 1s and 0s and found the explicit version to be only marginally better performing, completing the list in 49 ticks & 130Mb, compared to 54 ticks & 139 Mb for the one-liner!

There are 2 comments in this article:

  1. 31/05/2009Huffman Coding in Haskell | LAMBDAPHONE says:

    [...] is the second of probably three posts (the first was on run-length encoding in Haskell) inspired by Thomas Guest’s interesting article on the Deflate algorithm. This is also my [...]

  2. 18/06/2009Lazy Arrays and the LZ77 Algorithm in Haskell | LAMBDAPHONE says:

    [...] my third post investigating compression techniques related to the DEFLATE algorithm: the first on run-length encoding, and the second on simple Huffman Coding. This post models the LZ77 algorithm, the second of the [...]

Write a comment: