The move-to-front (MTF) Transform

10/11/2009

To follow up my last post about the Burrows-Wheeler Transform, I decided to implement another simple algorithm which is often used after the BWT to help consolidate localized redundancy in the data before entropy encoding.

The idea behind the Move-to-Front algorithm is that we start with some known alphabet (like the list of ASCII characters), and encode our data elements as the index into that alphabet list of each element. The trick is that after each character is encoded, the alphabet list is modified by moving the element whose index we just looked up to the front of the alphabet.

Here are my implementations of encode and decode:


import Data.List


mtf :: (Eq a, Bounded a, Enum a)=> [a] -> Maybe [Int]
mtf = sequence . snd . mapAccumL enc [ minBound.. ]
    where enc l x = (x:delete x l, elemIndex x l)


mtfD :: (Eq a, Bounded a, Enum a)=> [Int-> [a]
mtfD = snd . mapAccumL dec [ minBound.. ]
    where dec l i = let x = l !! i 
                     in (x:delete x l, x)

The mapAccumL function is perfect for passing the state of the dictionary list. Another point of interest to mention is the use of minBound to let the function be polymorphic over any type for which there is a defined order and lowest element.

For example, we can do this:

*Main> mtf “SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES” >>= return . mtfDec :: Maybe String
Just “SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES”

or we can decode to Word8 bytes, if we want to get back the ASCII character in that form, just by specifying a different return type:

*Main> :m + Data.Word
*Main Data.Word> mtf “SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES” >>= return . mtfDec :: Maybe [Word8]
Just [83,73,88,46,77,73,88,69,68,46,80,73,88,73,69,83, 46,83,73,70,84,46,83,73,88,84,89,46,80,73,88,73,69,46, 68,85,83,84,46,66,79,88,69,83]

Let me know your thoughts!

2 Comments

Polishing a Functional Pearl: The Burrows-Wheeler Transform

7/11/2009

Here is a quick post to get me back into the swing of blogging:

I was looking through an old post on StackOverflow about clever functional code, and the best answer, given by “yairchu” was a nice version of the Burrows-Wheeler Transform, which is an algorithm for permuting a string such that it can be compressed more effectively by other algorithms. The code posted was (import Data.List assumed):


bwp :: (Ord a)=> [a] -> [a]
bwp xs = map snd $ sort $ zip (rots xs) (rrot xs)
rots xs = take (length xs) (iterate lrot xs)
lrot xs = tail xs ++ [head xs]
rrot xs = last xs : init xs

I saw I could improve/shorten this in a couple of obvious ways and came up with this:


bwp :: (Ord a) => [a] -> [a]
bwp = map snd . sort . rots 
rots xs = zip (tail $ iterate lrot xs) xs
lrot (x:xs) = xs ++ [x]

Still unsatisfied and even more obsessed I came up with this final, prettiest version, before forcing myself to give it up already:


bwp :: (Ord a)=> [a] -> [a]
bwp = map snd . sort . rots 
rots xs =  zip (lrot xs) xs
lrot = tail . tails . cycle

Unfortunately, this last version will croak if your string happens to look like “111111″ or “cAbcAb” because sort will keep trying to compare infinites lists.

Update: I did a short post on the Move To Front transform as a follow-up to this post.

1 Comment

Enumerating all n-integer Sets Which Sum to an Integer x

30/07/2009

I came up with what I think is an elegant use of monadic code, while working on a program to compute solutions to the Postage Stamp Problem. The problem asks:

Consider that an envelope is able to hold a limited number of postage stamps. Then, consider the set of the values of the stamps — positive integers only. Then, what is the smallest total postage which *cannot* be put onto the envelope?

The algorithm below takes a number of stamps that can fit and a total postage to apply and returns a list of all the sets of different stamp values that can be combined to form the target postage sum:


> sumLists :: Int -> Int -> [[Int]]
> sumLists = f 0 where
>     f _ 1 n = return [n]  
>     f i c n = do a  <- [i.. div n c] 
>                  as <- f (i+a) (c-1) (n-a)      
>                  return (a : as) 

Thus the potential stamp combinations for a letter which can hold three stamps and costing 5 cents to mail would be:

*Main> sumLists 3 5
[[0,0,5],[0,1,4],[0,2,3],[1,1,3],[1,2,2]]

As you can see, the algorithm also produces ordered lists naturally!

But perhaps all we care about are unique stamp values that. If we want to adjust our goal such that the algorithm returns a list of all sets of unique stamp values sufficient to create the target postage, we need only adjust the return line:


>                  return (a : dropWhile (==a) as) 

Similarly we could drop the zeros (which mean “no stamp” in our case) from the beginnings of each set if we desired. The above code have applications to many similar problems including Subset Sum, and Knapsack as well as other partitioning problems.

No Comments