The move-to-front (MTF) Transform

10/11/2009

To follow up my last post about the Burrows-Wheeler Transform, I decided to implement another simple algorithm which is often used after the BWT to help consolidate localized redundancy in the data before entropy encoding.

The idea behind the Move-to-Front algorithm is that we start with some known alphabet (like the list of ASCII characters), and encode our data elements as the index into that alphabet list of each element. The trick is that after each character is encoded, the alphabet list is modified by moving the element whose index we just looked up to the front of the alphabet.

Here are my implementations of encode and decode:


import Data.List


mtf :: (Eq a, Bounded a, Enum a)=> [a] -> Maybe [Int]
mtf = sequence . snd . mapAccumL enc [ minBound.. ]
    where enc l x = (x:delete x l, elemIndex x l)


mtfD :: (Eq a, Bounded a, Enum a)=> [Int-> [a]
mtfD = snd . mapAccumL dec [ minBound.. ]
    where dec l i = let x = l !! i 
                     in (x:delete x l, x)

The mapAccumL function is perfect for passing the state of the dictionary list. Another point of interest to mention is the use of minBound to let the function be polymorphic over any type for which there is a defined order and lowest element.

For example, we can do this:

*Main> mtf “SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES” >>= return . mtfDec :: Maybe String
Just “SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES”

or we can decode to Word8 bytes, if we want to get back the ASCII character in that form, just by specifying a different return type:

*Main> :m + Data.Word
*Main Data.Word> mtf “SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES” >>= return . mtfDec :: Maybe [Word8]
Just [83,73,88,46,77,73,88,69,68,46,80,73,88,73,69,83, 46,83,73,70,84,46,83,73,88,84,89,46,80,73,88,73,69,46, 68,85,83,84,46,66,79,88,69,83]

Let me know your thoughts!

2 Comments

Polishing a Functional Pearl: The Burrows-Wheeler Transform

7/11/2009

Here is a quick post to get me back into the swing of blogging:

I was looking through an old post on StackOverflow about clever functional code, and the best answer, given by “yairchu” was a nice version of the Burrows-Wheeler Transform, which is an algorithm for permuting a string such that it can be compressed more effectively by other algorithms. The code posted was (import Data.List assumed):


bwp :: (Ord a)=> [a] -> [a]
bwp xs = map snd $ sort $ zip (rots xs) (rrot xs)
rots xs = take (length xs) (iterate lrot xs)
lrot xs = tail xs ++ [head xs]
rrot xs = last xs : init xs

I saw I could improve/shorten this in a couple of obvious ways and came up with this:


bwp :: (Ord a) => [a] -> [a]
bwp = map snd . sort . rots 
rots xs = zip (tail $ iterate lrot xs) xs
lrot (x:xs) = xs ++ [x]

Still unsatisfied and even more obsessed I came up with this final, prettiest version, before forcing myself to give it up already:


bwp :: (Ord a)=> [a] -> [a]
bwp = map snd . sort . rots 
rots xs =  zip (lrot xs) xs
lrot = tail . tails . cycle

Unfortunately, this last version will croak if your string happens to look like “111111″ or “cAbcAb” because sort will keep trying to compare infinites lists.

Update: I did a short post on the Move To Front transform as a follow-up to this post.

1 Comment

Cracking a Lock in Haskell with the De Bruijn sequence, pt. 2

29/09/2009

For this post I will rework the Prefer One algorithm from
the previous post
so that it is much more efficient. We will
add words to a Patricia Tree-like dictionary as we see them,
passing the tree along in the State monad; to check if a new
word has been seen we simply check in the tree, rather than
in the array.

First, a little boilerplate…


> module Main
>     where
>
> import Data.Array
> import Data.List(isInfixOf, tails)
>
> import Control.Monad.State
> import Control.Arrow

NEW IMPLEMENTATION:

———————————————————————-

In the previous implementation, to check if the word formed
by adding a one has been seen, we had to iterate through each
of the previous bits in the array, checking words.

For example for words of length 3, finding the 7th bit (?)
by checking if 111 has already been seen:

        /-----\  ==>  111
0 0 0 1 1 1 (1)...
\----/         000 == 111  No
  \----/       001 == 111  No
    \----/     011 == 111  No
      \----/   111 == 111  Yes, so this bit must be (0)

This is extremely inefficient. What we want is to be able to
store all the previous words that we’ve encountered in an easily-
searchable data structure.

In the example above, we would like the three words that we’ve
seen to be stored in what might be called a Trie, so that our
search instead looks like the following:

       /\
      0  1         1 - in tree, go right
     / \  \
    0   1  1       1 - in tree, go right
   / \   \  \
  0   1   1  1     1 - in tree, we've already seen 111,
                       so the last bit must be 0

Our new data structure will look like this:


> data Tree = Bs Tree Tree  -- Bs (zero_bit) (one_bit)
>           | X -- incomplete word
>           | B -- final bit of word
>           deriving Show

We’ll need to build a new tree from a list of bits, appending
a final bit (1, except for the initial tree):


> treeWithFinal1 = mkTree True
> treeWithFinal0 = mkTree False
>
>
> mkTree :: Bit -> [Bit] -> Tree
> mkTree p = foldr mkBranch (if p then Bs X B else Bs B X)  
>     where mkBranch b | b         = Bs X      --1
>                      | otherwise = flip Bs X --0


Finally, here is our new algorithm. The tree is passed in the
State monad, through the use of mapM. The state monad is a little
tricky sometimes:


> preferOneV2 :: Int -> [ Bit ]
> preferOneV2 n = 
>     let upB = 2^n
>         -- the whole bit sequence (one period):
>         bs =  take upB (replicate n False ++ bs')
>         (wp0:wordPrefixes) = [ take (n-1) w | w <- tails bs ]
>        
>         -- pass our Tree around in the State monad
>         state0 = treeWithFinal0 wp0
>        
>         -- thisBit is partially applied, after which we wrap the
>         -- function in a State constructor to make our :: m a
>         bsM'  = mapM (State . thisBit) wordPrefixes
>         (bs',tree) = runState bsM' state0
>
>       -- an infinite stream is returned... because I can:          
>    in cycle bs


With the following function, after we apply it to the word we’re searching
for, it becomes a function :: state -> (val,state), suitable for the
State monad:

Takes a list of the last n-1 Bits (Bools) and traverses a Tree which we’ve
been using to keep track of the words we’ve already seen. We fold the Bit
list into the tree. When we get to the endo of the list, we look for a One.
We return the new bit as well as the new Tree:


> thisBit :: [ Bit ] -> Tree -> (Bit, Tree)

We’re at a Zero bit,


> thisBit (False:bs) (Bs X o) = (True, Bs (treeWithFinal1 bs) o) -- last bit must be 1
> thisBit (False:bs) (Bs z o) = (id *** flip Bs o) (thisBit bs z)

…a One bit,


> thisBit (True:bs) (Bs z X) = (True, Bs z (treeWithFinal1 bs)) 
> thisBit (True:bs) (Bs z o) = (id *** Bs z) (thisBit bs o)

…or else propose a One for the last bit:


> thisBit [] (Bs _ o) = (b , (Bs z B)) 
>     -- we know that if the One bit has been seen (B) then we must
>     -- place a zero. we assume then that the Zero bit is (X):
>     where (b, z) = case o of 
>                         -- this bit = 1, Zero branch = nil
>                         X -> (True,  X) -- 1
>                         -- this bit = 0, Zero branch = last word bit
>                         _ -> (False, B) -- 0


TEST FUNCTIONS:

———————————————————————-

This code is copied from the previous post:

We use Bool for bits, where False ==> 0, True ==> 1:


> type Bit = Bool

Our garage-door lock model for testing the function:


> type Combo    = [ Bit ]
> type Receiver = Combo -> Bool

True means access granted:


> programReceiver :: Combo -> Receiver
> programReceiver = isInfixOf 

Test out our function:


> main =  let secretCode = [True,True,False,False,True,
>                           False,True,False,True,True]
>             receiver = programReceiver secretCode
>             crackingStream = preferOneV2 10
>
>          in if receiver crackingStream
>             then print "WE'RE IN!"
>             else print "...bugs"
>

Stay tuned for one more post on these algorithms.

2 Comments