The move-to-front (MTF) Transform

10/11/2009

To follow up my last post about the Burrows-Wheeler Transform, I decided to implement another simple algorithm which is often used after the BWT to help consolidate localized redundancy in the data before entropy encoding.

The idea behind the Move-to-Front algorithm is that we start with some known alphabet (like the list of ASCII characters), and encode our data elements as the index into that alphabet list of each element. The trick is that after each character is encoded, the alphabet list is modified by moving the element whose index we just looked up to the front of the alphabet.

Here are my implementations of encode and decode:


import Data.List


mtf :: (Eq a, Bounded a, Enum a)=> [a] -> Maybe [Int]
mtf = sequence . snd . mapAccumL enc [ minBound.. ]
    where enc l x = (x:delete x l, elemIndex x l)


mtfD :: (Eq a, Bounded a, Enum a)=> [Int-> [a]
mtfD = snd . mapAccumL dec [ minBound.. ]
    where dec l i = let x = l !! i 
                     in (x:delete x l, x)

The mapAccumL function is perfect for passing the state of the dictionary list. Another point of interest to mention is the use of minBound to let the function be polymorphic over any type for which there is a defined order and lowest element.

For example, we can do this:

*Main> mtf “SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES” >>= return . mtfDec :: Maybe String
Just “SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES”

or we can decode to Word8 bytes, if we want to get back the ASCII character in that form, just by specifying a different return type:

*Main> :m + Data.Word
*Main Data.Word> mtf “SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES” >>= return . mtfDec :: Maybe [Word8]
Just [83,73,88,46,77,73,88,69,68,46,80,73,88,73,69,83, 46,83,73,70,84,46,83,73,88,84,89,46,80,73,88,73,69,46, 68,85,83,84,46,66,79,88,69,83]

Let me know your thoughts!

2 Comments

Cracking a Lock in Haskell with the De Bruijn sequence, pt. 2

29/09/2009

For this post I will rework the Prefer One algorithm from
the previous post
so that it is much more efficient. We will
add words to a Patricia Tree-like dictionary as we see them,
passing the tree along in the State monad; to check if a new
word has been seen we simply check in the tree, rather than
in the array.

First, a little boilerplate…


> module Main
>     where
>
> import Data.Array
> import Data.List(isInfixOf, tails)
>
> import Control.Monad.State
> import Control.Arrow

NEW IMPLEMENTATION:

———————————————————————-

In the previous implementation, to check if the word formed
by adding a one has been seen, we had to iterate through each
of the previous bits in the array, checking words.

For example for words of length 3, finding the 7th bit (?)
by checking if 111 has already been seen:

        /-----\  ==>  111
0 0 0 1 1 1 (1)...
\----/         000 == 111  No
  \----/       001 == 111  No
    \----/     011 == 111  No
      \----/   111 == 111  Yes, so this bit must be (0)

This is extremely inefficient. What we want is to be able to
store all the previous words that we’ve encountered in an easily-
searchable data structure.

In the example above, we would like the three words that we’ve
seen to be stored in what might be called a Trie, so that our
search instead looks like the following:

       /\
      0  1         1 - in tree, go right
     / \  \
    0   1  1       1 - in tree, go right
   / \   \  \
  0   1   1  1     1 - in tree, we've already seen 111,
                       so the last bit must be 0

Our new data structure will look like this:


> data Tree = Bs Tree Tree  -- Bs (zero_bit) (one_bit)
>           | X -- incomplete word
>           | B -- final bit of word
>           deriving Show

We’ll need to build a new tree from a list of bits, appending
a final bit (1, except for the initial tree):


> treeWithFinal1 = mkTree True
> treeWithFinal0 = mkTree False
>
>
> mkTree :: Bit -> [Bit] -> Tree
> mkTree p = foldr mkBranch (if p then Bs X B else Bs B X)  
>     where mkBranch b | b         = Bs X      --1
>                      | otherwise = flip Bs X --0


Finally, here is our new algorithm. The tree is passed in the
State monad, through the use of mapM. The state monad is a little
tricky sometimes:


> preferOneV2 :: Int -> [ Bit ]
> preferOneV2 n = 
>     let upB = 2^n
>         -- the whole bit sequence (one period):
>         bs =  take upB (replicate n False ++ bs')
>         (wp0:wordPrefixes) = [ take (n-1) w | w <- tails bs ]
>        
>         -- pass our Tree around in the State monad
>         state0 = treeWithFinal0 wp0
>        
>         -- thisBit is partially applied, after which we wrap the
>         -- function in a State constructor to make our :: m a
>         bsM'  = mapM (State . thisBit) wordPrefixes
>         (bs',tree) = runState bsM' state0
>
>       -- an infinite stream is returned... because I can:          
>    in cycle bs


With the following function, after we apply it to the word we’re searching
for, it becomes a function :: state -> (val,state), suitable for the
State monad:

Takes a list of the last n-1 Bits (Bools) and traverses a Tree which we’ve
been using to keep track of the words we’ve already seen. We fold the Bit
list into the tree. When we get to the endo of the list, we look for a One.
We return the new bit as well as the new Tree:


> thisBit :: [ Bit ] -> Tree -> (Bit, Tree)

We’re at a Zero bit,


> thisBit (False:bs) (Bs X o) = (True, Bs (treeWithFinal1 bs) o) -- last bit must be 1
> thisBit (False:bs) (Bs z o) = (id *** flip Bs o) (thisBit bs z)

…a One bit,


> thisBit (True:bs) (Bs z X) = (True, Bs z (treeWithFinal1 bs)) 
> thisBit (True:bs) (Bs z o) = (id *** Bs z) (thisBit bs o)

…or else propose a One for the last bit:


> thisBit [] (Bs _ o) = (b , (Bs z B)) 
>     -- we know that if the One bit has been seen (B) then we must
>     -- place a zero. we assume then that the Zero bit is (X):
>     where (b, z) = case o of 
>                         -- this bit = 1, Zero branch = nil
>                         X -> (True,  X) -- 1
>                         -- this bit = 0, Zero branch = last word bit
>                         _ -> (False, B) -- 0


TEST FUNCTIONS:

———————————————————————-

This code is copied from the previous post:

We use Bool for bits, where False ==> 0, True ==> 1:


> type Bit = Bool

Our garage-door lock model for testing the function:


> type Combo    = [ Bit ]
> type Receiver = Combo -> Bool

True means access granted:


> programReceiver :: Combo -> Receiver
> programReceiver = isInfixOf 

Test out our function:


> main =  let secretCode = [True,True,False,False,True,
>                           False,True,False,True,True]
>             receiver = programReceiver secretCode
>             crackingStream = preferOneV2 10
>
>          in if receiver crackingStream
>             then print "WE'RE IN!"
>             else print "...bugs"
>

Stay tuned for one more post on these algorithms.

2 Comments

Proposed modification to ‘array’ function in Data.Array

22/09/2009

In Data.Array the function array :: (IArray a e, Ix i) => (i, i) -> [(i, e)] -> a i e takes the array bounds and an association list, and it produces an Array. The function allows you to build an Array by providing the elements in whatever order you choose, rather than from the first index to the last.

The function listArray is similar except that it assumes that the list is already ordered by index so there is no need to provide tuples of (index, value). Becase listArray actually uses zip internally to produce the tupled list, so the programmer can pass a list to listArray which exceeds the declared bounds of the array and the function will quietly drop the extra list tail:

Prelude Data.Array> listArray (1,3) ['a','b','c',undefined]
array (1,3) [(1,'a'),(2,'b'),(3,'c')]

But when a list that exceeds the specified bound is passed to array, it dies a horrible death:

Prelude Data.Array> array (1,3) [(1,'a'),(2,'b'),(3,'c'),(4,'d')]
array *** Exception: Error in array index

I think the array function should be modified by adding a simple take to automatically drop any excess list elements:


array (l,u) ies
    = let n = safeRangeSize (l,u)
      in unsafeArray (l,u)
                     [(safeIndex (l,u) n i, e) | (i, e) <- take n ies]

This change would bring the function into better alignment with other prelude functions which expect infinite lists and drop tails as needed. It can be argued that the programmer might want to know that he is passing too long a list to his function, in which case I think the safeRangeSize function, etc. should be eliminated.

No Comments