Archive of published articles on November, 2009

Back home

Find a permutation given its Inversion Table

15/11/2009

Here is some code I whipped up in response to an interesting problem posted on reddit.com/r/coding/. It was a really interesting algorithm to work out:



import Data.List (sortBy)

-- takes an inversion list and converts it to the list of permuted
-- elements:
decode = dec [] . sortBy lowHi . zip [1..]  where
    dec as []         = as
    dec as ((a,_):is) = let is' = sortBy lowHi (decrementGT a is)
                         in dec (a:as) is'

-- decrement every inversion number by 1, for every tuple in which
-- the element is greater than `a`:
decrementGT a = map (\(a',i) -> (a', if a'>then i-1 else i) )

-- we sort by the inversion number, low to high. For elements with
-- the same inversion number, we say that the greater element tuple
-- should go before a lesser element:
lowHi (a,i) (a',i') 
    | i == i'   = if a>a' then LT else GT
    | otherwise = compare i i'

It’s not the prettiest code I’ve written, but it’s pretty straight-forward and concise. Thanks for looking!

No Comments

An “Adaptive” Move-to-Front Algorithm

13/11/2009

UPDATE: Thanks to steve for pointing out that the scheme I describe here is essentially the same as Algorithm BSTW.

I came up with this variation of the Move-to-Front transform which doesn’t require that there be a known alphabet of symbols. Instead it builds up the alphabet as it goes along and encounters a new symbol. I’m sure that this is a known algorithm, as it’s a fairly obvious variation, but I don’t know what the proper name for it is, so I’m calling it an Adaptive Move-to-Front.

I’m very interested in hearing if this is similar or identical to any other algorithms, so if anyone has any insights, please comment!

Differences from the Standard MTF Transform:

The traditional MTF algorithm, as I understand it, uses some known ordered alphabet of symbols (e.g. the bytes from 0-255, or the letters a-z), so presumably, one need not store this alphabet along with the encoded/compressed data.

With the algorithm I describe here though, the final permutation of the alphabet list, output by the encoder, must be retained to decode the message. The encoded message is identical to the output of the traditional MTF transform, except where a symbol is encoded which has not appeared previously.

Here we have zipped the output of the standard MTF (the fst) with the adaptive version (the snd) to make it easy to compare elements:

Message: “the rrrrain in sssspain falls maaiinly on the plain”

Output (standard MTF, adaptive):

[(116,0),(105,1),(103,2),(35,3),(115,4),(0,0),(0,0),(0,0),(101,5),(107,6),(112,7),(4,4),(2,2),(2,2),(2,2),(116,8),(0,0),(0,0),(0,0),(115,9),(5,5),(5,5),(5,5),(5,5),(109,10),(4,4),(113,11),(0,0),(7,7),(4,4),(114,12),(4,4),(0,0),(7,7),(0,0),(7,7),(6,6),(121,13),(6,6),(116,14),(4,4),(2,2),(14,14),(14,14),(14,14),(3,3),(13,13),(8,8),(10,10),(10,10),(8,8)]

The standard algorithm was working with the ASCII alphabet, so each new symbol is located somewhere far back into the alphabet; it is then moved to the front. In the adaptive version, when we see a new symbol, it is automatically made the next in our running alphabet and then of course moved to the front.

This means that with the adaptive version, for any given stretch from the beginning of the encoded message, we are using the minimal number of different index integers to encode the stream, always choosing the next greatest integer when we require a new one. Here is the output of the adaptive version alone:

Encoded phrase:

[0,1,2,3,4,0,0,0,5,6,7,4,2,2,2,8,0,0,0,9,5,5,5,5,10,4,
11,0,7,4,12,4,0,7,0,7,6,13,6,14,4,2,14,14,14,3,13,8,
10,10,8]

Final permutation of minimal dictionary list:
"nialp ehtoymsfr"

What these means for actual compression schemes on a binary level, I have no idea, but I would like to explore this topic much more. Without further ado, here is some code.

The Encoder:


> import Data.List
> import Control.Arrow

This “adaptive move-to-front” encoder works in the same way as the standard algorithm, except that we start out with an empty alphabet list and build it up as we see new elements not yet in our list.

Essentially if, while searching for an element, we hit the empty list [], we pretend that the element we were looking for was at the end of the list.

This is exactly what happens in the traditional algorithm: when we try to encode a symbol that we haven’t yet encountered, we venture into a new part of the alphabet list that we haven’t touched yet.


> encode :: (Eq b)=> [b] -> ([b], [Int])
> encode = mapAccumL (flip mtf) []  where
    
 We push the current element to the front of the lookup list...

>     mtf x = first (x:. enc 0    where

 ...after returning the index of the element in the list
 along with the new list with the element deleted:

>        enc i []                 = ([],i)  -- < index as if x was last
>        enc i (a:as) | x == a    = (as,i)  -- < delete the x element
>                     | otherwise = first (a:$ enc (i+1) as 

The Decoder

Our decoder requires that we pass it the final permutation of the symbol list, along with the encoded list of indexes. It then simply does the reverse of the encoding function.

To decode we follow these steps:

  1. Return the head of the list of symbols
  2. Re-insert the symbol into the index location specified by the head of the encoded/index list
  3. Move on to the next encoded index, using the new list of symbols. go to (1)


> -- our Adaptive MTF decoder:
> decode :: [b] -> [Int-> [b]
> decode l = snd . mapAccumR dec l where

 return head of alphabet and insert it back into alphabet list at
 the index given by the current element of our encoded index list:

>    dec (a:as) i = (insertAt a i as, a)
    
 takes an element and makes it the new element at the specified index
 in the provided list. Fails if it touches a [].

>    insertAt :: a -> Int -> [a] -> [a]
>    insertAt a' i = ins i
>        where ins 0 as = a':as
>              ins i' (a:as) = a : ins (i' - 1) as

2 Comments

The move-to-front (MTF) Transform

10/11/2009

To follow up my last post about the Burrows-Wheeler Transform, I decided to implement another simple algorithm which is often used after the BWT to help consolidate localized redundancy in the data before entropy encoding.

The idea behind the Move-to-Front algorithm is that we start with some known alphabet (like the list of ASCII characters), and encode our data elements as the index into that alphabet list of each element. The trick is that after each character is encoded, the alphabet list is modified by moving the element whose index we just looked up to the front of the alphabet.

Here are my implementations of encode and decode:


import Data.List


mtf :: (Eq a, Bounded a, Enum a)=> [a] -> Maybe [Int]
mtf = sequence . snd . mapAccumL enc [ minBound.. ]
    where enc l x = (x:delete x l, elemIndex x l)


mtfD :: (Eq a, Bounded a, Enum a)=> [Int-> [a]
mtfD = snd . mapAccumL dec [ minBound.. ]
    where dec l i = let x = l !! i 
                     in (x:delete x l, x)

The mapAccumL function is perfect for passing the state of the dictionary list. Another point of interest to mention is the use of minBound to let the function be polymorphic over any type for which there is a defined order and lowest element.

For example, we can do this:

*Main> mtf “SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES” >>= return . mtfDec :: Maybe String
Just “SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES”

or we can decode to Word8 bytes, if we want to get back the ASCII character in that form, just by specifying a different return type:

*Main> :m + Data.Word
*Main Data.Word> mtf “SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES” >>= return . mtfDec :: Maybe [Word8]
Just [83,73,88,46,77,73,88,69,68,46,80,73,88,73,69,83, 46,83,73,70,84,46,83,73,88,84,89,46,80,73,88,73,69,46, 68,85,83,84,46,66,79,88,69,83]

Let me know your thoughts!

2 Comments