Cracking a Lock in Haskell with the De Bruijn sequence, pt. 2

29/09/2009

For this post I will rework the Prefer One algorithm from
the previous post
so that it is much more efficient. We will
add words to a Patricia Tree-like dictionary as we see them,
passing the tree along in the State monad; to check if a new
word has been seen we simply check in the tree, rather than
in the array.

First, a little boilerplate…


> module Main
>     where
>
> import Data.Array
> import Data.List(isInfixOf, tails)
>
> import Control.Monad.State
> import Control.Arrow

NEW IMPLEMENTATION:

———————————————————————-

In the previous implementation, to check if the word formed
by adding a one has been seen, we had to iterate through each
of the previous bits in the array, checking words.

For example for words of length 3, finding the 7th bit (?)
by checking if 111 has already been seen:

        /-----\  ==>  111
0 0 0 1 1 1 (1)...
\----/         000 == 111  No
  \----/       001 == 111  No
    \----/     011 == 111  No
      \----/   111 == 111  Yes, so this bit must be (0)

This is extremely inefficient. What we want is to be able to
store all the previous words that we’ve encountered in an easily-
searchable data structure.

In the example above, we would like the three words that we’ve
seen to be stored in what might be called a Trie, so that our
search instead looks like the following:

       /\
      0  1         1 - in tree, go right
     / \  \
    0   1  1       1 - in tree, go right
   / \   \  \
  0   1   1  1     1 - in tree, we've already seen 111,
                       so the last bit must be 0

Our new data structure will look like this:


> data Tree = Bs Tree Tree  -- Bs (zero_bit) (one_bit)
>           | X -- incomplete word
>           | B -- final bit of word
>           deriving Show

We’ll need to build a new tree from a list of bits, appending
a final bit (1, except for the initial tree):


> treeWithFinal1 = mkTree True
> treeWithFinal0 = mkTree False
>
>
> mkTree :: Bit -> [Bit] -> Tree
> mkTree p = foldr mkBranch (if p then Bs X B else Bs B X)  
>     where mkBranch b | b         = Bs X      --1
>                      | otherwise = flip Bs X --0


Finally, here is our new algorithm. The tree is passed in the
State monad, through the use of mapM. The state monad is a little
tricky sometimes:


> preferOneV2 :: Int -> [ Bit ]
> preferOneV2 n = 
>     let upB = 2^n
>         -- the whole bit sequence (one period):
>         bs =  take upB (replicate n False ++ bs')
>         (wp0:wordPrefixes) = [ take (n-1) w | w <- tails bs ]
>        
>         -- pass our Tree around in the State monad
>         state0 = treeWithFinal0 wp0
>        
>         -- thisBit is partially applied, after which we wrap the
>         -- function in a State constructor to make our :: m a
>         bsM'  = mapM (State . thisBit) wordPrefixes
>         (bs',tree) = runState bsM' state0
>
>       -- an infinite stream is returned... because I can:          
>    in cycle bs


With the following function, after we apply it to the word we’re searching
for, it becomes a function :: state -> (val,state), suitable for the
State monad:

Takes a list of the last n-1 Bits (Bools) and traverses a Tree which we’ve
been using to keep track of the words we’ve already seen. We fold the Bit
list into the tree. When we get to the endo of the list, we look for a One.
We return the new bit as well as the new Tree:


> thisBit :: [ Bit ] -> Tree -> (Bit, Tree)

We’re at a Zero bit,


> thisBit (False:bs) (Bs X o) = (True, Bs (treeWithFinal1 bs) o) -- last bit must be 1
> thisBit (False:bs) (Bs z o) = (id *** flip Bs o) (thisBit bs z)

…a One bit,


> thisBit (True:bs) (Bs z X) = (True, Bs z (treeWithFinal1 bs)) 
> thisBit (True:bs) (Bs z o) = (id *** Bs z) (thisBit bs o)

…or else propose a One for the last bit:


> thisBit [] (Bs _ o) = (b , (Bs z B)) 
>     -- we know that if the One bit has been seen (B) then we must
>     -- place a zero. we assume then that the Zero bit is (X):
>     where (b, z) = case o of 
>                         -- this bit = 1, Zero branch = nil
>                         X -> (True,  X) -- 1
>                         -- this bit = 0, Zero branch = last word bit
>                         _ -> (False, B) -- 0


TEST FUNCTIONS:

———————————————————————-

This code is copied from the previous post:

We use Bool for bits, where False ==> 0, True ==> 1:


> type Bit = Bool

Our garage-door lock model for testing the function:


> type Combo    = [ Bit ]
> type Receiver = Combo -> Bool

True means access granted:


> programReceiver :: Combo -> Receiver
> programReceiver = isInfixOf 

Test out our function:


> main =  let secretCode = [True,True,False,False,True,
>                           False,True,False,True,True]
>             receiver = programReceiver secretCode
>             crackingStream = preferOneV2 10
>
>          in if receiver crackingStream
>             then print "WE'RE IN!"
>             else print "...bugs"
>

Stay tuned for one more post on these algorithms.

2 Comments

List Grouping module released

14/08/2009

EDIT: Don’t use this package, but use instead Data.List.Split by Brent Yorgey. I didn’t see that a package like his existed! This module will hopefully be removed from hackage if they can do that.

I just finished the initial release of a simple module called list-grouping that contains functions to partition a list into sub-lists in various ways, based on some predicate or integer offset. Functions like these are a little awkward to write and I was surprised when I didn’t see anything on hackage!

Check out the package description and install it with:

$ cabal install list-grouping

Here is an example from a previous post to build a binary tree from an in-order list, which uses the above library:


module Main
    where

import Data.List.Grouping

data Tree a = Node a (Tree a) (Tree a)
            | End
             deriving Show

fromSorted :: [a] -> Tree a
fromSorted = foldl mkTree End . splitWith [2^| n<-[0..]]
    where mkTree l (n:ns) = Node n l (fromSorted ns)

I’m sure the functions can be made more efficient, to take advantage of fusion or what-not, and I hope the library will eventually contain the most efficient implementations possible.

I also am looking for suggestions for other useful list grouping functions to include. Send your suggestions along! You can get the darcs source with:

$ darcs get http://coder.bsimmons.name/code/ListGrouping/

2 Comments

Huffman Coding

31/05/2009

This is the second of probably three posts (the first was on run-length encoding in Haskell) inspired by Thomas Guest’s interesting article on the Deflate algorithm. This is also my first post in literate haskell. Please post any improvements to the code if you have them!

For a refreshingly readable introduction to Huffman Coding and the Deflate algorithm, have a look at this short explanation by Antaeus Feldspar.


First some boilerplate...

> module Huffman
>     where
>
> import Data.List (sort, insert)
> import qualified Data.Map as M
> import Data.Function (on)
> import Control.Arrow (second)

We make an abstract datatype for binary digits. I wonder if this would
be a nice (but slow?) way of working with binary IO. There doesn't seem
to be a package on Hackage for this

> data Bit = O | I
>            deriving (ShowOrdEq)

we define a simple binary tree useful for decoding the encoded binary
stream. the simple algorithm for assigning codes to our symbols
produces this tree. A completed tree for some text containing only
three letter might look like this:

     0    /\    1
         /\ A  
        C  B

> data HTree a = Branch {zer :: (HTree a), one :: (HTree a),
>                        wt  :: Int }
>              | Leaf {symb :: a,  wt :: Int }
>               deriving (Show)

maps from <Symbol> to <Binary Code>, we create this from the HTree
built from the list of weighted symbols. the HTree can't be used for
encoding.

> type CodeDict a = M.Map a [Bit]

Now some instance declarations so that we know how to order (by weight):

> instance Ord (HTree a) where
>     compare = compare `on` wt
>
> instance Eq (HTree a) where
>     (=== (==`on` wt


our function for merging two branches which we use to build the HTree
from the list of symbols and their corresponding weights. no point in
defining a Monoid instance, but we'll tip our hat to it:

> mappend t1 t2 = Branch t1 t2 (wt t1 + wt t2)


BULDING THE CODING TREES


    
We assign codes to our weighted symbols using a simple algorithm
which takes the trees (initially Leaves) with the lowest weights
and combines them (and their weights) and inserts them back into
the list until they have been combined into a single tree:

> buildDecTree :: [(a,Int)] -> HTree a
> buildDecTree = build . sort . map (uncurry Leaf)
>     where build (t:[])     = t
>           build (t1:t2:ts) = build $ insert (t1 `mappend` t2) ts


now convert the binary tree representation to a dictionary form for
encoding. we decompose the tree into a list from the top down, mapping
either a 1 or 0 over the flattened children. a little confusing:

> buildEncDict :: (Ord a) => HTree a -> CodeDict a
> buildEncDict = M.fromList . build 
>     where build (Leaf t _)     = (t,[]) : []
>           build (Branch a b _) = mapBit O a ++ mapBit I b
>            -- build up the codes in the snd of each Leaf's tuple:
>           mapBit b = map (second (b:)) . build


(EN/DE)CODING FUNCTIONS



To encode a list of symbols for which we've generated an HTree, we just
map over it, looking up it's code in our Map dictionary:

> encode :: (Ord a) => CodeDict a -> [a] -> [Bit]
> encode d = concatMap (d M.!)


To decode we simply read a Bit at a time from the input, at the same time
traversing the HTree (going left when we encounter a zero, and vice versa).
When we hit a Leaf (the end of a code) we return the symbol and go onto
the next bit from the top of the HTree once again.

> decode :: HTree a -> [Bit] -> [a]
> decode t [] = []
> decode t bs = dec bs t
>     where dec bs' (Leaf x _) = x : decode t bs'
>           dec (O:bs') (Branch l _ _) = dec bs' l 
>           dec (I:bs') (Branch _ r _) = dec bs' r 



UPDATE: I scrapped and re-did this section. Should be a little better now.:


USAGE EXAMPLES



I realize I need a more compelling example and some explanation. First,
imagine we want to encode into binary the following phrase:

> phrase = "twenty bytes of text"

we could represent it in ASCII but that would be wasteful of space, using
a whole byte per character when we are using only ten of the 256 possible
codes in the ASCII character set.

So we generate a list of the characters to encode along with their
frequencies (symbols with higher frequencies will be given shorter
prefix codes, saving space!):

> huffmanTree' = buildDecTree [('t',5),('e',3),('y',2),('w',1),('n',1),
>                              ('b',1),('s',1),('o',1),('f',1),('x',1),
>                              (' ',3)]

...and encode it using the dictionary built from the code tree we just
built:

> encodedPhrase = let dict = buildEncDict huffmanTree'
>                  in encode dict phrase

This yields the following binary stream of 8 bits (vs. 20 if we had used
ASCII encoding):

[I,O,O,O,I,O,I,I, O,O,O,I,I,I,O,O, I,O,I,I,I,O,O,O, O,O,I,O,I,O,I,I,
 O,O,O,O,I,I,I,I, O,I,I,I,O,O,I,I, I,I,I,I,I,I,O,I, I,O,O,I,I,O,I,O]

Of course we have to encode instructions to build our tree along with the
above message.

7 Comments