Cracking a Lock in Haskell with the De Bruijn sequence, pt. 1

24/09/2009

Update: a more efficient variation is implemented in Part 2.

A De Bruijn sequence is (for example) a cyclical list of characters such that every word of a given length appears once and only once in the sequence. Instead of using letters, you can have a De Bruijn sequence of bits. For example here is one possibility for a sequence in which every 3-bit word appears somewhere (you have to wrap around from the end for the final “words”):

00011101

In a sense we compress a dictionary of 2^3 = 8 3-bit words (or 24 bits) to a sequence of only 2^3 = 8 bits. There are some very simple algorithms for generating these kinds of sequences of bits, and I will implement two here: the “prefer one” algorithm and a subtle variation called “prefer opposite[PDF] by Alhakim“. Here is the author’s excellent description of the traditional Prefer One algorithm from the linked paper:

” The prefer-one algorithm is a very simple method amazingly capable of generating a full cycle. For any positive integer n ≥ 1, the algorithm puts n zeroes, and proceeds after this by proposing 1 for the next bit and accepting it when the word formed by the last n bits has not been encountered previously in the sequence, otherwise 0 is placed. The algorithm stops when both 0 and 1 do not bring a new word.”

And here is my haskell implementation. It works by creating a lazy array which we build from a list being generated by searching the earlier portions of the array that already have been defined. It’s very similar to the method I used in modeling the LZ77 algorithm.:


module Main
    where

import Data.Array
import Data.List(isInfixOf)


type Bit = Bool

preferOne :: Int -> [ Bit ]
preferOne n = 
    let upB = 2^n
        arr =  listArray (1, upB) (replicate n False ++ rest)
        
        -- since we use Bool for bits, we can say (not.alreadySeen):
        rest  =  map (not . alreadySeen) [n+1 .. upB]

        alreadySeen i = or $
              do let range = [ arr!i' | i' <- [i-n+1 .. i-1]] ++ [True
            
                 i1 <- [1..i-n] 
                 let rangeP = [ arr!i' | i' <- [i1 .. i1+n-1] ]
            
                 return (range == rangeP)

       -- an infinite stream is returned... because I can:          
    in cycle (elems arr)

The Prefer Opposite algorithm works on the same principle, but with a couple twists. In my words it is as follows:

Start with n zeros. Search for successive bits as follows:

propose to place the bit that is different from the previous bit: if the word formed by this bit has not been seen already, then choose it, else choose the same bit as the last. When 2^3-1 bit have been written, write a 1 for the final bit. and you’re done. (you can also keep track of how long a string of ones has been written; the sequence will always end with n ones)

Here is my implementation:


preferOpposite :: Int -> [ Bit ]
preferOpposite n = 
    let upB = 2^n
        arr =  array (1, upB) (final : inits ++ rest)
        
        -- we must specify the very last element of the sequence
        -- which will always be a One:
        final = (upB,True)
        inits = [ (i,False| i<-[1..n] ]
        rest  =  map seqBit [n+1 .. upB-1]
        
        -- if the word generated by making the current bit the opposite
        -- of the previous has already been seen, we make the bit the
        -- same as the previous:
        seqBit i =  (i, nextB)
            where bP = arr!(i-1)  --previous bit
                  nextB | or (alreadySeen i bP) = bP --same as previous
                        | otherwise         = not bP --choose opposite

        --checks if the n-length string we would form by choosing the
        --opposite for the next bit is already present in the array:
        alreadySeen i bP = do
             let range = [ arr!i' | i' <- [i-n+1 .. i-1]] ++ [not bP] 
            
             i1 <- [1..i-n] 
             let rangeP = [ arr!i' | i' <- [i1 .. i1+n-1] ]
            
             return (range == rangeP)
                
    in cycle (elems arr)

Enough code for now, let’s talk about applications of the sequence. The most easily-approachable and interesting application to me applies to those keyed entry systems, such as electronic door locks, which accept a stream of keys (i.e. they unlock as soon as the correct key sequence is input, without any need to press an “enter” key). These mechanisms are very susceptible to attack with a De Bruijn sequence of key presses.

Since the above algorithms use a binary alphabet, as opposed to say a decimal one found on electronic keypads (check out Jonas Elfström’s blog post dealing with keypads with worn out letters for more on that), I will choose as my target an old-fashined garage door opener.

An old-style garage door opener remote has 8 binary DIP switches allowing 256 different code combinations. We will imagine the garage door is susceptible to a stream-based attack like the electronic lock we described.

We can model the garage door receiver as follows:


type Combo = [ Bit ]
type Receiver = Combo -> Bool

-- True means access granted
programReceiver :: Combo -> Receiver
programReceiver = isInfixOf 
          

Now let’s test out our model with this main function:


main =  let secretCode = [False,False,True,False,True,False,True,True]
            receiver = programReceiver secretCode
            crackingStream = preferOne 8

         in if receiver crackingStream
            then print "WE'RE IN!"
            else print "...bugs"

Looks good!:

*Main> :main
“WE’RE IN!”

In the next couple posts I’ll explore improvements to the performance of these algorithms and maybe a few other things.

2 Comments

Fun with Lazy Arrays: the LZ77 Algorithm

18/06/2009

This is my third post investigating compression techniques related to the DEFLATE algorithm: the first on run-length encoding, and the second on simple Huffman Coding. This post models the LZ77 algorithm, the second of the two compression strategies used by DEFLATE, and in the process explores some interesting properties of Haskell’s basic Arrays.

IMPLEMENTATION:


> module LZ77
>     where

we will use GHC's "basic non-strict arrays" for this
experiment:

> import Data.Array


and use Ints to store the length of the entire decoded
message (needed to create our array):

> type Length = Int
> type Offset = Int

in place of the length in the standard length-offset pair
I've decided to use an Int representing the index of the
last element in the span relative to the element at
offset. Thus the pair encoding a span of only one element,
two elements back would be (0,2), rather than the more
traditional (1,2).

This simply makes more sense to me, especially since I want
to be able to encoded reversed sequences, as you will see.

> type Index = Int

and a few simple dataypes for our uncompressed and
compressed data respectively:

> type Decoded a = Array Index a
>
> data Encoded a = 
>     Enc Length [ Either a (Index,Offset) ] 


The decompress function works by traversing the encoded message,
keeping track of our array index position (since offsets are
relative to the current position), and building an Array lazily
from a list which we generate, in part by referencing elements
from the partially generated array itself.

So when we see a Right value we look up in the Array the elements
referenced by the length-offset, concat-ing that list with the
result of processing the rest of the encoded message.

If we hit [] in 'dec' we call an error because the stored value
for the length of the uncompressed message in the Encoded type
was longer than what the 'decompress' function could produce.

It's diffficult to describe, but I hope the code is clear:

> decompress :: Encoded a -> Decoded a
> decompress (Enc el es) = decoded
>     where decoded = listArray (0,el - 1$ dec 0 es
>          
>           dec  _     [] = 
>               error "message is shorter than it should be" 
>
>           dec n (Left x : xs) = 
>               x : dec (n+1) xs
>          
>           dec n (Right (iRel,off) : xs) =
>               let i1 = n  - off
>                   i2 = (if iN > i1 then succ else pred) i1 
>                   iN = i1 + iRel
>                
>                in [ decoded!| i <- [i1, i2 .. iN] ] ++ 
>                   dec (n + 1 + abs iRel) xs


Some interesting things about the code above:

    1) we create an array from a list, which we build,
       in part, by looking up elements from the array
       we are in the process of building.

    2) we can compress a sequence of symbols which were
       seen previously but in reverse order, simply by
       storing a negative relative index in the
       (relative_index,offset) tuple. So, the string...
        
            "her racecar returns to race"
      
       might compress to:
            
            {her race(-5,2)turns to(4,19)}

       I'm not sure if this is useful in real compression,
       especially when it comes down to the binary encoding.

    3) even more interesting, we can use this same decoder
       function to decompress data that matches a sequence
       in parts of the array we haven't built yet! We simply
       use a negative offset in our tuple.

EXAMPLES AND CONCLUSION:


point (3) may or may not be something like the LZ78 algorithm,
which apparently works by encoding future data, but it is
defintely a cool thing to be able to do with arrays:

> coolArray = listArray (0,4
>        [0,  coolArray!4 - 3,  2,  coolArray!1 + 2,  4]


Here's an example with great compression where the relative
index exceeds the offset:

> exceedsOffset = elems $ decompress $
>       Enc 25 [Left 'B'Left 'l'Left 'a'Left 'h'Left ' ',
>               Left 'b'Right (17,5), Left '!']


...and a code example for (2) above:

> reverseReference = elems $ decompress $
>       Enc 27 [Left 'h',Left 'e',Left 'r',Left ' ',Left 'r',
>               Left 'a',Left 'c',Left 'e',Right(-5,2),Left 't',
>               Left 'u',Left 'r',Left 'n',Left 's',Left ' ',
>               Left 't',Left 'o',Right(4,19)]

...and finally an example combining (2) and (3):

> reverseLookAhead = elems $ decompress $ 
>       Enc 5 [Left 1Right (-1,-3), Left 3Left 4]


I was surprised to discover these properties in Haskell's lazy
Arrays. hope they came as a surprise to a few others.

2 Comments

Huffman Coding

31/05/2009

This is the second of probably three posts (the first was on run-length encoding in Haskell) inspired by Thomas Guest’s interesting article on the Deflate algorithm. This is also my first post in literate haskell. Please post any improvements to the code if you have them!

For a refreshingly readable introduction to Huffman Coding and the Deflate algorithm, have a look at this short explanation by Antaeus Feldspar.


First some boilerplate...

> module Huffman
>     where
>
> import Data.List (sort, insert)
> import qualified Data.Map as M
> import Data.Function (on)
> import Control.Arrow (second)

We make an abstract datatype for binary digits. I wonder if this would
be a nice (but slow?) way of working with binary IO. There doesn't seem
to be a package on Hackage for this

> data Bit = O | I
>            deriving (ShowOrdEq)

we define a simple binary tree useful for decoding the encoded binary
stream. the simple algorithm for assigning codes to our symbols
produces this tree. A completed tree for some text containing only
three letter might look like this:

     0    /\    1
         /\ A  
        C  B

> data HTree a = Branch {zer :: (HTree a), one :: (HTree a),
>                        wt  :: Int }
>              | Leaf {symb :: a,  wt :: Int }
>               deriving (Show)

maps from <Symbol> to <Binary Code>, we create this from the HTree
built from the list of weighted symbols. the HTree can't be used for
encoding.

> type CodeDict a = M.Map a [Bit]

Now some instance declarations so that we know how to order (by weight):

> instance Ord (HTree a) where
>     compare = compare `on` wt
>
> instance Eq (HTree a) where
>     (=== (==`on` wt


our function for merging two branches which we use to build the HTree
from the list of symbols and their corresponding weights. no point in
defining a Monoid instance, but we'll tip our hat to it:

> mappend t1 t2 = Branch t1 t2 (wt t1 + wt t2)


BULDING THE CODING TREES


    
We assign codes to our weighted symbols using a simple algorithm
which takes the trees (initially Leaves) with the lowest weights
and combines them (and their weights) and inserts them back into
the list until they have been combined into a single tree:

> buildDecTree :: [(a,Int)] -> HTree a
> buildDecTree = build . sort . map (uncurry Leaf)
>     where build (t:[])     = t
>           build (t1:t2:ts) = build $ insert (t1 `mappend` t2) ts


now convert the binary tree representation to a dictionary form for
encoding. we decompose the tree into a list from the top down, mapping
either a 1 or 0 over the flattened children. a little confusing:

> buildEncDict :: (Ord a) => HTree a -> CodeDict a
> buildEncDict = M.fromList . build 
>     where build (Leaf t _)     = (t,[]) : []
>           build (Branch a b _) = mapBit O a ++ mapBit I b
>            -- build up the codes in the snd of each Leaf's tuple:
>           mapBit b = map (second (b:)) . build


(EN/DE)CODING FUNCTIONS



To encode a list of symbols for which we've generated an HTree, we just
map over it, looking up it's code in our Map dictionary:

> encode :: (Ord a) => CodeDict a -> [a] -> [Bit]
> encode d = concatMap (d M.!)


To decode we simply read a Bit at a time from the input, at the same time
traversing the HTree (going left when we encounter a zero, and vice versa).
When we hit a Leaf (the end of a code) we return the symbol and go onto
the next bit from the top of the HTree once again.

> decode :: HTree a -> [Bit] -> [a]
> decode t [] = []
> decode t bs = dec bs t
>     where dec bs' (Leaf x _) = x : decode t bs'
>           dec (O:bs') (Branch l _ _) = dec bs' l 
>           dec (I:bs') (Branch _ r _) = dec bs' r 



UPDATE: I scrapped and re-did this section. Should be a little better now.:


USAGE EXAMPLES



I realize I need a more compelling example and some explanation. First,
imagine we want to encode into binary the following phrase:

> phrase = "twenty bytes of text"

we could represent it in ASCII but that would be wasteful of space, using
a whole byte per character when we are using only ten of the 256 possible
codes in the ASCII character set.

So we generate a list of the characters to encode along with their
frequencies (symbols with higher frequencies will be given shorter
prefix codes, saving space!):

> huffmanTree' = buildDecTree [('t',5),('e',3),('y',2),('w',1),('n',1),
>                              ('b',1),('s',1),('o',1),('f',1),('x',1),
>                              (' ',3)]

...and encode it using the dictionary built from the code tree we just
built:

> encodedPhrase = let dict = buildEncDict huffmanTree'
>                  in encode dict phrase

This yields the following binary stream of 8 bits (vs. 20 if we had used
ASCII encoding):

[I,O,O,O,I,O,I,I, O,O,O,I,I,I,O,O, I,O,I,I,I,O,O,O, O,O,I,O,I,O,I,I,
 O,O,O,O,I,I,I,I, O,I,I,I,O,O,I,I, I,I,I,I,I,I,O,I, I,O,O,I,I,O,I,O]

Of course we have to encode instructions to build our tree along with the
above message.

7 Comments