<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>LAMBDAPHONE &#187; compression</title>
	<atom:link href="http://coder.bsimmons.name/blog/tag/compression/feed/" rel="self" type="application/rss+xml" />
	<link>http://coder.bsimmons.name/blog</link>
	<description>fragmentary ideas  ䷿  intellectual what-nots  ䷷  and haskell programming  ䷴</description>
	<lastBuildDate>Tue, 27 Jul 2010 16:58:03 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>An &#8220;Adaptive&#8221; Move-to-Front Algorithm</title>
		<link>http://coder.bsimmons.name/blog/2009/11/an-adaptive-move-to-front-algorithm/</link>
		<comments>http://coder.bsimmons.name/blog/2009/11/an-adaptive-move-to-front-algorithm/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 05:28:34 +0000</pubDate>
		<dc:creator>jberryman</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[combinator]]></category>
		<category><![CDATA[compression]]></category>

		<guid isPermaLink="false">http://coder.bsimmons.name/blog/?p=273</guid>
		<description><![CDATA[<p><strong>UPDATE:</strong> <em>Thanks to <a href="http://smj.posterous.com/">steve</a> for pointing out that the scheme I describe here is essentially the same as <a href="http://www.ics.uci.edu/~dan/pubs/DC-Sec5.html#Sec_5.2">Algorithm BSTW</a>.</em></p>
<p>I came up with this variation of the <a href="http://coder.bsimmons.name/blog/2009/11/the-move-to-front-mtf-transform/">Move-to-Front transform</a> which doesn&#8217;t require that there be a&#8230; <a href="http://coder.bsimmons.name/blog/2009/11/an-adaptive-move-to-front-algorithm/" class="read_more">   [ R E A D &#124; M O R E ]</a></p>]]></description>
			<content:encoded><![CDATA[<p><strong>UPDATE:</strong> <em>Thanks to <a href="http://smj.posterous.com/">steve</a> for pointing out that the scheme I describe here is essentially the same as <a href="http://www.ics.uci.edu/~dan/pubs/DC-Sec5.html#Sec_5.2">Algorithm BSTW</a>.</em></p>
<p>I came up with this variation of the <a href="http://coder.bsimmons.name/blog/2009/11/the-move-to-front-mtf-transform/">Move-to-Front transform</a> which doesn&#8217;t require that there be a known alphabet of symbols. Instead it builds up the alphabet as it goes along and encounters a new symbol. I&#8217;m sure that this is a known algorithm, as it&#8217;s a fairly obvious variation, but I don&#8217;t know what the proper name for it is, so I&#8217;m calling it an <a href="http://www.cs.sfu.ca/CC/365/li/squeeze/AdaptiveHuff.html">Adaptive</a> Move-to-Front. </p>
<p>I&#8217;m very interested in hearing if this is similar or identical to any other algorithms, so if anyone has any insights, please comment!</p>
<h3>Differences from the Standard MTF Transform:</h3>
<p>The <a href="http://compgt.googlepages.com/mtf">traditional MTF algorithm</a>, as I understand it, uses some known ordered alphabet of symbols (e.g. the bytes from 0-255, or the letters a-z), so presumably, one need not store this alphabet along with the encoded/compressed data. </p>
<p>With the algorithm I describe here though, the final permutation of the alphabet list, output by the encoder, must be retained to decode the message. The encoded message is identical to the output of the traditional MTF transform, <em>except</em> where a symbol is encoded which <em>has not appeared previously</em>.</p>
<p>Here we have zipped the output of the standard MTF (the <code>fst</code>) with the adaptive version (the <code>snd</code>) to make it easy to compare elements:</p>
<blockquote><p><strong>Message:</strong> <em>&#8220;the rrrrain in sssspain falls maaiinly on the plain&#8221;</em></p>
<p><strong>Output</strong> (standard MTF, adaptive):</p>
<p><code>[(116,0),(105,1),(103,2),(35,3),(115,4),(0,0),(0,0),(0,0),(101,5),(107,6),(112,7),(4,4),(2,2),(2,2),(2,2),(116,8),(0,0),(0,0),(0,0),(115,9),(5,5),(5,5),(5,5),(5,5),(109,10),(4,4),(113,11),(0,0),(7,7),(4,4),(114,12),(4,4),(0,0),(7,7),(0,0),(7,7),(6,6),(121,13),(6,6),(116,14),(4,4),(2,2),(14,14),(14,14),(14,14),(3,3),(13,13),(8,8),(10,10),(10,10),(8,8)]</code></p>
</blockquote>
<p>The standard algorithm was working with the ASCII alphabet, so each new symbol is located somewhere far back into the alphabet; it is then moved to the front. In the adaptive version, when we see a new symbol, it is automatically made the next in our running alphabet and then of course moved to the front. </p>
<p>This means that with the adaptive version, for any given stretch from the beginning of the encoded message, we are using the minimal number of different index integers to encode the stream, always choosing the next greatest integer when we require a new one. Here is the output of the adaptive version alone:</p>
<blockquote>
<p><strong>Encoded phrase:</strong><br />
<code><br />
[0,1,2,3,4,0,0,0,5,6,7,4,2,2,2,8,0,0,0,9,5,5,5,5,10,4,<br />
11,0,7,4,12,4,0,7,0,7,6,13,6,14,4,2,14,14,14,3,13,8,<br />
10,10,8]</code></p>
<p><strong>Final permutation of minimal dictionary list:</strong><br />
<code>"nialp ehtoymsfr"</code>
</p></blockquote>
<p>What these means for actual compression schemes on a binary level, I have no idea, but I would like to explore this topic much more. Without further ado, here is some code.</p>
<h3>The Encoder:</h3>
<p><div class="vimblock"><br>
<span class="Comment">&gt;</span>&nbsp;<span class="PreProc">import</span>&nbsp;Data.List<br>
<span class="Comment">&gt;</span>&nbsp;<span class="PreProc">import</span>&nbsp;Control.Arrow<br>
<br></div></p>
<p>This &#8220;adaptive move-to-front&#8221; encoder works in the same way as the standard algorithm, except that we start out with an empty alphabet list and build it up as we see new elements not yet in our list.</p>
<p>Essentially if, while searching for an element, we hit the empty list [], we pretend that the element we were looking for was at the end of the list.</p>
<p>This is exactly what happens in the traditional algorithm: when we try to encode a symbol that we haven&#8217;t yet encountered, we venture into a new part of the alphabet list that we haven&#8217;t touched yet.</p>
<p><div class="vimblock"><br>
<span class="Comment">&gt;</span>&nbsp;encode&nbsp;<span class="Statement">::</span>&nbsp;(<span class="Type">Eq</span>&nbsp;b)<span class="Statement">=&gt;</span>&nbsp;[b]&nbsp;<span class="Statement">-&gt;</span>&nbsp;([b],&nbsp;[<span class="Type">Int</span>])<br>
<span class="Comment">&gt;</span>&nbsp;encode&nbsp;<span class="Statement">=</span>&nbsp;mapAccumL&nbsp;(<span class="Identifier">flip</span>&nbsp;mtf)&nbsp;[]&nbsp;&nbsp;<span class="Type">where</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;<br>
&nbsp;We push the current element to the front of the lookup list...<br>
<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp; mtf&nbsp;x&nbsp;<span class="Statement">=</span>&nbsp;first&nbsp;(x<span class="Statement">:</span>)&nbsp;<span class="Statement">.</span>&nbsp;enc&nbsp;<span class="Constant">0</span>&nbsp;&nbsp;&nbsp;&nbsp;<span class="Type">where</span><br>
<br>
&nbsp;...after returning the index of the element in the list<br>
&nbsp;along with the new list with the element deleted:<br>
<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;enc&nbsp;i&nbsp;[]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Statement">=</span>&nbsp;([],i)&nbsp;&nbsp;<span class="Comment">-- &lt; index as if x was last</span><br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;enc&nbsp;i&nbsp;(a<span class="Statement">:</span>as)&nbsp;<span class="Statement">|</span>&nbsp;x&nbsp;<span class="Statement">==</span>&nbsp;a&nbsp;&nbsp;&nbsp;&nbsp;<span class="Statement">=</span>&nbsp;(as,i)&nbsp;&nbsp;<span class="Comment">-- &lt; delete the x element</span><br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Statement">|</span>&nbsp;<span class="Identifier">otherwise</span>&nbsp;<span class="Statement">=</span>&nbsp;first&nbsp;(a<span class="Statement">:</span>)&nbsp;<span class="Statement">$</span>&nbsp;enc&nbsp;(i<span class="Statement">+</span><span class="Constant">1</span>)&nbsp;as&nbsp;<br>
<br></div></p>
<h3>The Decoder</h3>
<p>Our decoder requires that we pass it the final permutation of the symbol list, along with the encoded list of indexes. It then simply does the reverse of the encoding function.</p>
<p>To decode we follow these steps:</p>
<ol>
<li>    Return the head of the list of symbols</li>
<li>     Re-insert the symbol into the index location specified by the head of the encoded/index list</li>
<li>Move on to the next encoded index, using the new list of symbols. go to (1)</li>
</ol>
<p><div class="vimblock"><br>
<span class="Comment">&gt;</span>&nbsp;<span class="Comment">-- our Adaptive MTF decoder:</span><br>
<span class="Comment">&gt;</span>&nbsp;decode&nbsp;<span class="Statement">::</span>&nbsp;[b]&nbsp;<span class="Statement">-&gt;</span>&nbsp;[<span class="Type">Int</span>]&nbsp;<span class="Statement">-&gt;</span>&nbsp;[b]<br>
<span class="Comment">&gt;</span>&nbsp;decode&nbsp;l&nbsp;<span class="Statement">=</span>&nbsp;<span class="Identifier">snd</span>&nbsp;<span class="Statement">.</span>&nbsp;mapAccumR&nbsp;dec&nbsp;l&nbsp;<span class="Type">where</span><br>
<br>
&nbsp;return head of alphabet and insert it back into alphabet list at <br>
&nbsp;the index given by the current element of our encoded index list:<br>
<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;dec&nbsp;(a<span class="Statement">:</span>as)&nbsp;i&nbsp;<span class="Statement">=</span>&nbsp;(insertAt&nbsp;a&nbsp;i&nbsp;as,&nbsp;a)<br>
&nbsp;&nbsp;&nbsp;&nbsp;<br>
&nbsp;takes an element and makes it the new element at the specified index<br>
&nbsp;in the provided list. Fails if it touches a [].<br>
<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;insertAt&nbsp;<span class="Statement">::</span>&nbsp;a&nbsp;<span class="Statement">-&gt;</span>&nbsp;<span class="Type">Int</span>&nbsp;<span class="Statement">-&gt;</span>&nbsp;[a]&nbsp;<span class="Statement">-&gt;</span>&nbsp;[a]<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;insertAt&nbsp;a' i&nbsp;<span class="Statement">=</span>&nbsp;ins&nbsp;i<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Type">where</span>&nbsp;ins&nbsp;<span class="Constant">0</span>&nbsp;as&nbsp;<span class="Statement">=</span>&nbsp;a'<span class="Statement">:</span>as<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ins&nbsp;i' (a<span class="Statement">:</span>as)&nbsp;<span class="Statement">=</span>&nbsp;a&nbsp;<span class="Statement">:</span>&nbsp;ins&nbsp;(i' <span class="Statement">-</span>&nbsp;<span class="Constant">1</span>)&nbsp;as<br>
<br></div></p>
]]></content:encoded>
			<wfw:commentRss>http://coder.bsimmons.name/blog/2009/11/an-adaptive-move-to-front-algorithm/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The move-to-front (MTF) Transform</title>
		<link>http://coder.bsimmons.name/blog/2009/11/the-move-to-front-mtf-transform/</link>
		<comments>http://coder.bsimmons.name/blog/2009/11/the-move-to-front-mtf-transform/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 13:06:35 +0000</pubDate>
		<dc:creator>jberryman</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[short]]></category>

		<guid isPermaLink="false">http://coder.bsimmons.name/blog/?p=265</guid>
		<description><![CDATA[<p>To follow up my last <a href="http://coder.bsimmons.name/blog/2009/11/polishing-a-functional-pearl-the-burrows-wheeler-transform/">post about the Burrows-Wheeler Transform</a>, I decided to implement another simple algorithm which is often used after the BWT to help consolidate localized redundancy in the data before entropy encoding.</p>
<p>The idea behind the&#8230; <a href="http://coder.bsimmons.name/blog/2009/11/the-move-to-front-mtf-transform/" class="read_more">   [ R E A D &#124; M O R E ]</a></p>]]></description>
			<content:encoded><![CDATA[<p>To follow up my last <a href="http://coder.bsimmons.name/blog/2009/11/polishing-a-functional-pearl-the-burrows-wheeler-transform/">post about the Burrows-Wheeler Transform</a>, I decided to implement another simple algorithm which is often used after the BWT to help consolidate localized redundancy in the data before entropy encoding.</p>
<p>The idea behind the <a href="http://www.data-compression.info/Algorithms/MTF/index.htm">Move-to-Front algorithm</a> is that we start with some known alphabet (like the list of ASCII characters), and encode our data elements as the <em>index into that alphabet list</em> of each element. The trick is that after each character is encoded, the alphabet list is modified by <strong>moving</strong> the element whose index we just looked up <strong>to the front</strong> of the alphabet.</p>
<p>Here are my implementations of encode and decode:</p>
<p><blockquote class="vimblock"><br>
<span class="PreProc">import</span>&nbsp;Data.List<br>
<br>
<br>
mtf&nbsp;<span class="Statement">::</span>&nbsp;(<span class="Type">Eq</span>&nbsp;a,&nbsp;<span class="Type">Bounded</span>&nbsp;a,&nbsp;<span class="Type">Enum</span>&nbsp;a)<span class="Statement">=&gt;</span>&nbsp;[a]&nbsp;<span class="Statement">-&gt;</span>&nbsp;<span class="Type">Maybe</span>&nbsp;[<span class="Type">Int</span>]<br>
mtf&nbsp;<span class="Statement">=</span>&nbsp;<span class="Identifier">sequence</span>&nbsp;<span class="Statement">.</span>&nbsp;<span class="Identifier">snd</span>&nbsp;<span class="Statement">.</span>&nbsp;mapAccumL&nbsp;enc&nbsp;[&nbsp;<span class="Identifier">minBound</span><span class="Statement">..</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="Type">where</span>&nbsp;enc&nbsp;l&nbsp;x&nbsp;<span class="Statement">=</span>&nbsp;(x<span class="Statement">:</span>delete&nbsp;x&nbsp;l,&nbsp;elemIndex&nbsp;x&nbsp;l)<br>
<br>
<br>
mtfD&nbsp;<span class="Statement">::</span>&nbsp;(<span class="Type">Eq</span>&nbsp;a,&nbsp;<span class="Type">Bounded</span>&nbsp;a,&nbsp;<span class="Type">Enum</span>&nbsp;a)<span class="Statement">=&gt;</span>&nbsp;[<span class="Type">Int</span>]&nbsp;<span class="Statement">-&gt;</span>&nbsp;[a]<br>
mtfD&nbsp;<span class="Statement">=</span>&nbsp;<span class="Identifier">snd</span>&nbsp;<span class="Statement">.</span>&nbsp;mapAccumL&nbsp;dec&nbsp;[&nbsp;<span class="Identifier">minBound</span><span class="Statement">..</span>&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="Type">where</span>&nbsp;dec&nbsp;l&nbsp;i&nbsp;<span class="Statement">=</span>&nbsp;<span class="Statement">let</span>&nbsp;x&nbsp;<span class="Statement">=</span>&nbsp;l&nbsp;<span class="Statement">!!</span>&nbsp;i&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Statement">in</span>&nbsp;(x<span class="Statement">:</span>delete&nbsp;x&nbsp;l,&nbsp;x)<br>
<br></blockquote></p>
<p>The <code>mapAccumL</code> function is perfect for passing the state of the dictionary list. Another point of interest to mention is the use of <code>minBound</code> to let the function be polymorphic over any type for which there is a defined order and lowest element.</p>
<p>For example, we can do this:</p>
<blockquote><p><strong>*Main></strong> mtf &#8220;SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES&#8221;  >>= return . mtfDec :: Maybe String<br />
<em>Just &#8220;SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES&#8221;</em></p></blockquote>
<p>or we can decode to Word8 bytes, if we want to get back the ASCII character in that form, just by specifying a different return type:</p>
<blockquote><p><strong>*Main></strong> :m + Data.Word<br />
<strong>*Main Data.Word></strong> mtf &#8220;SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES&#8221;  >>= return . mtfDec :: Maybe [Word8]<br />
<em>Just [83,73,88,46,77,73,88,69,68,46,80,73,88,73,69,83, 46,83,73,70,84,46,83,73,88,84,89,46,80,73,88,73,69,46, 68,85,83,84,46,66,79,88,69,83]</em>
</p></blockquote>
<p>Let me know your thoughts!</p>
]]></content:encoded>
			<wfw:commentRss>http://coder.bsimmons.name/blog/2009/11/the-move-to-front-mtf-transform/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Polishing a Functional Pearl: The Burrows-Wheeler Transform</title>
		<link>http://coder.bsimmons.name/blog/2009/11/polishing-a-functional-pearl-the-burrows-wheeler-transform/</link>
		<comments>http://coder.bsimmons.name/blog/2009/11/polishing-a-functional-pearl-the-burrows-wheeler-transform/#comments</comments>
		<pubDate>Sat, 07 Nov 2009 22:22:52 +0000</pubDate>
		<dc:creator>jberryman</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[short]]></category>
		<category><![CDATA[sorting]]></category>

		<guid isPermaLink="false">http://coder.bsimmons.name/blog/?p=262</guid>
		<description><![CDATA[<p><em>Here is a quick post to get me back into the swing of blogging:</em></p>
<p>I was looking through an old post on StackOverflow about <a href="http://www.haskell.org/haskellwiki/Research_papers/Functional_pearls">clever functional code</a>, and the best answer, given by <a href="http://stackoverflow.com/questions/1524750/what-is-your-favourite-cleverly-written-functional-code/1527026#1527026">&#8220;yairchu&#8221;</a> was a nice version&#8230; <a href="http://coder.bsimmons.name/blog/2009/11/polishing-a-functional-pearl-the-burrows-wheeler-transform/" class="read_more">   [ R E A D &#124; M O R E ]</a></p>]]></description>
			<content:encoded><![CDATA[<p><em>Here is a quick post to get me back into the swing of blogging:</em></p>
<p>I was looking through an old post on StackOverflow about <a href="http://www.haskell.org/haskellwiki/Research_papers/Functional_pearls">clever functional code</a>, and the best answer, given by <a href="http://stackoverflow.com/questions/1524750/what-is-your-favourite-cleverly-written-functional-code/1527026#1527026">&#8220;yairchu&#8221;</a> was a nice version of the <a href="http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler%5Ftransform">Burrows-Wheeler Transform</a>, which is an algorithm for permuting a string such that it can be compressed more effectively by other algorithms. The code posted was (import Data.List assumed):</p>
<p><blockquote class="vimblock"><br>
bwp&nbsp;<span class="Statement">::</span>&nbsp;(<span class="Type">Ord</span>&nbsp;a)<span class="Statement">=&gt;</span>&nbsp;[a]&nbsp;<span class="Statement">-&gt;</span>&nbsp;[a]<br>
bwp&nbsp;xs&nbsp;<span class="Statement">=</span>&nbsp;<span class="Identifier">map</span>&nbsp;<span class="Identifier">snd</span>&nbsp;<span class="Statement">$</span>&nbsp;sort&nbsp;<span class="Statement">$</span>&nbsp;<span class="Identifier">zip</span>&nbsp;(rots&nbsp;xs)&nbsp;(rrot&nbsp;xs)<br>
rots&nbsp;xs&nbsp;<span class="Statement">=</span>&nbsp;<span class="Identifier">take</span>&nbsp;(<span class="Identifier">length</span>&nbsp;xs)&nbsp;(<span class="Identifier">iterate</span>&nbsp;lrot&nbsp;xs)<br>
lrot&nbsp;xs&nbsp;<span class="Statement">=</span>&nbsp;<span class="Identifier">tail</span>&nbsp;xs&nbsp;<span class="Statement">++</span>&nbsp;[<span class="Identifier">head</span>&nbsp;xs]<br>
rrot&nbsp;xs&nbsp;<span class="Statement">=</span>&nbsp;<span class="Identifier">last</span>&nbsp;xs&nbsp;<span class="Statement">:</span>&nbsp;<span class="Identifier">init</span>&nbsp;xs<br>
<br></blockquote></p>
<p>I saw I could improve/shorten this in a couple of obvious ways and came up with this:</p>
<p><blockquote class="vimblock"><br>
bwp&nbsp;<span class="Statement">::</span>&nbsp;(<span class="Type">Ord</span>&nbsp;a)&nbsp;<span class="Statement">=&gt;</span>&nbsp;[a]&nbsp;<span class="Statement">-&gt;</span>&nbsp;[a]<br>
bwp&nbsp;<span class="Statement">=</span>&nbsp;<span class="Identifier">map</span>&nbsp;<span class="Identifier">snd</span>&nbsp;<span class="Statement">.</span>&nbsp;sort&nbsp;<span class="Statement">.</span>&nbsp;rots&nbsp;<br>
rots&nbsp;xs&nbsp;<span class="Statement">=</span>&nbsp;<span class="Identifier">zip</span>&nbsp;(<span class="Identifier">tail</span>&nbsp;<span class="Statement">$</span>&nbsp;<span class="Identifier">iterate</span>&nbsp;lrot&nbsp;xs)&nbsp;xs<br>
lrot&nbsp;(x<span class="Statement">:</span>xs)&nbsp;<span class="Statement">=</span>&nbsp;xs&nbsp;<span class="Statement">++</span>&nbsp;[x]<br>
<br></blockquote></p>
<p>Still unsatisfied and even more obsessed I came up with this final, prettiest version, before forcing myself to give it up already:</p>
<p><blockquote class="vimblock"><br>
bwp&nbsp;<span class="Statement">::</span>&nbsp;(<span class="Type">Ord</span>&nbsp;a)<span class="Statement">=&gt;</span>&nbsp;[a]&nbsp;<span class="Statement">-&gt;</span>&nbsp;[a]<br>
bwp&nbsp;<span class="Statement">=</span>&nbsp;<span class="Identifier">map</span>&nbsp;<span class="Identifier">snd</span>&nbsp;<span class="Statement">.</span>&nbsp;sort&nbsp;<span class="Statement">.</span>&nbsp;rots&nbsp;<br>
rots&nbsp;xs&nbsp;<span class="Statement">=</span>&nbsp;&nbsp;<span class="Identifier">zip</span>&nbsp;(lrot&nbsp;xs)&nbsp;xs<br>
lrot&nbsp;<span class="Statement">=</span>&nbsp;<span class="Identifier">tail</span>&nbsp;<span class="Statement">.</span>&nbsp;tails&nbsp;<span class="Statement">.</span>&nbsp;<span class="Identifier">cycle</span><br>
<br></blockquote></p>
<p>Unfortunately, this last version will croak if your string happens to look like &#8220;111111&#8243; or &#8220;cAbcAb&#8221; because sort will keep trying to compare infinites lists.</p>
<p><em><strong>Update: </strong> I did a short post on the <a href="http://coder.bsimmons.name/blog/2009/11/the-move-to-front-mtf-transform/">Move To Front transform</a> as a follow-up to this post.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://coder.bsimmons.name/blog/2009/11/polishing-a-functional-pearl-the-burrows-wheeler-transform/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Cracking a Lock in Haskell with the De Bruijn sequence, pt. 1</title>
		<link>http://coder.bsimmons.name/blog/2009/09/cracking-a-lock-in-haskell-with-the-de-bruijn-sequence-pt-1/</link>
		<comments>http://coder.bsimmons.name/blog/2009/09/cracking-a-lock-in-haskell-with-the-de-bruijn-sequence-pt-1/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 04:14:59 +0000</pubDate>
		<dc:creator>jberryman</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[Array]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[sequences]]></category>

		<guid isPermaLink="false">http://coder.bsimmons.name/blog/?p=215</guid>
		<description><![CDATA[<p><em>Update: a more efficient variation is implemented in <a href="http://coder.bsimmons.name/blog/2009/09/cracking-a-lock-in-haskell-with-the-de-bruijn-sequence-pt-2/">Part 2</a>.</em></p>
<p>A <a href="http://en.wikipedia.org/wiki/De_Bruijn_sequence">De Bruijn sequence</a> is (for example) a cyclical list of characters such that every word of a given length appears once and only once in the sequence.&#8230; <a href="http://coder.bsimmons.name/blog/2009/09/cracking-a-lock-in-haskell-with-the-de-bruijn-sequence-pt-1/" class="read_more">   [ R E A D &#124; M O R E ]</a></p>]]></description>
			<content:encoded><![CDATA[<p><em>Update: a more efficient variation is implemented in <a href="http://coder.bsimmons.name/blog/2009/09/cracking-a-lock-in-haskell-with-the-de-bruijn-sequence-pt-2/">Part 2</a>.</em></p>
<p>A <a href="http://en.wikipedia.org/wiki/De_Bruijn_sequence">De Bruijn sequence</a> is (for example) a cyclical list of characters such that every word of a given length appears once and only once in the sequence. Instead of using letters, you can have a De Bruijn sequence of bits. For example here is one possibility for a sequence in which every 3-bit word appears somewhere (you have to wrap around from the end for the final &#8220;words&#8221;):</p>
<blockquote><p>00011101</p></blockquote>
<p>In a sense we compress a dictionary of 2^3 = 8 3-bit words (or 24 bits) to a sequence of only 2^3 = 8 bits. There are some very simple algorithms for generating these kinds of sequences of bits, and I will implement two here: the &#8220;<strong>prefer one</strong>&#8221; algorithm and a subtle variation called &#8220;<strong><a href="http://people.clarkson.edu/~aalhakim/Mypapers/PreferOpposite.pdf">prefer opposite</a>[PDF] by Alhakim</strong>&#8220;. Here is the author&#8217;s excellent description of the traditional Prefer One algorithm from the linked paper:</p>
<blockquote><p>&#8221; The prefer-one algorithm is a very simple method amazingly capable of generating a full cycle. For any positive integer n ≥ 1, the algorithm puts n zeroes, and proceeds after this by proposing 1 for the next bit and accepting it when the word formed by the last n bits has not been encountered previously in the sequence, otherwise 0 is placed. The algorithm stops when both 0 and 1 do not bring a new word.&#8221;</p></blockquote>
<p>And here is my haskell implementation. It works by creating a lazy array which we build from a list being generated by searching the earlier portions of the array that already have been defined. It&#8217;s very similar to the method I used in modeling the <a href="http://coder.bsimmons.name/blog/2009/06/fun-with-lazy-arrays-the-lz77-algorithm/">LZ77 algorithm</a>.:</p>
<p><blockquote class="vimblock"><br>
<span class="Type">module</span>&nbsp;Main<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="Type">where</span><br>
<br>
<span class="PreProc">import</span>&nbsp;Data.Array<br>
<span class="PreProc">import</span>&nbsp;Data.List(isInfixOf)<br>
<br>
<br>
<span class="Type">type</span>&nbsp;Bit&nbsp;<span class="Statement">=</span>&nbsp;<span class="Type">Bool</span><br>
<br>
preferOne&nbsp;<span class="Statement">::</span>&nbsp;<span class="Type">Int</span>&nbsp;<span class="Statement">-&gt;</span>&nbsp;[&nbsp;Bit&nbsp;]<br>
preferOne&nbsp;n&nbsp;<span class="Statement">=</span>&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="Statement">let</span>&nbsp;upB&nbsp;<span class="Statement">=</span>&nbsp;<span class="Constant">2</span><span class="Statement">^</span>n<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;arr&nbsp;<span class="Statement">=</span>&nbsp;&nbsp;listArray&nbsp;(<span class="Constant">1</span>,&nbsp;upB)&nbsp;(<span class="Identifier">replicate</span>&nbsp;n&nbsp;<span class="Constant">False</span>&nbsp;<span class="Statement">++</span>&nbsp;rest)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Comment">-- since we use Bool for bits, we can say (not.alreadySeen):</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rest&nbsp;&nbsp;<span class="Statement">=</span>&nbsp;&nbsp;<span class="Identifier">map</span>&nbsp;(<span class="Identifier">not</span>&nbsp;<span class="Statement">.</span>&nbsp;alreadySeen)&nbsp;[n<span class="Statement">+</span><span class="Constant">1</span>&nbsp;<span class="Statement">..</span>&nbsp;upB]<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;alreadySeen&nbsp;i&nbsp;<span class="Statement">=</span>&nbsp;<span class="Identifier">or</span>&nbsp;<span class="Statement">$</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Statement">do</span>&nbsp;<span class="Statement">let</span>&nbsp;range&nbsp;<span class="Statement">=</span>&nbsp;[&nbsp;arr<span class="Statement">!</span>i' <span class="Statement">|</span>&nbsp;i' <span class="Statement">&lt;-</span>&nbsp;[i<span class="Statement">-</span>n<span class="Statement">+</span><span class="Constant">1</span>&nbsp;<span class="Statement">..</span>&nbsp;i<span class="Statement">-</span><span class="Constant">1</span>]]&nbsp;<span class="Statement">++</span>&nbsp;[<span class="Constant">True</span>]&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; i1&nbsp;<span class="Statement">&lt;-</span>&nbsp;[<span class="Constant">1</span><span class="Statement">..</span>i<span class="Statement">-</span>n]&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Statement">let</span>&nbsp;rangeP&nbsp;<span class="Statement">=</span>&nbsp;[&nbsp;arr<span class="Statement">!</span>i' <span class="Statement">|</span>&nbsp;i' <span class="Statement">&lt;-</span>&nbsp;[i1&nbsp;<span class="Statement">..</span>&nbsp;i1<span class="Statement">+</span>n<span class="Statement">-</span><span class="Constant">1</span>]&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Identifier">return</span>&nbsp;(range&nbsp;<span class="Statement">==</span>&nbsp;rangeP)<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Comment">-- an infinite stream is returned... because I can:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="Statement">in</span>&nbsp;<span class="Identifier">cycle</span>&nbsp;(elems&nbsp;arr)<br>
<br></blockquote></p>
<p>The Prefer Opposite algorithm works on the same principle, but with a couple twists. In my words it is as follows:</p>
<blockquote><p>Start with n zeros. Search for successive bits as follows:</p>
<p>propose to place the bit that is different from the previous bit: if the word formed by this bit has not been seen already, then choose it, else choose the same bit as the last. When 2^3-1 bit have been written, write a 1 for the final bit. and you&#8217;re done. (you can also keep track of how long a string of ones has been written; the sequence will always end with <code>n</code> ones)</p></blockquote>
<p>Here is my implementation:</p>
<p><blockquote class="vimblock"><br>
preferOpposite&nbsp;<span class="Statement">::</span>&nbsp;<span class="Type">Int</span>&nbsp;<span class="Statement">-&gt;</span>&nbsp;[&nbsp;Bit&nbsp;]<br>
preferOpposite&nbsp;n&nbsp;<span class="Statement">=</span>&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="Statement">let</span>&nbsp;upB&nbsp;<span class="Statement">=</span>&nbsp;<span class="Constant">2</span><span class="Statement">^</span>n<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;arr&nbsp;<span class="Statement">=</span>&nbsp;&nbsp;array&nbsp;(<span class="Constant">1</span>,&nbsp;upB)&nbsp;(final&nbsp;<span class="Statement">:</span>&nbsp;inits&nbsp;<span class="Statement">++</span>&nbsp;rest)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Comment">-- we must specify the very last element of the sequence</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Comment">-- which will always be a One:</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;final&nbsp;<span class="Statement">=</span>&nbsp;(upB,<span class="Constant">True</span>)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;inits&nbsp;<span class="Statement">=</span>&nbsp;[&nbsp;(i,<span class="Constant">False</span>)&nbsp;<span class="Statement">|</span>&nbsp;i<span class="Statement">&lt;-</span>[<span class="Constant">1</span><span class="Statement">..</span>n]&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rest&nbsp;&nbsp;<span class="Statement">=</span>&nbsp;&nbsp;<span class="Identifier">map</span>&nbsp;seqBit&nbsp;[n<span class="Statement">+</span><span class="Constant">1</span>&nbsp;<span class="Statement">..</span>&nbsp;upB<span class="Statement">-</span><span class="Constant">1</span>]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Comment">-- if the word generated by making the current bit the opposite</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Comment">-- of the previous has already been seen, we make the bit the</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Comment">-- same as the previous:</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;seqBit&nbsp;i&nbsp;<span class="Statement">=</span>&nbsp;&nbsp;(i,&nbsp;nextB)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Type">where</span>&nbsp;bP&nbsp;<span class="Statement">=</span>&nbsp;arr<span class="Statement">!</span>(i<span class="Statement">-</span><span class="Constant">1</span>)&nbsp;&nbsp;<span class="Comment">--previous bit</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;nextB&nbsp;<span class="Statement">|</span>&nbsp;<span class="Identifier">or</span>&nbsp;(alreadySeen&nbsp;i&nbsp;bP)&nbsp;<span class="Statement">=</span>&nbsp;bP&nbsp;<span class="Comment">--same as previous</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Statement">|</span>&nbsp;<span class="Identifier">otherwise</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Statement">=</span>&nbsp;<span class="Identifier">not</span>&nbsp;bP&nbsp;<span class="Comment">--choose opposite </span><br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Comment">--checks if the n-length string we would form by choosing the</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Comment">--opposite for the next bit is already present in the array:</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;alreadySeen&nbsp;i&nbsp;bP&nbsp;<span class="Statement">=</span>&nbsp;<span class="Statement">do</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Statement">let</span>&nbsp;range&nbsp;<span class="Statement">=</span>&nbsp;[&nbsp;arr<span class="Statement">!</span>i' <span class="Statement">|</span>&nbsp;i' <span class="Statement">&lt;-</span>&nbsp;[i<span class="Statement">-</span>n<span class="Statement">+</span><span class="Constant">1</span>&nbsp;<span class="Statement">..</span>&nbsp;i<span class="Statement">-</span><span class="Constant">1</span>]]&nbsp;<span class="Statement">++</span>&nbsp;[<span class="Identifier">not</span>&nbsp;bP]&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; i1&nbsp;<span class="Statement">&lt;-</span>&nbsp;[<span class="Constant">1</span><span class="Statement">..</span>i<span class="Statement">-</span>n]&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Statement">let</span>&nbsp;rangeP&nbsp;<span class="Statement">=</span>&nbsp;[&nbsp;arr<span class="Statement">!</span>i' <span class="Statement">|</span>&nbsp;i' <span class="Statement">&lt;-</span>&nbsp;[i1&nbsp;<span class="Statement">..</span>&nbsp;i1<span class="Statement">+</span>n<span class="Statement">-</span><span class="Constant">1</span>]&nbsp;]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Identifier">return</span>&nbsp;(range&nbsp;<span class="Statement">==</span>&nbsp;rangeP)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="Statement">in</span>&nbsp;<span class="Identifier">cycle</span>&nbsp;(elems&nbsp;arr)<br>
<br></blockquote></p>
<p>Enough code for now, let&#8217;s talk about applications of the sequence. The most easily-approachable and interesting application to me applies to those keyed entry systems, such as electronic door locks, which accept a stream of keys (i.e. they unlock as soon as the correct key sequence is input, without any need to press an &#8220;enter&#8221; key). These mechanisms are very susceptible to attack with a De Bruijn sequence of key presses.</p>
<p>Since the above algorithms use a binary alphabet, as opposed to say a decimal one found on electronic keypads (check out <a href="http://alicebobandmallory.com/articles/2009/09/23/why-you-should-use-four-different-digits-for-keypad-locks">Jonas Elfström&#8217;s blog post</a> dealing with keypads with worn out letters for more on that), I will choose as my target an old-fashined garage door opener.</p>
<p>An old-style garage door opener remote has 8 binary <a href="http://en.wikipedia.org/wiki/Dip_switch">DIP switches</a> allowing 256 different code combinations. We will imagine the garage door is susceptible to a stream-based attack like the electronic lock we described.</p>
<p>We can model the garage door receiver as follows:</p>
<p><blockquote class="vimblock"><br>
<span class="Type">type</span>&nbsp;Combo&nbsp;<span class="Statement">=</span>&nbsp;[&nbsp;Bit&nbsp;]<br>
<span class="Type">type</span>&nbsp;Receiver&nbsp;<span class="Statement">=</span>&nbsp;Combo&nbsp;<span class="Statement">-&gt;</span>&nbsp;<span class="Type">Bool</span><br>
<br>
<span class="Comment">-- True means access granted</span><br>
programReceiver&nbsp;<span class="Statement">::</span>&nbsp;Combo&nbsp;<span class="Statement">-&gt;</span>&nbsp;Receiver<br>
programReceiver&nbsp;<span class="Statement">=</span>&nbsp;isInfixOf&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>
<br></blockquote></p>
<p>Now let&#8217;s test out our model with this <code>main</code> function:</p>
<p><blockquote class="vimblock"><br>
main&nbsp;<span class="Statement">=</span>&nbsp;&nbsp;<span class="Statement">let</span>&nbsp;secretCode&nbsp;<span class="Statement">=</span>&nbsp;[<span class="Constant">False</span>,<span class="Constant">False</span>,<span class="Constant">True</span>,<span class="Constant">False</span>,<span class="Constant">True</span>,<span class="Constant">False</span>,<span class="Constant">True</span>,<span class="Constant">True</span>]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;receiver&nbsp;<span class="Statement">=</span>&nbsp;programReceiver&nbsp;secretCode<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;crackingStream&nbsp;<span class="Statement">=</span>&nbsp;preferOne&nbsp;<span class="Constant">8</span><br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Statement">in</span>&nbsp;<span class="Statement">if</span>&nbsp;receiver&nbsp;crackingStream<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Statement">then</span>&nbsp;<span class="Identifier">print</span>&nbsp;<span class="Constant">&quot;WE'RE IN!&quot;</span><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Statement">else</span>&nbsp;<span class="Identifier">print</span>&nbsp;<span class="Constant">&quot;...bugs&quot;</span><br>
<br></blockquote></p>
<p>Looks good!:</p>
<blockquote><p>
*Main> :main<br />
&#8220;WE&#8217;RE IN!&#8221;</p></blockquote>
<p>In the next couple posts I&#8217;ll explore improvements to the performance of these algorithms and maybe a few other things. </p>
]]></content:encoded>
			<wfw:commentRss>http://coder.bsimmons.name/blog/2009/09/cracking-a-lock-in-haskell-with-the-de-bruijn-sequence-pt-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Fun with Lazy Arrays: the LZ77 Algorithm</title>
		<link>http://coder.bsimmons.name/blog/2009/06/fun-with-lazy-arrays-the-lz77-algorithm/</link>
		<comments>http://coder.bsimmons.name/blog/2009/06/fun-with-lazy-arrays-the-lz77-algorithm/#comments</comments>
		<pubDate>Thu, 18 Jun 2009 17:30:29 +0000</pubDate>
		<dc:creator>jberryman</dc:creator>
				<category><![CDATA[haskell]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[Array]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[recursion]]></category>

		<guid isPermaLink="false">http://coder.bsimmons.name/blog/?p=187</guid>
		<description><![CDATA[<p><em>This is my third post investigating compression techniques related to the DEFLATE algorithm: the first on <a href="http://coder.bsimmons.name/blog/2009/05/run-length-encoding/">run-length encoding</a>, and the second on simple <a href="http://coder.bsimmons.name/blog/2009/05/huffman-coding/">Huffman Coding</a>. This post models the <a href="http://www.zlib.net/feldspar.html">LZ77 algorithm, the second of the two compression</a></em>&#8230; <a href="http://coder.bsimmons.name/blog/2009/06/fun-with-lazy-arrays-the-lz77-algorithm/" class="read_more">   [ R E A D &#124; M O R E ]</a></p>]]></description>
			<content:encoded><![CDATA[<p><em>This is my third post investigating compression techniques related to the DEFLATE algorithm: the first on <a href="http://coder.bsimmons.name/blog/2009/05/run-length-encoding/">run-length encoding</a>, and the second on simple <a href="http://coder.bsimmons.name/blog/2009/05/huffman-coding/">Huffman Coding</a>. This post models the <a href="http://www.zlib.net/feldspar.html">LZ77 algorithm, the second of the two compression strategies used by DEFLATE</a>, and in the process explores some interesting properties of Haskell&#8217;s basic Arrays.</em></p>
<h5>IMPLEMENTATION:</h5>
<p><div class="vimblock"><br>
<span class="Comment">&gt;</span>&nbsp;<span class="Type">module</span>&nbsp;LZ77<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp; <span class="Type">where</span><br>
<br>
we will use GHC's &quot;basic non-strict arrays&quot; for this <br>
experiment:<br>
<br>
<span class="Comment">&gt;</span>&nbsp;<span class="PreProc">import</span>&nbsp;Data.Array<br>
<br>
<br>
and use Ints to store the length of the entire decoded<br>
message (needed to create our array):<br>
<br>
<span class="Comment">&gt;</span>&nbsp;<span class="Type">type</span>&nbsp;Length&nbsp;<span class="Statement">=</span>&nbsp;<span class="Type">Int</span><br>
<span class="Comment">&gt;</span>&nbsp;<span class="Type">type</span>&nbsp;Offset&nbsp;<span class="Statement">=</span>&nbsp;<span class="Type">Int</span><br>
<br>
in place of the length in the standard length-offset pair<br>
I've decided to use an Int representing the index of the <br>
last element in the span relative to the element at<br>
offset. Thus the pair encoding a span of only one element,<br>
two elements back would be (0,2), rather than the more <br>
traditional (1,2). <br>
<br>
This simply makes more sense to me, especially since I want<br>
to be able to encoded reversed sequences, as you will see.<br>
<br>
<span class="Comment">&gt;</span>&nbsp;<span class="Type">type</span>&nbsp;Index&nbsp;<span class="Statement">=</span>&nbsp;<span class="Type">Int</span><br>
<br>
and a few simple dataypes for our uncompressed and <br>
compressed data respectively:<br>
<br>
<span class="Comment">&gt;</span>&nbsp;<span class="Type">type</span>&nbsp;Decoded&nbsp;a&nbsp;<span class="Statement">=</span>&nbsp;Array&nbsp;Index&nbsp;a<br>
<span class="Comment">&gt;</span><br>
<span class="Comment">&gt;</span>&nbsp;<span class="Type">data</span>&nbsp;Encoded&nbsp;a&nbsp;<span class="Statement">=</span>&nbsp;<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp; Enc&nbsp;Length&nbsp;[&nbsp;<span class="Type">Either</span>&nbsp;a&nbsp;(Index,Offset)&nbsp;]&nbsp;<br>
<br>
<br>
The decompress function works by traversing the encoded message,<br>
keeping track of our array index position (since offsets are <br>
relative to the current position), and building an Array lazily<br>
from a list which we generate, in part by referencing elements<br>
from the partially generated array itself. <br>
<br>
So when we see a Right value we look up in the Array the elements <br>
referenced by the length-offset, concat-ing that list with the<br>
result of processing the rest of the encoded message.<br>
<br>
If we hit [] in 'dec' we call an error because the stored value <br>
for the length of the uncompressed message in the Encoded type<br>
was longer than what the 'decompress' function could produce.<br>
<br>
It's diffficult to describe, but I hope the code is clear:<br>
<br>
<span class="Comment">&gt;</span>&nbsp;decompress&nbsp;<span class="Statement">::</span>&nbsp;Encoded&nbsp;a&nbsp;<span class="Statement">-&gt;</span>&nbsp;Decoded&nbsp;a<br>
<span class="Comment">&gt;</span>&nbsp;decompress&nbsp;(Enc&nbsp;el&nbsp;es)&nbsp;<span class="Statement">=</span>&nbsp;decoded<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp; <span class="Type">where</span>&nbsp;decoded&nbsp;<span class="Statement">=</span>&nbsp;listArray&nbsp;(<span class="Constant">0</span>,el&nbsp;<span class="Statement">-</span>&nbsp;<span class="Constant">1</span>)&nbsp;<span class="Statement">$</span>&nbsp;dec&nbsp;<span class="Constant">0</span>&nbsp;es<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; dec&nbsp;&nbsp;_&nbsp;&nbsp;&nbsp;&nbsp; []&nbsp;<span class="Statement">=</span>&nbsp;<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Identifier">error</span>&nbsp;<span class="Constant">&quot;message is shorter than it should be&quot;</span>&nbsp;<br>
<span class="Comment">&gt;</span><br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; dec&nbsp;n&nbsp;(<span class="Constant">Left</span>&nbsp;x&nbsp;<span class="Statement">:</span>&nbsp;xs)&nbsp;<span class="Statement">=</span>&nbsp;<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; x&nbsp;<span class="Statement">:</span>&nbsp;dec&nbsp;(n<span class="Statement">+</span><span class="Constant">1</span>)&nbsp;xs<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; dec&nbsp;n&nbsp;(<span class="Constant">Right</span>&nbsp;(iRel,off)&nbsp;<span class="Statement">:</span>&nbsp;xs)&nbsp;<span class="Statement">=</span><br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Statement">let</span>&nbsp;i1&nbsp;<span class="Statement">=</span>&nbsp;n&nbsp;&nbsp;<span class="Statement">-</span>&nbsp;off<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; i2&nbsp;<span class="Statement">=</span>&nbsp;(<span class="Statement">if</span>&nbsp;iN&nbsp;<span class="Statement">&gt;</span>&nbsp;i1&nbsp;<span class="Statement">then</span>&nbsp;<span class="Identifier">succ</span>&nbsp;<span class="Statement">else</span>&nbsp;<span class="Identifier">pred</span>)&nbsp;i1&nbsp;<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; iN&nbsp;<span class="Statement">=</span>&nbsp;i1&nbsp;<span class="Statement">+</span>&nbsp;iRel<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="Statement">in</span>&nbsp;[&nbsp;decoded<span class="Statement">!</span>i&nbsp;<span class="Statement">|</span>&nbsp;i&nbsp;<span class="Statement">&lt;-</span>&nbsp;[i1,&nbsp;i2&nbsp;<span class="Statement">..</span>&nbsp;iN]&nbsp;]&nbsp;<span class="Statement">++</span>&nbsp;<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; dec&nbsp;(n&nbsp;<span class="Statement">+</span>&nbsp;<span class="Constant">1</span>&nbsp;<span class="Statement">+</span>&nbsp;<span class="Identifier">abs</span>&nbsp;iRel)&nbsp;xs<br>
<br>
<br>
Some interesting things about the code above:<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;1) we create an array from a list, which we build,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; in part, by looking up elements from the array<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; we are in the process of building. <br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;2) we can compress a sequence of symbols which were<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; seen previously but in reverse order, simply by <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; storing a negative relative index in the <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (relative_index,offset) tuple. So, the string... <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&quot;her racecar returns to race&quot; <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; might compress to: <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{her race(-5,2)turns to(4,19)}<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; I'm not sure if this is useful in real compression,<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; especially when it comes down to the binary encoding.<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;3) even more interesting, we can use this same decoder<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; function to decompress data that matches a sequence<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; in parts of the array we haven't built yet! We simply <br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; use a negative offset in our tuple. <br>
<br></div></p>
<h5>EXAMPLES AND CONCLUSION:</h5>
<p><div class="vimblock"><br>
point (3) may or may not be something like the LZ78 algorithm, <br>
which apparently works by encoding future data, but it is <br>
defintely a cool thing to be able to do with arrays:<br>
<br>
<span class="Comment">&gt;</span>&nbsp;coolArray&nbsp;<span class="Statement">=</span>&nbsp;listArray&nbsp;(<span class="Constant">0</span>,<span class="Constant">4</span>)&nbsp;<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[<span class="Constant">0</span>,&nbsp;&nbsp;coolArray<span class="Statement">!</span><span class="Constant">4</span>&nbsp;<span class="Statement">-</span>&nbsp;<span class="Constant">3</span>,&nbsp;&nbsp;<span class="Constant">2</span>,&nbsp;&nbsp;coolArray<span class="Statement">!</span><span class="Constant">1</span>&nbsp;<span class="Statement">+</span>&nbsp;<span class="Constant">2</span>,&nbsp;&nbsp;<span class="Constant">4</span>]<br>
<br>
<br>
Here's an example with great compression where the relative <br>
index exceeds the offset:<br>
<br>
<span class="Comment">&gt;</span>&nbsp;exceedsOffset&nbsp;<span class="Statement">=</span>&nbsp;elems&nbsp;<span class="Statement">$</span>&nbsp;decompress&nbsp;<span class="Statement">$</span><br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Enc&nbsp;<span class="Constant">25</span>&nbsp;[<span class="Constant">Left</span>&nbsp;<span class="Constant">'B'</span>,&nbsp;<span class="Constant">Left</span>&nbsp;<span class="Constant">'l'</span>,&nbsp;<span class="Constant">Left</span>&nbsp;<span class="Constant">'a'</span>,&nbsp;<span class="Constant">Left</span>&nbsp;<span class="Constant">'h'</span>,&nbsp;<span class="Constant">Left</span>&nbsp;<span class="Constant">' '</span>,<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Constant">Left</span>&nbsp;<span class="Constant">'b'</span>,&nbsp;<span class="Constant">Right</span>&nbsp;(<span class="Constant">17</span>,<span class="Constant">5</span>),&nbsp;<span class="Constant">Left</span>&nbsp;<span class="Constant">'!'</span>]<br>
<br>
<br>
...and a code example for (2) above:<br>
<br>
<span class="Comment">&gt;</span>&nbsp;reverseReference&nbsp;<span class="Statement">=</span>&nbsp;elems&nbsp;<span class="Statement">$</span>&nbsp;decompress&nbsp;<span class="Statement">$</span><br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Enc&nbsp;<span class="Constant">27</span>&nbsp;[<span class="Constant">Left</span>&nbsp;<span class="Constant">'h'</span>,<span class="Constant">Left</span>&nbsp;<span class="Constant">'e'</span>,<span class="Constant">Left</span>&nbsp;<span class="Constant">'r'</span>,<span class="Constant">Left</span>&nbsp;<span class="Constant">' '</span>,<span class="Constant">Left</span>&nbsp;<span class="Constant">'r'</span>,<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Constant">Left</span>&nbsp;<span class="Constant">'a'</span>,<span class="Constant">Left</span>&nbsp;<span class="Constant">'c'</span>,<span class="Constant">Left</span>&nbsp;<span class="Constant">'e'</span>,<span class="Constant">Right</span>(<span class="Statement">-</span><span class="Constant">5</span>,<span class="Constant">2</span>),<span class="Constant">Left</span>&nbsp;<span class="Constant">'t'</span>,<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Constant">Left</span>&nbsp;<span class="Constant">'u'</span>,<span class="Constant">Left</span>&nbsp;<span class="Constant">'r'</span>,<span class="Constant">Left</span>&nbsp;<span class="Constant">'n'</span>,<span class="Constant">Left</span>&nbsp;<span class="Constant">'s'</span>,<span class="Constant">Left</span>&nbsp;<span class="Constant">' '</span>,<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span class="Constant">Left</span>&nbsp;<span class="Constant">'t'</span>,<span class="Constant">Left</span>&nbsp;<span class="Constant">'o'</span>,<span class="Constant">Right</span>(<span class="Constant">4</span>,<span class="Constant">19</span>)]<br>
<br>
...and finally an example combining (2) and (3):<br>
<br>
<span class="Comment">&gt;</span>&nbsp;reverseLookAhead&nbsp;<span class="Statement">=</span>&nbsp;elems&nbsp;<span class="Statement">$</span>&nbsp;decompress&nbsp;<span class="Statement">$</span>&nbsp;<br>
<span class="Comment">&gt;</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Enc&nbsp;<span class="Constant">5</span>&nbsp;[<span class="Constant">Left</span>&nbsp;<span class="Constant">1</span>,&nbsp;<span class="Constant">Right</span>&nbsp;(<span class="Statement">-</span><span class="Constant">1</span>,<span class="Statement">-</span><span class="Constant">3</span>),&nbsp;<span class="Constant">Left</span>&nbsp;<span class="Constant">3</span>,&nbsp;<span class="Constant">Left</span>&nbsp;<span class="Constant">4</span>]<br>
<br>
<br>
I was surprised to discover these properties in Haskell's lazy<br>
Arrays. hope they came as a surprise to a few others.<br>
<br></div></p>
]]></content:encoded>
			<wfw:commentRss>http://coder.bsimmons.name/blog/2009/06/fun-with-lazy-arrays-the-lz77-algorithm/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
