<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Initial tests of Tries: Follow Up</title>
	<atom:link href="http://coder.bsimmons.name/blog/2009/04/initial-tests-of-tries-follow-up/feed/" rel="self" type="application/rss+xml" />
	<link>http://coder.bsimmons.name/blog/2009/04/initial-tests-of-tries-follow-up/</link>
	<description>fragmentary ideas  ䷿  intellectual what-nots  ䷷  and haskell programming  ䷴</description>
	<lastBuildDate>Thu, 22 Jul 2010 21:36:05 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: TST &#171; Michael Flor&#8217;s Blog</title>
		<link>http://coder.bsimmons.name/blog/2009/04/initial-tests-of-tries-follow-up/comment-page-1/#comment-120</link>
		<dc:creator>TST &#171; Michael Flor&#8217;s Blog</dc:creator>
		<pubDate>Mon, 07 Sep 2009 17:37:27 +0000</pubDate>
		<guid isPermaLink="false">http://coder.bsimmons.name/blog/?p=112#comment-120</guid>
		<description>[...] Initial tests of Tries: Follow Up [...]</description>
		<content:encoded><![CDATA[<div style="color: #E8E8E8;">
<p>[...] Initial tests of Tries: Follow Up [...]</p>
</div>
]]></content:encoded>
	</item>
	<item>
		<title>By: Edward kmett</title>
		<link>http://coder.bsimmons.name/blog/2009/04/initial-tests-of-tries-follow-up/comment-page-1/#comment-40</link>
		<dc:creator>Edward kmett</dc:creator>
		<pubDate>Wed, 22 Apr 2009 20:18:47 +0000</pubDate>
		<guid isPermaLink="false">http://coder.bsimmons.name/blog/?p=112#comment-40</guid>
		<description>As an aside, you can abuse unsafePerformIO to update access counts in references and hide the whole thing behind a seemingly functional facade, doing rotations the next time you mutate that part of the tree. A missed &#039;read update&#039; is harmless, so you don&#039;t care too much about racing on the counts. The net result appears functional; it just magically guesses a nicer rebalancing, which is invisible to the external-facing API.</description>
		<content:encoded><![CDATA[<p>As an aside, you can abuse unsafePerformIO to update access counts in references and hide the whole thing behind a seemingly functional facade, doing rotations the next time you mutate that part of the tree. A missed &#8216;read update&#8217; is harmless, so you don&#8217;t care too much about racing on the counts. The net result appears functional; it just magically guesses a nicer rebalancing, which is invisible to the external-facing API.</p>
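<p>A minimal sketch of the hidden-counter idea, assuming a plain binary search tree rather than the trie under discussion. The names Tree, node, insertC, lookupC and hitCount are hypothetical, and the sketch only records hits; the rotations that would consult the counts on the next write are left out.</p>

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical tree type: every node carries a hidden mutable hit counter.
data Tree k a = Leaf | Node !(IORef Int) (Tree k a) k a (Tree k a)

-- Smart constructor that allocates the counter behind a pure interface.
-- NOINLINE keeps the allocation from being duplicated by inlining.
{-# NOINLINE node #-}
node :: Tree k a -> k -> a -> Tree k a -> Tree k a
node l k a r = unsafePerformIO $ do
  ref <- newIORef 0
  return (Node ref l k a r)

-- Ordinary persistent insert; existing counters are carried along unchanged.
insertC :: Ord k => k -> a -> Tree k a -> Tree k a
insertC k a Leaf = node Leaf k a Leaf
insertC k a (Node ref l k' a' r) = case compare k k' of
  LT -> Node ref (insertC k a l) k' a' r
  GT -> Node ref l k' a' (insertC k a r)
  EQ -> Node ref l k' a r

-- A lookup that looks pure but bumps the counter of the node it hits.
-- The bump is not atomic; a lost update only makes the count less accurate.
lookupC :: Ord k => k -> Tree k a -> Maybe a
lookupC _ Leaf = Nothing
lookupC k (Node ref l k' a r) = case compare k k' of
  LT -> lookupC k l
  GT -> lookupC k r
  EQ -> unsafePerformIO (modifyIORef' ref (+ 1)) `seq` Just a

-- Inspect the counter for a key (only needed for this demonstration).
hitCount :: Ord k => k -> Tree k a -> IO (Maybe Int)
hitCount _ Leaf = return Nothing
hitCount k (Node ref l k' _ r) = case compare k k' of
  LT -> hitCount k l
  GT -> hitCount k r
  EQ -> Just <$> readIORef ref

main :: IO ()
main = do
  let t = insertC 'b' 2 (insertC 'a' 1 Leaf) :: Tree Char Int
  print (lookupC 'a' t)
  print (lookupC 'a' t)
  hitCount 'a' t >>= print
```

<p>The exact count printed at the end can depend on how the compiler shares the two identical lookups, which is the same kind of imprecision the comment treats as harmless.</p>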
]]></content:encoded>
	</item>
	<item>
		<title>By: Edward kmett</title>
		<link>http://coder.bsimmons.name/blog/2009/04/initial-tests-of-tries-follow-up/comment-page-1/#comment-39</link>
		<dc:creator>Edward kmett</dc:creator>
		<pubDate>Wed, 22 Apr 2009 20:16:11 +0000</pubDate>
		<guid isPermaLink="false">http://coder.bsimmons.name/blog/?p=112#comment-39</guid>
		<description>@jberryman

There has been work on doing just that: it&#039;s the &#039;conditional rotation&#039; heuristic, and it works really well... imperatively.

http://www.bcs.org/upload/pdf/oommen.pdf

Unfortunately, there is a cost intrinsic in a functional implementation that isn&#039;t experienced by an imperative implementation of the same idea: 

Updating the count functionally means allocating fresh nodes rather than mutating in place, which means you spew out garbage, which has to be collected, and worse, this happens on reads, not writes.

Consequently, reads, which previously didn&#039;t tax the GC at all, now generate garbage. This same problem tends to affect functional splay trees. To a lesser degree, it also explains why even rebalancing (which only occurs during writes!) in a mutation-free setting is more expensive than elsewhere: imperatively you are free to relink the existing tree rather than allocate and relink all the way to the root.</description>
		<content:encoded><![CDATA[<p>@jberryman</p>
<p>There has been work on doing just that: it&#8217;s the &#8216;conditional rotation&#8217; heuristic, and it works really well&#8230; imperatively.</p>
<p><a href="http://www.bcs.org/upload/pdf/oommen.pdf" rel="nofollow">http://www.bcs.org/upload/pdf/oommen.pdf</a></p>
<p>Unfortunately, there is a cost intrinsic in a functional implementation that isn&#8217;t experienced by an imperative implementation of the same idea: </p>
<p>Updating the count functionally means allocating fresh nodes rather than mutating in place, which means you spew out garbage, which has to be collected, and worse, this happens on reads, not writes.</p>
<p>Consequently, reads, which previously didn&#8217;t tax the GC at all, now generate garbage. This same problem tends to affect functional splay trees. To a lesser degree, it also explains why even rebalancing (which only occurs during writes!) in a mutation-free setting is more expensive than elsewhere: imperatively you are free to relink the existing tree rather than allocate and relink all the way to the root.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jberryman</title>
		<link>http://coder.bsimmons.name/blog/2009/04/initial-tests-of-tries-follow-up/comment-page-1/#comment-38</link>
		<dc:creator>jberryman</dc:creator>
		<pubDate>Wed, 22 Apr 2009 17:06:55 +0000</pubDate>
		<guid isPermaLink="false">http://coder.bsimmons.name/blog/?p=112#comment-38</guid>
		<description>@Edward Kmett:

Thanks a lot for the feedback. I&#039;m realizing how tricky it is to create a good data structure like this.

I&#039;ve been a little busy, but want to re-write MagnusTree using some balancing schemes that occurred to me and that I was curious about:

I realized that when searching for a branch to continue down the Trie, we want to keep the most common branches closer to the root (of this particular node of the Trie). It would be pretty easy to come up with a Trie of weight-balanced tries:

We could store an integer at each node and increment it each time we &quot;enter&quot; back into the Trie at that branch, then do one rotation upwards if we see that the weight of our branch is greater than that of the parent branch.

An alternative involving more rotations, but which doesn&#039;t need to store weights, might be to simply rotate a branch upward every time we match its element and enter it, hopefully keeping the common nodes close to the top.</description>
		<content:encoded><![CDATA[<p>@Edward Kmett:</p>
<p>Thanks a lot for the feedback. I&#8217;m realizing how tricky it is to create a good data structure like this.</p>
<p>I&#8217;ve been a little busy, but want to re-write MagnusTree using some balancing schemes that occurred to me and that I was curious about:</p>
<p>I realized that when searching for a branch to continue down the Trie, we want to keep the most common branches closer to the root (of this particular node of the Trie). It would be pretty easy to come up with a Trie of weight-balanced tries:</p>
<p>We could store an integer at each node and increment it each time we &#8220;enter&#8221; back into the Trie at that branch, then do one rotation upwards if we see that the weight of our branch is greater than that of the parent branch.</p>
<p>An alternative involving more rotations, but which doesn&#8217;t need to store weights, might be to simply rotate a branch upward every time we match its element and enter it, hopefully keeping the common nodes close to the top.</p>
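<p>A rough sketch of the second, weight-free scheme, assuming the sibling branches of one Trie node are kept in an ordinary binary search tree. Tree, rotL, rotR, lookupUp, insert and rootKey are hypothetical names for illustration, not taken from MagnusTree.</p>

```haskell
-- Hypothetical search tree standing in for the branches of one Trie node.
data Tree k a = Leaf | Node (Tree k a) k a (Tree k a)

-- Rotate the left child up over its parent.
rotR :: Tree k a -> Tree k a
rotR (Node (Node ll lk la lr) k a r) = Node ll lk la (Node lr k a r)
rotR t = t

-- Rotate the right child up over its parent.
rotL :: Tree k a -> Tree k a
rotL (Node l k a (Node rl rk ra rr)) = Node (Node l k a rl) rk ra rr
rotL t = t

-- Look up a key and, if it is found below the root, rotate the matching
-- node one level upward. The rebuilt tree is returned with the result.
lookupUp :: Ord k => k -> Tree k a -> (Maybe a, Tree k a)
lookupUp _ Leaf = (Nothing, Leaf)
lookupUp k t@(Node l k' a r) = case compare k k' of
  EQ -> (Just a, t)
  LT -> case l of
    Node _ lk la _ | lk == k -> (Just la, rotR t)
    _ -> let (v, l') = lookupUp k l in (v, Node l' k' a r)
  GT -> case r of
    Node _ rk ra _ | rk == k -> (Just ra, rotL t)
    _ -> let (v, r') = lookupUp k r in (v, Node l k' a r')

-- Plain unbalanced insert, just enough to build a test tree.
insert :: Ord k => k -> a -> Tree k a -> Tree k a
insert k a Leaf = Node Leaf k a Leaf
insert k a (Node l k' a' r) = case compare k k' of
  LT -> Node (insert k a l) k' a' r
  GT -> Node l k' a' (insert k a r)
  EQ -> Node l k' a r

rootKey :: Tree k a -> Maybe k
rootKey Leaf = Nothing
rootKey (Node _ k _ _) = Just k

main :: IO ()
main = do
  -- foldr inserts 'b' first, then 'a', then 'c', so 'b' starts at the root.
  let t0 = foldr (\c -> insert c ()) Leaf "cab"
      (v, t1) = lookupUp 'c' t0
  print (rootKey t0)
  print v
  print (rootKey t1)
```

<p>Here a single lookup of 'c' moves it from below 'b' to the root. Every successful lookup below the root rebuilds the path down to the match, so reads allocate as well as writes.</p>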
]]></content:encoded>
	</item>
	<item>
		<title>By: Edward Kmett</title>
		<link>http://coder.bsimmons.name/blog/2009/04/initial-tests-of-tries-follow-up/comment-page-1/#comment-37</link>
		<dc:creator>Edward Kmett</dc:creator>
		<pubDate>Tue, 21 Apr 2009 11:04:20 +0000</pubDate>
		<guid isPermaLink="false">http://coder.bsimmons.name/blog/?p=112#comment-37</guid>
		<description>@wren

For the ternary trie, all I was using was a left/right split based on comparison with the key element, plus a center leg for when the partial key matches. Made out of simpler parts rather than flattened and specialized, it would look like:

data Tern k a = Tern !Int !k !(Maybe a) !(Tern k a) !(Tern k a) !(Tern k a) &#124; Tip
data Trie k a = !(Maybe a) :&gt; !(Tern k a)

That can then be optimized by flattening, specializing, including runs à la PATRICIA using UArrs, unpacking, even hash-consing leaves to obtain a DAWG, etc.

You&#039;ll note that looks a lot like Data.Map, but you can balance it using either the entire mass of nodes underneath or the number of leaves in the trie, rather than purely using the local weight of the single level of the Data.Map, so it gets a better weighting system. You probably want to set the delta higher than in Data.Map though to do less rebalancing work, especially if you use mass.

@jberryman

You can get a 30% speedup on Magnus&#039; tries by specializing them à la the work we&#039;ve been doing on adaptive-containers:

data CharIntTrie
    = NothingNothingNode
    &#124; JustNothingNode    !Int
    &#124; NothingJustNode         !Char !CharIntTrie !CharIntTrie !CharIntTrie
    &#124; JustJustNode       !Int !Char !CharIntTrie !CharIntTrie !CharIntTrie
    &#124; NothingSimpleNode       !Char              !CharIntTrie
    &#124; JustSimpleNode     !Int !Char              !CharIntTrie
    deriving (Show)

A much smaller win can be obtained by doing a worker/wrapper transform on insertWith:

insertWith :: (Int -&gt; Int -&gt; Int) -&gt; String -&gt; Int -&gt; CharIntTrie -&gt; CharIntTrie
insertWith f s v t = go t s where

    go (NothingJustNode head lesser tail greater) [] = JustJustNode v head lesser tail greater
    go (NothingJustNode head lesser tail greater) aas@(a:as) =
     case compare a head of
      LT -&gt; NothingJustNode head (go lesser aas) tail greater
      EQ -&gt; NothingJustNode head lesser (go tail as) greater
      GT -&gt; NothingJustNode head lesser tail (go greater aas)
   ...

But that&#039;s all you&#039;ll get without using some array (UArr/ByteString) style packing for compaction.</description>
		<content:encoded><![CDATA[<p>@wren</p>
<p>For the ternary trie, all I was using was a left/right split based on comparison with the key element, plus a center leg for when the partial key matches. Made out of simpler parts rather than flattened and specialized, it would look like:</p>
<p>data Tern k a = Tern !Int !k !(Maybe a) !(Tern k a) !(Tern k a) !(Tern k a) | Tip<br />
data Trie k a = !(Maybe a) :&gt; !(Tern k a)</p>
<p>That can then be optimized by flattening, specializing, including runs à la PATRICIA using UArrs, unpacking, even hash-consing leaves to obtain a DAWG, etc.</p>
<p>You&#8217;ll note that looks a lot like Data.Map, but you can balance it using either the entire mass of nodes underneath or the number of leaves in the trie, rather than purely using the local weight of the single level of the Data.Map, so it gets a better weighting system. You probably want to set the delta higher than in Data.Map though to do less rebalancing work, especially if you use mass.</p>
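<p>To make the shape concrete, here is a minimal sketch of lookup and insertion over the two declarations above. It assumes the three Tern children are ordered lesser, center, greater and that the Int field is a size or weight used for balancing; the sketch leaves that field at 0 and does no rebalancing. The names empty, lookupT and insertT are hypothetical.</p>

```haskell
-- Same declarations as in the comment. The Int is assumed to be a
-- size/weight field; this sketch does not maintain it.
data Tern k a = Tern !Int !k !(Maybe a) !(Tern k a) !(Tern k a) !(Tern k a) | Tip
data Trie k a = !(Maybe a) :> !(Tern k a)

empty :: Trie k a
empty = Nothing :> Tip

-- The value for the empty key sits in the Trie wrapper; every other key
-- walks left/right on a mismatch and down the center leg on a match.
lookupT :: Ord k => [k] -> Trie k a -> Maybe a
lookupT []     (v :> _) = v
lookupT (k:ks) (_ :> t) = go k ks t
  where
    go _ _ Tip = Nothing
    go c cs (Tern _ x v l m r) = case compare c x of
      LT -> go c cs l
      GT -> go c cs r
      EQ -> case cs of
        []       -> v
        (c':cs') -> go c' cs' m

-- Unbalanced insertion that overwrites any existing value for the key.
insertT :: Ord k => [k] -> a -> Trie k a -> Trie k a
insertT []     a (_ :> t) = Just a :> t
insertT (k:ks) a (v :> t) = v :> go k ks t
  where
    -- A missing node is created for the current element, then re-entered.
    go c cs Tip = go c cs (Tern 0 c Nothing Tip Tip Tip)
    go c cs (Tern n x v' l m r) = case compare c x of
      LT -> Tern n x v' (go c cs l) m r
      GT -> Tern n x v' l m (go c cs r)
      EQ -> case cs of
        []       -> Tern n x (Just a) l m r
        (c':cs') -> Tern n x v' l (go c' cs' m) r

main :: IO ()
main = do
  let t = insertT "tea" 2 (insertT "ten" 1 (insertT "" 0 empty)) :: Trie Char Int
  -- prints [Just 0,Just 1,Just 2,Nothing,Nothing]
  print (map (`lookupT` t) ["", "ten", "tea", "te", "tx"])
```

<p>A balanced version would update the Int on the way back up and rotate the lesser/greater legs when they drift too far apart, leaving the center leg alone.</p>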
<p>@jberryman</p>
<p>You can get a 30% speedup on Magnus&#8217; tries by specializing them à la the work we&#8217;ve been doing on adaptive-containers:</p>
<p>data CharIntTrie<br />
    = NothingNothingNode<br />
    | JustNothingNode    !Int<br />
    | NothingJustNode         !Char !CharIntTrie !CharIntTrie !CharIntTrie<br />
    | JustJustNode       !Int !Char !CharIntTrie !CharIntTrie !CharIntTrie<br />
    | NothingSimpleNode       !Char              !CharIntTrie<br />
    | JustSimpleNode     !Int !Char              !CharIntTrie<br />
    deriving (Show)</p>
<p>A much smaller win can be obtained by doing a worker/wrapper transform on insertWith:</p>
<p>insertWith :: (Int -> Int -> Int) -> String -> Int -> CharIntTrie -> CharIntTrie<br />
insertWith f s v t = go t s where</p>
<p>    go (NothingJustNode head lesser tail greater) [] = JustJustNode v head lesser tail greater<br />
    go (NothingJustNode head lesser tail greater) aas@(a:as) =<br />
     case compare a head of<br />
      LT -> NothingJustNode head (go lesser aas) tail greater<br />
      EQ -> NothingJustNode head lesser (go tail as) greater<br />
      GT -> NothingJustNode head lesser tail (go greater aas)<br />
   &#8230;</p>
<p>But that&#8217;s all you&#8217;ll get without using some array (UArr/ByteString) style packing for compaction.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
