Initial tests of Tries: Follow Up

18/04/2009

This is to wrap up my previous post about Tries and my attempts at an implementation, and to summarize some of the things I think I learned about implementing a Trie structure in Haskell.

Complexity is Slow

My initial idea of using a container type at each node to store the collection of branches was a good idea for testing (I could easily swap in Data.Map, or my own simple binary tree) and convenience (I could use Data.Map’s built-in functions to insert the element of the list key into the the container); but it turned out to be bad for performance.

One thing I noticed was that Data.Map is not strict in its ‘value’ field, which is where I was storing more Trie types with more Maps to hold more Trie types… this seems to be simply too complicated to be fast.

Simplicity wins over Balanced Trees

Magnus Jonsson replied with a complete re-working of my implementation which uses different constructors in the Trie type to handle both the linear node-to-node movement through the key, and the searching through the set of branches. This gives a good illustration of the concept. His is the winner in terms of performance.

He doesn’t use any techniques to keep the tree balanced but his implementation out-performs my Trie with its cumbersome (but balanced) Maps at each node. I also found that when I swapped in a module MyMap (a simple non-balanced tree) in place of Data.Map that the code was about twice as fast in both tests I performed.

I would like to play with approaches to keeping the Branch trees balanced (or almost-balanced) in a lightweight way. Perhaps a splay tree or an adaptation would make sense .

Radix Trees don’t make Sense

EDIT: Jake McArthur has pointed out that my characterization of traversing data structures in FP and conclusion in this paragraph are probably incorrect.

I chose to use a special “bucket” constructor to hold the tails of keys in their original list form when the tail was unique; e.g. when inserting the key “washing” into a Trie containing only the key “waste”, the “was” would overlap and each element 'w'--> 'a' --> 's' would reside in its own Trie constructor, but both the “te” and “hing” would be stored as intact lists in separate ValBucket constructors, since there is no need to touch them yet.I thought extending this idea and implementing a radix tree might be even more efficient.

But this doesn’t make much sense in FP. Imagine the list type [] was defined in a more verbose way as follows:


data List a = L a (List a)
            | X

In our Trie structure, a sequence of key elements which have no branches (which would be reduced to a bucket in the radix tree) might look something like this:

Key ‘w’ (Key ‘a’ (Key ’s’ (Branches … )))

…whereas the same situation in an attempted radix tree might look like:

KeyBucket (L ‘w’ (L ‘a’ (L ’s’ X))) (Branches …)

…which is essentially identical. And since in functional programming when we “traverse” a data structure (e.g. to compare two keys) we are actual destroying it and rebuilding it as we go, it stands to reason that it should be just as easy to rebuild our structure with a constructor of a different name (Key vs. List).

Storing the tail ends of keys when we can makes sense because we avoid destroying the structure until we have to inspect it.

Summary of Tests:

These are not scientific, but I include them to give you an idea of the performance differences I was seeing. Tests were run several times and averaged, and compiled with: ghc --make -prof -auto-all -O2 -funbox-strict-fields.

The first test frequency.hs is described in my last post. The second test dictLookup.hs reads an alphabetized list of words and builds a dictionary Trie from the word to the word’s line number. It then reads a random list of words from a file and prints (word, line_number) tuples.

Tests of frequency.hs:

MagnusTrie Trie (with MyMap) Data.Map Trie (with Data.Map)
ticks: 21 29 39 58
total alloc. (MB): 100 133 101 235

Tests of dictLookup.hs:

MagnusTrie Trie (with MyMap) Data.Map Trie (with Data.Map)
ticks: 17 20 18 46
total alloc. (MB): 99 133 103 215

The whole set of tests and files can be downloaded with:

$ darcs get http://coder.bsimmons.name/code/Trie/

Unpack the texts.tar.gz file before running any of the code. Thanks for the patience with the long post. Let me know your thoughts.

UPDATE:
I finally was able to test wren ng thornton’s bytestring-trie library with one of my tests. I couldn’t profile the code (GHC complained the libs were missing) but I ran dictLookup.hs with BenchPress and the bytestring-trie modified version looked about twice as fast.

There are 15 comments in this article:

  1. 18/04/2009Some initial tests of Tries - LAMBDAPHONE says:

    [...] You can read some of my conclusions in this post. [...]

  2. 18/04/2009Mitch says:

    Have you thought about using a sorted vector instead of a map/tree at each node? You’d still get log(n) search time with a binary search. It might be more expensive to update, but I wonder if you wouldn’t win on less pointer-chasing and cache churning.

  3. 19/04/2009Jake McArthur says:

    “in functional programming when we “traverse” a data structure (e.g. to compare two keys) we are actual destroying it and rebuilding it as we go”

    Not really. When you traverse a data structure, you really are just traversing it (and in Haskell, you may also be evaluating it). There is no destroying and rebuilding going on unless you are changing something.

  4. 19/04/2009jberryman says:

    @Jake McArthur:
    Thanks for the correction. Is it fair to say that conceptually you are destroying and rebuilding a structure as you traverse it? Or do I have it all wrong? What about my conclusion regarding radix trees?

    @Mitch:
    I’m ignorant about sorted vectors. Is there a good link I could look at?

  5. 19/04/2009wren ng thornton says:

    It seems Mark Wotton pointed out my bytestring-trie package[1] on your last post. I’m curious how it fares on your benchmarks? I tried just adding the option to your frequency and dictLookup tests, but the running time is so dominated by disk access that there’s negligible difference between any of them as far as I can see.

    As for how bytestring-trie works: (1) it uses ByteStrings instead of Strings in order to save enormous amounts of memory as well as access time; (2) it’s a radix/patricia tree, first based on a char-by-char comparison and then on a bit-by-bit comparison like Data.IntMap; (3) it does a significant degree of fusing the different structural types (e.g. your example would be (KeyBucket “was” (Branches …)) where “was” is a packed vector of bytes, instead of a linked list). If you download the source and look at Data.Trie.Internal there’s a long comment that walks through the derivation of the datatype from fusing the naive collection of types for representing a trie graph.

    [1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bytestring-trie

  6. 19/04/2009Mitch says:

    I just meant a sorted array. Expensive to update, but it doesn’t have all those internal pointers that the tree implementation has. Plus, the array is bounded by the alphabet size, so it’s not going to be terribly huge.

  7. 20/04/2009Edward kmett says:

    I was working on a ternary search tree based trie this weekend in preparation for unboxing it, I’m curious to see how it fares on your benchmarks. I’ll download them and give i a whirl.

  8. 20/04/2009Jake McArthur says:

    @jberryman:

    No, you aren’t really destroying it conceptually either. Data structures are immutable. All you are doing when you traverse one is binding variables to different parts of the structure over time (usually just updating local pointers into the structure, in implementation).

    I don’t think your radix tree conclusion holds, either, for this reason.

  9. 20/04/2009jberryman says:

    @wren ng thornton:

    Thanks a lot for that explanation. I think I’m much more prepared to look at your library now.

    I guess I don’t have profiling libraries for Data.Binary, but I modified the dictLookup.hs benchmark to read ByteStrings and tested with Test.BenchPress, and the code with bytestring-trie looks to be about twice as fast on my machine.

    I’ve learned a lot from this, and from all the helpful comments. Thanks!

  10. 21/04/2009wren ng thornton says:

    @Edward kmett:

    Any plans to write a post about it? I’m curious what three-way ordering you use. I’ve thought about making bytestring-trie ternary branching since /e/ is the optimal branching factor, but I couldn’t come up with a good way to do the three way split.

    @jberryman:

    Glad to help. (And glad my library holds up ;) If you run into any snags or have patches or requests, I’m more than happy to help.

  11. 21/04/2009Edward Kmett says:

    @wren

    For the ternary trie, all I was using we a left/right split based comparison with the key element or a center leg for when the partial key matches, which when made out of simpler parts rather than flattened and specialized would look like:

    data Tern k a = Tern !Int !k !(Maybe a) !(Tern k a) !(Tern k a) !(Tern k a) | Tip
    data Trie k a = !(Maybe a) :> !(Tern k a)

    that can then be optimized by flattening, specializing, including runs ala PATRICIA using UArr’s, unpacking, even hash consing leaves to obtain a DAWG, etc.

    You’ll note that looks a lot like Data.Map, but you can balance it using either the entire mass of nodes underneath or the number of leaves in the trie, rather than purely using the local weight of the single level of the Data.Map, so it gets a better weighting system. You probably want to set the delta higher than in Data.Map though to do less rebalancing work, especially if you use mass.

    @jberryman

    You can get a 30% speedup on Magnus’ tries by specializing them ala the work we’ve been doing on adaptive-containers:

    data CharIntTrie
    = NothingNothingNode
    | JustNothingNode !Int
    | NothingJustNode !Char !CharIntTrie !CharIntTrie !CharIntTrie
    | JustJustNode !Int !Char !CharIntTrie !CharIntTrie !CharIntTrie
    | NothingSimpleNode !Char !CharIntTrie
    | JustSimpleNode !Int !Char !CharIntTrie
    deriving (Show)

    and a much smaller win can be obtained by doing a worker/wrapper transform on insertWith

    insertWith :: (Int → Int → Int) → String → Int → CharIntTrie → CharIntTrie
    insertWith f s v t = go t s where

    go (NothingJustNode head lesser tail greater) [] = JustJustNode v head lesser tail greater
    go (NothingJustNode head lesser tail greater) aas@(a:as) =
    case compare a head of
    LT → NothingJustNode head (go lesser aas) tail greater
    EQ → NothingJustNode head lesser (go tail as) greater
    GT → NothingJustNode head lesser tail (go greater aas)

    but thats all you’ll get without using some array (UArr/Bytestring) style packing for compaction.

  12. 22/04/2009jberryman says:

    @Edward Kmett:

    Thanks a lot for the feedback. I’m realizing how tricky it is to create a good data structure like this.

    I’ve been a little busy, but want to re-write MagnusTree using some balancing schemes that occurred to me and that I was curious about:

    I realized that when searching for a branch to continue down the Trie, we want to keep the most common branches closer to the root (of this particular node of the Trie). It would be pretty easy to come up with a Trie of weight-balanced tries:

    we could either store an integer at each node and increment it each time we “enter” back into the Trie at that branch, then do one rotation upwards if we see that the weight of our branch is greater than the parent branch.

    an alternative involving more rotations, but which doesn’t need to store weights might be to simply rotate a branch upward every time we match its element and enter it, keeping the common nodes hopefully close to the top.

  13. 22/04/2009Edward kmett says:

    @jberryman

    There has been work on doing just that, its the ‘conditional rotation’ heuristic and it works really well… imperatively.

    http://www.bcs.org/upload/pdf/oommen.pdf

    Unfortunately, there is a cost intrinsic in a functional implementation that isn’t experienced by an imperative implementation of the same idea:

    It involves mutation to update the count functionally, which means you spew out garbage, which has to be collected, and worse, this happens on reads, not writes.

    Consequently, reads which previously didn’t tax the GC at all, now generate garbage. This same problem tends to affect functional splay trees. And to a lesser degree, explains why even rebalancing (which only occurs during writes!) in a mutation-free setting is more expensive than elsewhere, because imperatively you are free to relink the existing tree rather than allocate and relink all the way to the root.

  14. 22/04/2009Edward kmett says:

    As an aside, you can abuse unsafePerformIO to update access counts in references, and hide the whole thing behind a seemingly functional facade, doing rotations next time you mutate that part of the tree. a missed ‘read update’ is harmless so you don’t care too much about racing on the counts. The net result appears functional, it just magically guesses a nicer rebalancing which is invisible to the external facing API.

  15. 7/09/2009TST « Michael Flor’s Blog says:

    [...] Initial tests of Tries: Follow Up [...]

Write a comment: