|
Message-ID: <20151210121449.GC23362@port70.net> Date: Thu, 10 Dec 2015 13:14:50 +0100 From: Szabolcs Nagy <nsz@...t70.net> To: musl@...ts.openwall.com Cc: Ed Schouten <ed@...i.nl> Subject: Re: Re: AVL tree: storing balances instead of heights * Ed Schouten <ed@...i.nl> [2015-12-10 10:21:45 +0100]: > > I did some more research and it turns out that AVL trees also allow > you to apply top-down rebalancing instead of bottom-up. The reason for > this is that at each level rebalancing never affects the child along > the path, only its sibling. This approach isn't covered in literature > extensively, but this blog post describes the process quite well: > > http://neil.brown.name/blog/20041124101820 > http://neil.brown.name/blog/20041124141849 > > The downside of such an approach is that the tree needs to be > traversed forward twice, meaning that a naïve implementation would > need to perform a larger number of comparisons. This can be mitigated > by storing the path during the first traversal. As the number of nodes > of the tree can never exceed the number of bytes in the address space, > we know that two uintptr_t's provide enough bits to store any path. > > I've done some measurements and such an implementation only seems to > be 7% slower for insertions and 4% slower for deletions when compared > to my previous recursive implementation. The advantage is that > recursion is eliminated. Memory use is constant, regardless of the > depth of the tree. > eliminating recursion is not that important. you can turn depth*stackframe into depth bits + 1 stackframe stack usage, but avl tree is very balanced so depth is small (e.g. < 60 with 47bit address space) there are plenty libc apis that use more stack than that, so i think the optimizations only make sense if the code size remains small (or if you provide a non-standard api that's reasonable and use it somewhere). > I also benchmarked an iterative bottom-up implementation. This version > was about 25% faster for deletions, but had the downside of requiring > an array of parent pointers on the stack (~750 bytes). > > As usual, here are links to my source files: > > https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/search/tsearch.c > https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/search/tdelete.c > https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/search/search_impl.h > > And a link to a spreadsheet with some benchmark results, where I > measure the amount of time and number of comparisons of 10 million > insertions and deletions of keys in ascending order: > > https://docs.google.com/spreadsheets/d/1Uy1Kgz9SGf_szH3K4FkyxxioHmBlRu9LsQjDqkBw8v8/edit#gid=0 > can you add code size column as well? (sum of size of x86_64 pic .text+.data+.bss) i assume freebsd does not balance the tree so it timeouts (which also shows why this api is not useful in practice) glibc uses red-black trees which do less balancing operations so insert/delete may be faster, but lookup is slightly slower. (but it seems it does more comparisions and that can be slow) i assume the musl code there already has the tdelete balancing patch applied and it is compiled with -Os. performance also depends on the allocator since insert/delete has to malloc/free (yet another issue with the api) musl's allocator is i think still better for realtime systems than the jemalloc used in cloudlibc, but it will have worse average performance in benchmarks like this. thanks.
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.