Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20151210121449.GC23362@port70.net>
Date: Thu, 10 Dec 2015 13:14:50 +0100
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Cc: Ed Schouten <ed@...i.nl>
Subject: Re: Re: AVL tree: storing balances instead of heights

* Ed Schouten <ed@...i.nl> [2015-12-10 10:21:45 +0100]:
> 
> I did some more research and it turns out that AVL trees also allow
> you to apply top-down rebalancing instead of bottom-up. The reason for
> this is that at each level rebalancing never affects the child along
> the path, only its sibling. This approach isn't covered in literature
> extensively, but this blog post describes the process quite well:
> 
> http://neil.brown.name/blog/20041124101820
> http://neil.brown.name/blog/20041124141849
> 
> The downside of such an approach is that the tree needs to be
> traversed forward twice, meaning that a naïve implementation would
> need to perform a larger number of comparisons. This can be mitigated
> by storing the path during the first traversal. As the number of nodes
> of the tree can never exceed the number of bytes in the address space,
> we know that two uintptr_t's provide enough bits to store any path.
> 
> I've done some measurements and such an implementation only seems to
> be 7% slower for insertions and 4% slower for deletions when compared
> to my previous recursive implementation. The advantage is that
> recursion is eliminated. Memory use is constant, regardless of the
> depth of the tree.
> 

eliminating recursion is not that important.

you can turn depth*stackframe into depth bits + 1 stackframe
stack usage, but avl tree is very balanced so depth is small
(e.g. < 60 with 47bit address space)

there are plenty libc apis that use more stack than that, so
i think the optimizations only make sense if the code size
remains small (or if you provide a non-standard api that's
reasonable and use it somewhere).

> I also benchmarked an iterative bottom-up implementation. This version
> was about 25% faster for deletions, but had the downside of requiring
> an array of parent pointers on the stack (~750 bytes).
> 
> As usual, here are links to my source files:
> 
> https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/search/tsearch.c
> https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/search/tdelete.c
> https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/search/search_impl.h
> 
> And a link to a spreadsheet with some benchmark results, where I
> measure the amount of time and number of comparisons of 10 million
> insertions and deletions of keys in ascending order:
> 
> https://docs.google.com/spreadsheets/d/1Uy1Kgz9SGf_szH3K4FkyxxioHmBlRu9LsQjDqkBw8v8/edit#gid=0
> 

can you add code size column as well?
(sum of size of x86_64 pic .text+.data+.bss)

i assume freebsd does not balance the tree so it timeouts
(which also shows why this api is not useful in practice)

glibc uses red-black trees which do less balancing operations
so insert/delete may be faster, but lookup is slightly slower.
(but it seems it does more comparisions and that can be slow)

i assume the musl code there already has the tdelete balancing
patch applied and it is compiled with -Os.

performance also depends on the allocator since insert/delete
has to malloc/free (yet another issue with the api) musl's
allocator is i think still better for realtime systems than
the jemalloc used in cloudlibc, but it will have worse average
performance in benchmarks like this.

thanks.

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.