Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABh_MKmvtxKsMjdzNrqrXQ2VNZ8PMsN=iNd1NRhhrMC6PrYsug@mail.gmail.com>
Date: Thu, 10 Dec 2015 10:21:45 +0100
From: Ed Schouten <ed@...i.nl>
To: musl@...ts.openwall.com, Ed Schouten <ed@...i.nl>
Subject: Re: Re: AVL tree: storing balances instead of heights

Hi there,

2015-12-07 15:46 GMT+01:00 Szabolcs Nagy <nsz@...t70.net>:
> in general it is not guaranteed that void* or void**
> is represented the same way as other pointers, the
> guarantee is that all object pointers can be converted
> to/from void* and may be possible to convert to/from
> other object pointers, but in either case you have to
> convert the pointer back to the original type to get
> a meaningful value that can be dereferenced.
> http://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7

Ah, thanks for clarifying. I didn't know that.

I did some more research and it turns out that AVL trees also allow
you to apply top-down rebalancing instead of bottom-up. The reason for
this is that at each level rebalancing never affects the child along
the path, only its sibling. This approach isn't covered in literature
extensively, but this blog post describes the process quite well:

http://neil.brown.name/blog/20041124101820
http://neil.brown.name/blog/20041124141849

The downside of such an approach is that the tree needs to be
traversed forward twice, meaning that a naïve implementation would
need to perform a larger number of comparisons. This can be mitigated
by storing the path during the first traversal. As the number of nodes
of the tree can never exceed the number of bytes in the address space,
we know that two uintptr_t's provide enough bits to store any path.

I've done some measurements and such an implementation only seems to
be 7% slower for insertions and 4% slower for deletions when compared
to my previous recursive implementation. The advantage is that
recursion is eliminated. Memory use is constant, regardless of the
depth of the tree.

I also benchmarked an iterative bottom-up implementation. This version
was about 25% faster for deletions, but had the downside of requiring
an array of parent pointers on the stack (~750 bytes).

As usual, here are links to my source files:

https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/search/tsearch.c
https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/search/tdelete.c
https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/search/search_impl.h

And a link to a spreadsheet with some benchmark results, where I
measure the amount of time and number of comparisons of 10 million
insertions and deletions of keys in ascending order:

https://docs.google.com/spreadsheets/d/1Uy1Kgz9SGf_szH3K4FkyxxioHmBlRu9LsQjDqkBw8v8/edit#gid=0

Best regards,
-- 
Ed Schouten <ed@...i.nl>
Nuxi, 's-Hertogenbosch, the Netherlands
KvK-nr.: 62051717

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.