|
Message-ID: <20230418194432.GA4192@openwall.com> Date: Tue, 18 Apr 2023 21:44:32 +0200 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: Birthday paradox On Tue, Apr 18, 2023 at 07:29:36PM +0200, Solar Designer wrote: > On Tue, Apr 18, 2023 at 07:12:09PM +0200, magnum wrote: > > Background: I was amazed how an 4-level bitmap that "should" give 1/16 > > false positives, could end up as good as 1/43. That was with 512 hashes > > in a 4x1024 bitmap. Using this formula we end up with 1/41, so mystery > > solved. > > > > In another test case, 1024 hashes in a 8x1024 bitmap would be "full" and > > should give 1/1 false positives if we don't factor in the birthday > > paradox. But empirical tests showed an amazing 1/38 - actually good > > enough to make an nvidia perform near its best! This was again > > explained with the birthday paradox (which says 1/39). > > Now you need to compare efficiency of these multiple bitmaps against one > Bloom filter of their cumulative size. One Bloom filter should have lower false positive rate per size than multiple bitmaps do, however it may require more lookups to achieve such lower FP rate. Typical formulas for optimal Bloom filter parameters optimize solely for lowest FP rate, ignoring the cost of greater number of lookups. > And then against a Cuckoo filter of the same size. This is the best of both worlds: not only even lower FP rate per size than Bloom filter, but also at most 2 lookups. However, perfect hash tables, which we already use as the final level in the code that you refer to, are an even further step in the same direction, needing exactly 1 lookup. Given that we already went for the complexity there, is there a reason we're not doing the same for the smaller early-reject tables in local memory? Just fill them with tiny fingerprints like a Cuckoo filter would use. Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.