Message-ID: <20230425151552.GA29035@openwall.com>
Date: Tue, 25 Apr 2023 17:15:52 +0200
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Birthday paradox

On Tue, Apr 25, 2023 at 04:24:19PM +0200, magnum wrote:
> On 2023-04-23 13:08, Solar Designer wrote:
> >I think we should switch to using floating-point inside that function so
> >that we do not have to explicitly lower precision (...)
>
> This (fixed now in bleeding) helped a lot. Here's a few data points
> (this all on nvidia 2080ti):

Great.

> - Probing the table directly is faster than, or about as fast as, using
> the bitmap - up to about 500 hashes. For a single (1) hash, it's 4-5%
> faster than best seen with bitmap - even when the tables are in global
> memory (nearly same performance, caches doing their job).

My expectation, from bcrypt performance, is that you'll find a greater
difference between memory types on AMD.

> At about 5K hashes, probing directly gets significantly slower than the
> bitmap, when the tables no longer fit in local memory. Above that, the
> bitmap clearly wins, even where the bitmap no longer fits in local and
> even with poor parameter choices (high FP rates).
>
> Even in the upper end of the spectrum (10M hashes in this case), the
> tables were largely outperformed by nearly any bitmap, even with 1/15 fp
> rate. I'm a bit surprised by this, as probing directly is less work and
> fewer loads from global - the cache situation should be horrible in both
> cases.

Yes, that's surprising.

> - I've seen at least one case where k=3 is much better than the old
> bitmaps: At 1.5M hashes I got 10891Mp/s with m=16777216 and k=3 whereas
> bleeding-jumbo produced 6686Mp/s using m=8388608 and k=1.
> k=5 or 6 was tried but not better than others. k=7 or 8 was never picked
> by my (varying) algorithms so far.
>
> - I have yet to see the old bitmap code outperform the new in any
> discouraging way.
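As a side note, that 1.5M-hash data point is consistent with the standard
Bloom filter false-positive approximation p = (1 - e^(-kn/m))^k.  A quick
back-of-the-envelope sketch (parameters taken from the numbers quoted
above; this is just an estimate, not what the OpenCL code computes):

```python
from math import exp

def bloom_fp(m, k, n):
    """Classic approximation of a Bloom filter's false-positive rate
    with m bits, k hash functions, and n inserted items."""
    return (1.0 - exp(-k * n / m)) ** k

# The two configurations from the 1.5M-hash data point quoted above:
p_k3 = bloom_fp(16777216, 3, 1500000)  # new code: m=2^24, k=3
p_k1 = bloom_fp(8388608, 1, 1500000)   # bleeding-jumbo: m=2^23, k=1

print(p_k3)  # ~0.013, i.e. roughly 1/77
print(p_k1)  # ~0.164, i.e. roughly 1/6
```

So despite the k=3 filter doing three probes, its ~12x lower FP rate means
far fewer expensive second-step lookups, which fits the observed speedup.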
> It sometimes wins, but not by far, and I'm nowhere near finishing my
> "algo". Endless testing ahead.

Cool.

Meanwhile, I recalled a drawback of a Bloom filter vs. separate bitmaps:
you can't remove items from a Bloom filter on its own (without host
memory overhead to store extra info to allow for that, or regenerating).
I guess the current code didn't bother removing from the separate
bitmaps anyway?  Ideally, we'd enhance the code to do that so that c/s
and p/s (but of course not C/s) would improve as hashes get cracked and
removed.  In the single-lookup bitmaps that we use on host in cracker.c
for CPU formats, we do remove cracked hashes.

Given that our Bloom filters are intended for local memory and are thus
tiny in terms of host memory, you can easily also store them as separate
bitmaps on host to allow for efficient item removal.  Just update the
affected elements in the individual bitmaps, then regenerate the
corresponding elements in the Bloom filter (not the full filter) from
the bitmaps.

You might also want to experiment with smaller hashes aka fingerprints
stored in the perfect hash tables in local memory, so that more would
fit.  This is basically how a cuckoo filter differs from cuckoo hashing.
And you can easily and quickly remove items from there (leaving a hole,
so it's not as optimal as a smaller new table/filter could be, but it's
better than not removing).

By smaller hashes to use as fingerprints, I mean even down to 8-bit or
less if you also use the full tables in global memory as a second step.
With a typical (4, 2) cuckoo filter, 8-bit would mean an FP rate of
about 1/32, which is comparable to what you say you're often getting
with a Bloom filter now.  With perfect hash tables, it should be better
yet - even as good as about 1/256 maybe?
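To illustrate the removal idea, here's a minimal host-side sketch.  It is
an assumption on my part that per-hash-function counters are kept on host
(counters are one simple way to make removal exact when two items share a
bit); all names here are made up for illustration, not from JtR's code:

```python
class RemovableBloom:
    """Host-side bookkeeping for a device Bloom filter that supports
    removing cracked hashes.  One counter array per hash function on
    host; the device filter bit b is set iff any counter for b is
    nonzero.  Removal touches only the affected filter elements."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counts = [[0] * m for _ in range(k)]  # host-only
        self.bloom = [0] * m  # mirror of the device-side filter

    def _idx(self, x, i):
        # Placeholder hash; real code would use bits of the stored hash.
        return hash((x, i)) % self.m

    def add(self, x):
        for i in range(self.k):
            b = self._idx(x, i)
            self.counts[i][b] += 1
            self.bloom[b] = 1

    def remove(self, x):
        for i in range(self.k):
            b = self._idx(x, i)
            self.counts[i][b] -= 1
            # Regenerate just this element from the host-side bitmaps.
            self.bloom[b] = 1 if any(c[b] for c in self.counts) else 0

    def query(self, x):
        return all(self.bloom[self._idx(x, i)] for i in range(self.k))
```

Only the filter words whose bits changed would need to be re-uploaded to
the device, so removal stays cheap even as hashes get cracked.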
I write "about" because you also need to mark/handle empty slots, which
in the trivial implementation wastes one magic fingerprint value.(*)
When a truncated hash happens to match this value, you have to store a
different value as the fingerprint instead, and that different value
unfortunately becomes twice as frequent as the rest, which lowers
efficiency slightly.  Another reason for "about" is that a cuckoo filter
wouldn't be 100% full - optimal is ~98% full for (4, 2) - which on one
hand lowers the FP rate, but on the other means there's size overhead.

(*) Something I managed to avoid in pwqfilter with much more complex
trickery that is not applicable for the tiny early-reject filters we
need here.

Alexander
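Sanity-checking those "about" figures - assuming query-side fingerprints
go through the same empty-value remapping as stored ones, and ignoring
the sub-100% fill factor:

```python
def cuckoo_fp(f, buckets=2, slots=4):
    """FP rate of a (slots, buckets) cuckoo filter with f-bit
    fingerprints: chance that any inspected fingerprint matches."""
    return 1.0 - (1.0 - 2.0 ** -f) ** (buckets * slots)

def perfect_table_fp(f):
    """FP rate of a perfect hash table probing one slot, with one
    fingerprint value reserved as the empty marker.  The value that
    the marker remaps to is twice as frequent on both sides of the
    comparison, so the rate is (2^f + 2) / 2^(2f), slightly above the
    ideal 1/2^f."""
    n = 2 ** f
    return (n - 2) * (1.0 / n) ** 2 + (2.0 / n) ** 2

print(cuckoo_fp(8))         # ~0.031, i.e. about 1/32
print(perfect_table_fp(8))  # ~0.0039, i.e. about 1/254
```

So with 8-bit fingerprints, the perfect hash table's single probe is
roughly 8x more selective than the (4, 2) cuckoo filter's eight candidate
slots, matching the 1/32 vs. ~1/256 estimates above.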