Message-ID: <20230422213943.GA16507@openwall.com>
Date: Sat, 22 Apr 2023 23:39:44 +0200
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Birthday paradox (Bloom filter, Perfect hash tables)

On Thu, Apr 20, 2023 at 02:15:48PM +0200, magnum wrote:
> I kind of regret having this conversation on the ML instead of Github -
> the latter is way more convenient.

We had a chance of having more people comment in here, but somehow that
didn't happen.

> I found that Sayantan's perfect hash tables involve randomness - they
> end up with slightly different sizes. A Mersenne Twister is seeded from
> gettimeofday().
>
> Perhaps differences in speed between runs could be caused by different
> modulo figures ending up differently optimized? Or should I expect that
> such optimizations are more or less uniform given that OFFSET_TABLE_SIZE
> is (assumably) always a prime?

I'd expect them to be uniform for nearby values that are not powers of 2.

> I suppose we could give it a static seed for uniform test runs. Perhaps
> we could even want to commit a static seed? I assume we'd still need
> the varying seed for when it has to retry a table build though.

I guess we should be able to use a static seed - if it's literally just a
seed, then when retrying, further random numbers would be computed from
that same seed.

> Because of mentioned randomness, 5308 hashes *sometimes* fit in local
> memory without a bitmap, sometimes not. But at that number, using local
> or not almost doesn't affect the speed I get! Perhaps the freed local
> memory becomes cache (or caching is otherwise very good) - I recently
> heard someone mentioning that about modern GPUs.

L1d caches are of similar size to local memory, and may be of similar
speed too; if there are no global memory accesses, they're not thrashed.
However, there may be differences in gather load support - I'd expect it
to favor local memory, at least on AMD.
When you make those tiny random lookups concurrently from different SIMT
"threads", it's essentially a gather load instruction, if supported for
the memory type.

By the way, FP rates like 1/46 might not be good enough if any _one_ of
64 "threads" not rejecting during those early lookups means that execution
has to proceed to the global memory hash table access, and all will wait
for it to complete. Apparently on newer AMD RDNA the wavefront width is
reduced from 64 to 32 - that's an improvement.

Alexander