Message-ID: <cbd5e71997ec1c19d7b28fae2c68f242@smtp.hushmail.com>
Date: Wed, 19 Apr 2023 23:47:56 +0200
From: magnum <magnumripper@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Birthday paradox

On 2023-04-18 23:12, Solar Designer wrote:
> On Tue, Apr 18, 2023 at 10:31:04PM +0200, magnum wrote:
>> The BITMAP_SHIFT macro and the occasional multiplication are all
>> compile-time constants so they should be optimized away. Then each
>> bitmap step is just an add, a couple of shifts and a couple of ANDs.
>
> So BITMAP_SHIFT makes this 8 separate bitmaps. You could instead try
> removing the shifts and having each lookup cover the entire array
> (larger mask). That would be a Bloom filter. (...) You might very
> well achieve better results than we have now.

I read up on Bloom filters and tried this - indeed, very trivial changes
turn it into a traditional k-level Bloom filter. The performance (actual
FPs counted by the kernel) barely changed at all (I tried both sparsely
and heavily populated bitmaps), and where it did ever so slightly, it
was sometimes in favor of the original code and sometimes of the
modified one. My conclusion is that they are very nearly equivalent,
even more so than I presumed.

> You could then experiment with lookup count, as it would be untangled
> from the size.

With my debug/research patches I could already experiment freely with
sizes and lookup counts, so this didn't really change anything. Still,
I'm leaning towards pushing those bitmap changes. If nothing else,
we'll know what to call our bitmaps from then on :)

>> I tried disabling the bitmap completely, to see how bad it would be,
>> and speed plummeted. I've been meaning to re-try that but with the
>> offset and hash tables in local memory (when they fit, of course).
>> But if I recall correctly, I even tried replacing the bitmap lookup
>> (for a single hash loaded) with a constant compare
>> [if (hash[1] == 0xcafebabe)] and even *that* did not run any faster
>> than the current bitmap code.
Not sure why, really. I have now added code that checks whether the
offset and hash tables fit into local memory and, if so, skips the
bitmap completely. This happens for up to about 5300 loaded hashes. But
just like most things I tried, this performs the same as using the
bitmap - neither better nor worse.

> Maybe performance is dominated by random global memory
> access throughput (as in random lookups per second).

But our reads from global memory (as in the key buffer) should be well
amortized by the mask acceleration. Perhaps I should somehow re-verify
that the MD4 is actually calculated entirely in registers, but I'd be a
bit surprised if it's not.

Maybe I should retry much of the above on AMD and see if the results
differ. Heck, I could even try it on this very MacBook I'm sitting at...
I use it almost exclusively as a very expensive terminal emulator, lol.

magnum
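[Editor's note: the "very trivial changes" that turn the segmented bitmap into a classic Bloom filter can be sketched in plain C. All constants (K, SUB_BITS) and the way probe bits are sliced from the 32-bit hash word are illustrative assumptions for this sketch, not JtR's actual build-time parameters or hash slicing.]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins -- the real kernel picks these at build time. */
#define K          4u               /* probes (lookups) per candidate hash */
#define SUB_BITS   12u              /* log2 bits per sub-bitmap            */
#define TOTAL_BITS (K << SUB_BITS)  /* size of the whole bit array         */

static uint32_t filter[TOTAL_BITS / 32];

static void clear_filter(void)
{
    for (uint32_t i = 0; i < TOTAL_BITS / 32; i++)
        filter[i] = 0;
}

static void set_bit(uint32_t b)  { filter[b >> 5] |= 1u << (b & 31); }
static int  test_bit(uint32_t b) { return (filter[b >> 5] >> (b & 31)) & 1; }

/* Probe i slices a fresh group of bits out of the hash word (hypothetical;
 * the real kernel derives its probe values differently). */
static uint32_t probe_bits(uint32_t h, uint32_t i)
{
    return h >> (i * 8);
}

/* Segmented scheme: probe i is shifted into, and confined to, sub-bitmap i. */
static uint32_t seg_index(uint32_t h, uint32_t i)
{
    return (i << SUB_BITS) | (probe_bits(h, i) & ((1u << SUB_BITS) - 1));
}

/* Bloom scheme: drop the segment offset and mask against the whole array. */
static uint32_t bloom_index(uint32_t h, uint32_t i)
{
    return probe_bits(h, i) & (TOTAL_BITS - 1);
}

static void seg_insert(uint32_t h)
{
    for (uint32_t i = 0; i < K; i++)
        set_bit(seg_index(h, i));
}

/* No false negatives; false positives possible, as counted by the kernel. */
static int seg_maybe(uint32_t h)
{
    for (uint32_t i = 0; i < K; i++)
        if (!test_bit(seg_index(h, i)))
            return 0;
    return 1;
}

static void bloom_insert(uint32_t h)
{
    for (uint32_t i = 0; i < K; i++)
        set_bit(bloom_index(h, i));
}

static int bloom_maybe(uint32_t h)
{
    for (uint32_t i = 0; i < K; i++)
        if (!test_bit(bloom_index(h, i)))
            return 0;
    return 1;
}
```

The only difference is the index function: the Bloom variant uses a larger mask and no per-probe segment offset, which is consistent with the two performing nearly identically in practice.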
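[Editor's note: tying this back to the thread subject - the expected false-positive rate that the kernel's FP counters are effectively measuring has a standard closed form for a classic Bloom filter with m bits, n inserted elements and k probes. A minimal sketch, with arbitrary example numbers rather than JtR's actual sizes:]

```c
#include <assert.h>
#include <math.h>

/* Classic Bloom filter false-positive estimate: after inserting n
 * elements into m bits with k probes each, a lookup of an absent
 * element passes all k probes with probability (1 - e^(-kn/m))^k. */
static double bloom_fp_rate(double m, double n, double k)
{
    return pow(1.0 - exp(-k * n / m), k);
}
```

More loaded hashes (larger n) or a smaller array (smaller m) drive the rate up, which is one reason the sparsely and heavily populated cases are worth measuring separately.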