|
Message-ID: <20150913140456.GA1718@openwall.com> Date: Sun, 13 Sep 2015 17:04:56 +0300 From: Aleksey Cherepanov <lyosha@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: experiment with functions to reject computed hashes On Sun, Sep 13, 2015 at 04:06:20PM +0300, Solar Designer wrote: > On Sun, Sep 13, 2015 at 03:34:09PM +0300, Aleksey Cherepanov wrote: > > I was curious if it is possible to make a function to reject computed > > hashes useless for us. Such function has to: > > - never reject hashes from given set, > > - reject as many as possible other hashes. > > We're already doing that with bitmaps and hash tables, and just > yesterday I posted on other related ideas. How is your idea different? > Do you mean rejecting before hash computation is finished? Even if so, > we already have the crypt_all()/get_hash*()/cmp_all()/cmp_one() vs. > cmp_exact() split. Do you mean checks that are not effective enough > (don't eliminate enough candidates) to postpone them until after > crypt_all() returns, so that you want to do them inside crypt_all() > before proceeding to more effective (yet also non-final) checks? The tables are aimed to cut down some hashes before bitmap. They are not aimed to replace bitmap. I had this idea earlier, so thread about Judy arrays just triggered experiments. > > So generated function can reject almost half of candidates using ~7 > > sse instructions (not counting load of tables but counting final > > &0x0..0ff), cracking 32 hashes. > > Isn't it more effective to go for bitmap lookups right away? Could not it be faster with higher number of hashes? I guess size of vectors matter much more. > If cracking only 32 hashes, we can have the bitmap small enough that > it's in L1 cache yet is very effective. Of course, it'd be 4 lookups if > cracking 4 hashes per 128-bit SIMD vector. Is your "7 instructions" for > all 4? Even if so, it's not 7 vs. however many we'd need for bitmap > lookups, but rather it's 7 plus at least 50% of the effort that we > since your filter is less than 50% effective as you say. "7 instructions" is for the whole vector, not for each value in vector. But probability is for separate values. Having 50% probability and 4x vector, full reject would occur only in 1/16 cases. 85% and 4x vectors would give full rejection in 50% cases. So with 85% and 4x vector, it would be 1 check in 50% and 1 check + 4 lookups in other 50%. Trickier code might separate probabilities. So I think there are a few cases when the tables might be useful: - if reject ratio is close to precise, to replace bitmaps, - if lookups are very slow, - if check cost + (1 - reject probability) * vector size * lookup cost < vector size * lookup cost (and that's all without respect to time of building tables.) > And you say you're "not counting load of tables" (constants into SIMD > registers?), which is going to cost too (we usually don't have spare > registers across the entire hash computation). The tables are not really constants, they should be computed at moment when all hashes are loaded. I guess it would be 1 movdqa per table. > I think most speedup when cracking few hashes is possible through > reversing of steps, which we're not doing yet. Until we do that, it's > premature to optimize hash comparisons for that same special case. My tables does not interfere with reversing (like bitmap do not interfere). I think it is a separate question. But faster hashes would benefit from this mostly because slower hashes spend most time computing (and they are probably salted, so checks against sets of 1 hash are more expectable). So yes, there are other, more promising ways to improve fast hashes. Thanks! -- Regards, Aleksey Cherepanov
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.