Message-ID: <20150918104952.GA22858@openwall.com>
Date: Fri, 18 Sep 2015 13:49:52 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: larger bitmaps and hash tables

magnum, Fred -

On Wed, Sep 16, 2015 at 09:26:24PM +0200, magnum wrote:
> On 2015-09-16 19:42, Solar Designer wrote:
> > I think we'll be changing all hash table sizes at once, not just
> > the 6th. So maybe you can prepare us for that?
>
> Done in 36393e763. I did not actually change any size yet. This was
> significantly harder to do with pure one-liner-fu. It may have missed
> something or included some spurious 0xff that was NOT a hash mask. I
> did review all of it twice and did some machine checks as well, so
> I'm pretty sure it's OK.

Thanks.

> However:
> * Trip format was not touched. I'm not sure what to do with it.

It's not just trip, but all DES stuff coming from core. Please leave
these as-is for now.

> * ntlmv1_mschapv2_fmt_plug.c and some other formats didn't have
> enough bits even for our old sizes. A problem is that once we change
> the params.h sizes, we can't immediately see which formats need
> tweaking (for example, ntlmv1 and mschapv2 would probably need one or
> two more functions NULL'ed). I'm not sure what happens if we use a
> size of 20 bits and the function actually only delivers 16... but I
> guess there won't be any problem except some excess memory use?

Right. I thought of asking Kai to add a statistical test to
--test-full that would detect this, but I haven't yet.

> > I think we'll obsolete the smallest two sizes, shifting everything
> > else by two sizes. The new smallest size will be a 4096-bit bitmap
> > and a 1024-entry hash table (with the current SHR=2), which is
> > 8.5 KB total on a 64-bit system (so it still fits in L1 cache).
> >
> > This frees up the highest two sizes for reuse, and I think we may
> > make them 30 and 32 bits. The latter would exist only in 64-bit
> > builds unless we increase SHR (maybe we should introduce "magic
> > SHR" on 32-bit, where an extra 2 or 3 bits would be skipped for
> > the largest sizes).
> >
> > So we'll have: 12, 16, 20, 24, 27, 30, 32.
>
> How are we going to define the actual PASSWORD_HASH_SIZE_6? Isn't it
> still an int? Can we just define it as ULL with no problems?

You're right that we'll have a problem with it, and we would need to
make other changes first.

> > The binary_hash_*() functions will need to be merged into one,
> > accepting the size in bits (or a mask?) The get_hash_*() functions
> > will stay (but be revised to support these larger sizes).
>
> Isn't it faster/simpler to just have it always deliver 32 bits and
> then mask them off at the caller? Maybe a few formats would suffer?
> Not many, I'm sure.

Yes, I am now leaning towards providing such a function as well,
albeit mostly for reasons not yet discussed in here:

Right now, the hash table index uses a subset of the bits that had
been used to index the bitmap. This is wasteful in the sense that if
there was a bitmap hit, there's always a hash table hit. It saves a
NULL pointer check, but it loses the opportunity for a second chance
at rejecting the password before we proceed with cmp_one(). This just
happened historically, because bitmaps were introduced after we
already had hash tables.
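For illustration, the current coupling works roughly like this (a
simplified sketch with made-up names and types, not the actual code in
the tree):

#include <stdint.h>

/* Hypothetical, simplified equivalents of the real structures. */
struct db_password {
    struct db_password *next_hash;
    void *binary;
};

#define HASH_BITS 20 /* bitmap index width for one of the sizes */
#define SHR 2        /* extra bits dropped for the hash table index */

extern uint32_t bitmap[(1 << HASH_BITS) / 32];
extern struct db_password *hash_table[1 << (HASH_BITS - SHR)];
extern int cmp_one(void *binary, int index);

static void lookup_current(uint32_t h, int index)
{
    /* h is the per-size partial hash, already masked to HASH_BITS */
    if (bitmap[h >> 5] & (1U << (h & 31))) {
        /*
         * h >> SHR is a strict subset of the bitmap index bits, so
         * after a bitmap hit this bucket is guaranteed non-empty:
         * no NULL check is needed, but there's also no second
         * chance to reject before cmp_one().
         */
        struct db_password *pw = hash_table[h >> SHR];
        do {
            if (cmp_one(pw->binary, index))
                break; /* potential crack, verify further */
        } while ((pw = pw->next_hash));
    }
}

The guaranteed non-empty bucket is why the loop can be a do/while
rather than a while with a NULL check up front.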
With a get_hash() that delivers the full 32 bits all the time, and
does so fast (so not for a bitsliced format), we could have enough
bits to minimize the overlap between the two indices and thus reduce
the frequency of cmp_one() calls (and, maybe more importantly, of
needing to load the binary and next_hash from memory).

[ A drawback, though, is that with the hash table index bits no longer
being a strict subset of the bitmap index bits, we would no longer be
able to simply skip the hash table update when removing the last
element from a bucket (like we do with my recent copy-on-write
minimizing patch). That's because the same hash bucket could then
correspond to disjoint portions of the bitmap. We could check those
portions to see whether they're all-zero and then bypass the hash
table update, but this won't always happen to be the case, and it
gets tricky. ]

Also, I realized that we might want to keep the 2nd smallest size for
now, since it makes sense to use tiny bitmaps and hash tables when
cracking a lot of salted hashes with a few hashes per salt (more than
one per salt, but not many). For example, 100 million hashes with
30 million different salts. This is an obscure special case, but when
it does occur, 8.5 KB per salt would be potentially problematic (it
would needlessly spend 255 GB in this example). Arguably, the
threshold for using bitmaps at all could be higher than 3, but somehow
3 was optimal in my tests when I first introduced bitmaps.

So an alternative plan is to replace just get_hash_0() with one
returning the full 32 bits. Maybe it needs to be outside of the array
of functions, and called just get_hash(). If defined, it would need to
be used instead of the per-size functions. If not defined, the
per-size functions should be used. Maybe it should be treated as an
error (self-test failure) if both kinds are defined for a format.

Since this plan would reduce the number of functions in the array by
one, we may as well start by shifting them up by one size and
introducing a 30-bit size for the new get_hash_6(). Then, when
implementing the generic 32-bit get_hash() and its uses in the code,
we'd drop this get_hash_6().

I would also like to give Fred a chance to contribute Judy before we
invest much time into a hash table rework. So adjusting the bitmap
sizes is fine to do now, but changing how the bitmap and hash table
interact might be premature. I suspect that having a full 32-bit
partial hash easily available would be beneficial for possible
alternatives to hash tables as well.

Fred, any comments on this?

Alexander
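To make the disjoint-index idea above concrete, a lookup along those
lines could look roughly like this (again just a sketch: the names,
sizes, and exact bit split are hypothetical, and with BITMAP_BITS +
TABLE_BITS > 32 the overlap is merely minimized, not eliminated):

#include <stdint.h>

struct db_password {
    struct db_password *next_hash;
    void *binary;
};

#define BITMAP_BITS 20 /* bitmap indexed by the low bits */
#define TABLE_BITS 18  /* hash table indexed by the high bits */

extern uint32_t get_hash(int index); /* always delivers all 32 bits */
extern uint32_t bitmap[(1 << BITMAP_BITS) / 32];
extern struct db_password *hash_table[1 << TABLE_BITS];
extern int cmp_one(void *binary, int index);

static void lookup_disjoint(int index)
{
    uint32_t h = get_hash(index);
    uint32_t bi = h & ((1U << BITMAP_BITS) - 1); /* low bits */
    uint32_t ti = h >> (32 - TABLE_BITS);        /* high bits */

    if (bitmap[bi >> 5] & (1U << (bi & 31))) {
        /*
         * Second chance: ti is not derived from bi, so the bucket
         * may be NULL even after a bitmap hit, rejecting the
         * candidate before binary/next_hash are ever loaded.
         */
        struct db_password *pw = hash_table[ti];
        while (pw) {
            if (cmp_one(pw->binary, index))
                break; /* potential crack, verify further */
            pw = pw->next_hash;
        }
    }
}

Note that the loop now needs the NULL check that the current scheme
avoids, which is exactly the trade-off discussed above.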