|
Message-ID: <20130721163159.GA10862@openwall.com> Date: Sun, 21 Jul 2013 20:31:59 +0400 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: md5 hash comparisons On Sun, Jul 21, 2013 at 10:22:31AM +0530, Sayantan Datta wrote: > Currently there are 2x2KB bitmaps Why did you choose these specific numbers? e.g. myrice chose 4x 8 KB instead. Are you trying to support devices where we can't afford 32 KB? > Regarding bitmap in global memory, even > though it allows us to uniquely identify all hashes using 4*512MB bitmaps Does it? I guess you did not mean it literally, since all of those things are probabilistic. We can only have a very good chance, not certainty. > but one major drawback with bitmaps is that it turns sequential search into > random access which will most likely cause bank conflicts and lower > performance. I tried one 256MB bitmap in the md5 kernel but the performance > wasn't good. For large number of hashes the hashtable approach might be > better. My understanding is that any global memory access involves loading an entire cache line. So, yes, accessing just one bit is highly non-optimal: we need 1 bit, but we get e.g. 64 bytes read into the chip instead. However, we can't really do better (if we check just one computed hash value and we've already made optimal use of local memory if we could): even if with some other algorithm we do need those e.g. 64 bytes, we still incur the same performance hit of one global memory access. I'd expect significant performance difference if the (average) number of random accesses differs between algorithms. If our one bitmap lookup lets us reject the computed hash value almost all of the time, there's not much to be gained by using another data structure and algorithm. However, if there is a significant percentage of cases where further checks are needed, then another data structure and algorithm may in fact help. What exactly do you mean by "performance wasn't good" - specific speed numbers for this approach and for whatever you're comparing it against? Here's a crazy idea: Can we somehow check multiple computed hashes with one global memory lookup? e.g. if we could buffer computed hashes and group them by their required index into a global memory bitmap (or hash table or whatever), then eventually we could have two or more bits to check within the same cache line (to be read from global memory at that time). The buffering even to global memory could be pretty fast if it's sequential, however in that case we would not be able to easily group the computed hashes like we need to, and reading the relevant hashes back from global memory would defeat the purpose of us reducing the number of global memory bitmap lookups. So we'd have to buffer/group in local memory, which would likely make this impractical for most settings (there's not enough local memory to achieve such grouping for reasonably large global memory bitmaps). Can you simply take a look at and experiment with Multiforcer, with large loaded hash counts? According to Bitweasil, it handles this well. Here's another thought: While we need to support these high loaded hash counts efficiently, we're unlikely to see them in the upcoming contest. ;-) In this context, it'd make more sense for you to spend the following week focusing on making mask mode available and work reliably and with proper speed reporting, for more of the formats - and maybe also in combination with other cracking modes (especially wordlist). Besides descrypt and raw MD5, it'd be nice to also have it for NTLM, MSCash (DCC), raw SHA-1, and raw MD4. Also, it'd be nice if your mask mode addition doesn't break any of the existing JtR functionality - I think the way you hacked it in currently does break other features. Can you bring your mask mode on GPU stuff to a practically usable form before the contest? Thanks, Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.