Message-ID: <d99bc9d39b8a8a45b418326d5f43dccc@smtp.hushmail.com>
Date: Wed, 6 Mar 2013 02:36:31 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: NetNTLMv1 and MSCHAPv2

Solar,

On 7 Feb, 2013, at 5:02 , Solar Designer <solar@...nwall.com> wrote:
> On Thu, Feb 07, 2013 at 07:56:01AM +0400, Solar Designer wrote:
>> As to speeding up NetNTLMv1 and MSCHAPv2 some more, if we care, we may
>> want to split the crypt_key[] array in two, one for 2-byte "hot"
>> portions of the output and the other for 14-byte "cold" portions (may
>> expand them to 16 bytes for faster index to offset calculation).
>> Allocating 21 or 22 bytes wastes cache space. We only use 21 in
>> cmp_exact(), for one hash at a time - we could use a local variable
>> there instead. I am not going to work on this now. You may. :-)

The SIMD code path already separates nthash[] from crypt_key[] just for
the sake of postponing stuff until needed. That is, we only copy the 2
hot bytes and then don't touch nthash[] until we reach cmp_one() [a.k.a
thorough part of cmp_all()]. We currently do not take the opportunity to
reduce size of crypt_key[] - the latter could be just the 2 hot bytes. I
just tried this but it makes no difference on its own (actually slightly
slower - wtf?).

> Taking this a step further, we could store just a few bytes of the
> 14-byte portion, and recompute the rest of the NT hash in cmp_exact()
> when we have to.

This might do more good for scalar code path than for SIMD. OTOH maybe
it could make SIMD scale better in OMP?

> ... and if we store e.g. just last 4 bytes of each NT hash, then we can
> get away with using just one array, like we do now. 2 bytes of each
> element will be directly compared to the known values for the 3rd DES
> block key, and 2 bytes before those last 2 will be compared to results
> of DES encryption of the 2nd block, which are computed as needed.
> I think I like this option best. Separating the hot/cold arrays has
> some cost of its own, and we can avoid that by simply making each
> element this small (only 4 bytes). The index to offset calculation
> becomes trivial (and supported in x86's addressing modes natively).

Maybe I'm thick now but I don't follow. cmp_all() checks the known last
two bytes of the NT hash. If they match (once in 64K), we calculate the
first DES block and compare that. We need the first 7 bytes of the NT
hash for this so 2+7 bytes would make it until cmp_exact(). How would we
use just 2+2 bytes?

magnum
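
For reference, here is a rough sketch of the hot/cold split discussed above. This is not the actual John the Ripper code: the names hot[], cold[], store_hash(), cmp_all_hot() and cold_part(), the MAX_KEYS batch size and the byte order used to pack the two hot bytes are all made up for illustration. It only shows the idea that the quick check scans a dense array of 2-byte values, while the 14-byte remainder (padded to 16) sits in a separate array that is read only after a hot match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEYS 64 /* hypothetical batch size, not taken from the thread */

/* "Hot" part: only the last 2 bytes of each 16-byte NT hash. */
static uint16_t hot[MAX_KEYS];

/* "Cold" part: the first 14 bytes, padded to 16 so that the
 * index-to-offset calculation is a plain shift. */
static uint8_t cold[MAX_KEYS][16];

/* Split one computed NT hash into its hot and cold portions. */
void store_hash(size_t i, const uint8_t nthash[16])
{
    memcpy(cold[i], nthash, 14);
    memset(cold[i] + 14, 0, 2);
    /* Packing order is an arbitrary choice for this sketch. */
    hot[i] = (uint16_t)(nthash[14] | (nthash[15] << 8));
}

/* Quick check in the spirit of cmp_all(): touches only the hot array.
 * 'known' holds the last two NT hash bytes recovered from the 3rd DES
 * block, packed the same way as in store_hash(). */
int cmp_all_hot(uint16_t known, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (hot[i] == known)
            return 1;
    return 0;
}

/* The cold bytes are fetched only once a hot match needs verifying. */
const uint8_t *cold_part(size_t i)
{
    return cold[i];
}
```

With this layout a scan over the whole batch reads 2 bytes per candidate instead of 21 or 22, which is the cache argument made in the quoted text. Whether it helps in practice is a separate question, as the "no difference on its own" result above suggests.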