Message-ID: <11fe905f23295722355f4cb78f402924@smtp.hushmail.com>
Date: Thu, 31 Jan 2013 18:36:22 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: NetNTLMv1

On 31 Jan, 2013, at 10:37 , Solar Designer <solar@...nwall.com> wrote:

> Attached is quick and still dirty implementation of the above approach
> for JtR. Compared to the approach with maintaining a lookup table per
> challenge, this has lower memory needs and higher cracking speed, but
> (as currently implemented) it does the ~32k DES computations per C/R
> pair rather than per challenge. It is possible to improve it to only do
> those computations per challenge, by temporarily maintaining a lookup
> table for each challenge (during loading only, and maybe only for the
> current challenge).
>
> New speeds:
>
> Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... DONE
> Many salts:     882291K c/s real, 882291K c/s virtual
> Only one salt:  7647K c/s real, 7647K c/s virtual
>
> Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... (8xOMP) DONE
> Many salts:     910901K c/s real, 114005K c/s virtual
> Only one salt:  13025K c/s real, 1626K c/s virtual
>
> Alexander

This is now committed, as well as SIMD support. New Bull figures:

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [128/128 XOP intrinsics 8x]... DONE
Many salts:     315806K c/s real, 315806K c/s virtual
Only one salt:  28196K c/s real, 28196K c/s virtual

That's a poor many-salts figure, see below. But the non-SIMD OMP speed got a lot worse:

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... DONE
Many salts:     880116K c/s real, 887389K c/s virtual
Only one salt:  7557K c/s real, 7557K c/s virtual

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... (8xOMP) DONE
Many salts:     223689K c/s real, 39416K c/s virtual
Only one salt:  904681 c/s real, 159323 c/s virtual

This performance regression must be caused by my tweaking of OMP_SCALE and base MAX_KEYS_PER_CRYPT for an i7-3820.

Benchmarks for the i7-3820 at 3.60GHz:

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [128/128 AVX intrinsics 12x]... DONE
Many salts:     914527K c/s real, 917585K c/s virtual
Only one salt:  44938K c/s real, 44938K c/s virtual

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... DONE
Many salts:     1251M c/s real, 1255M c/s virtual
Only one salt:  9564K c/s real, 9564K c/s virtual

Note that the following run uses *four* cores, not eight.

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... (4xOMP) DONE
Many salts:     1734M c/s real, 434310K c/s virtual
Only one salt:  28072K c/s real, 7035K c/s virtual

That's a remarkable difference. I'm not sure how to tune it for both. Maybe just lower MAX_KEYS_PER_CRYPT until we see a sweet spot on Bull, and hope it does not make too much difference on the Intel.

Oh btw, here's a figure for Bull at four cores:

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... (4xOMP) DONE
Many salts:     976125K c/s real, 244587K c/s virtual
Only one salt:  14336K c/s real, 3591K c/s virtual

That's better than with eight threads. Does this suggest any specific change I should make?

I haven't even tried doing OMP for SIMD. I reckon it can't do any good with the current core. The NT2 format has the code needed, but it's defined out (BLOCK_LOOPS). Maybe it can be tweaked to do a little good.

magnum