|
Message-ID: <20110720193218.GA31967@openwall.com> Date: Wed, 20 Jul 2011 23:32:18 +0400 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: cryptmd5cuda Lukas - Any comments on the below? I proposed some trivial changes there, which should result in some speedup - quite desirable for uses of your patches in the KoreLogic contest (not necessarily by our team, but by others as well). So please make those changes sooner rather than later. Also, similar changes might apply to other patches of yours - please review them for the same issues and apply the changes as necessary. Have you tested your patches with lengthy cracking runs, on lots of hashes loaded for cracking? For cryptmd5cuda, please use KoreLogic's last year contest's hashes list: http://contest-2010.korelogic.com Compare your results against those produced by a clean JtR build (CPU). Thanks, Alexander On Wed, Jul 13, 2011 at 08:55:41PM +0400, Solar Designer wrote: > I just took a look at john-1.7.7-cryptmd5cuda-2.diff (didn't try running > it yet, though). Here are some comments: > > When convenient, please start to base your patches on the latest main > JtR version, which would be 1.7.8 now (not 1.7.7 anymore). This should > be trivial since your patches are not invasive. > > For 32-bit systems, you recommend the linux-x86-mmx make target, but it > should be linux-x86-sse2 instead. (Of course, this only affects other > hash types, not the one you provide GPU support for.) > > +#define MIN(a,b) (a)<(b)?(a):(b) > +#define MAX(a,b) (a)>(b)?(a):(b) > > I suggest that you add braces around the entire expressions as well, to > avoid any surprises later. > > +static const char md5_salt_prefix[] = "$1$"; > > This reminds me - how about supporting "$apr1$" as well? (You may get > the test vectors from MD5_fmt.c.) > > +#define F(x, y, z) (x&y | ~x&z) > +#define G(x, y, z) (x&z | y&~z) > > These two may be optimized to: > > #define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z)))) > #define G(x, y, z) ((y) ^ ((z) & ((x) ^ (y)))) > > If the GPU doesn't have a single AND-NOT instruction, and I think it > does not, this reduces the instruction count from 4 to 3. > > +#define PLAINTEXT_LENGTH 15+1 > > Why +1? You don't actually support passwords of length 16, do you? > > In md5_block(), you don't take advantage of x[15] always being 0 (for > password length up to 15 inclusive) - you can just substitute 0 there. > Also, rather than maintain an MD5 context, you can load initial > constants right into registers, and you can even do some precomputation. > > ...you also have all those "i % 7" and similar checks, which as we had > discussed you'd need to get rid of. And you wouldn't need md5_digest(), > where you compute the length in bits - all of those different lengths > would be precomputed outside of the 1000 iteration loop. > > Please take a closer look at JtR's MD5_std.c. It has all of these > optimizations I mentioned. It looks like you started with unoptimized > MD5 code and didn't look at JtR's more optimal code closely enough to > spot and borrow these. > > Start by replacing F() and G(), and replacing x[15] with 0. Trivial > changes, but should give some speedup - I'd expect 5% to 10%. > > Thanks, > > Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.