Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110713165541.GB24742@openwall.com>
Date: Wed, 13 Jul 2011 20:55:41 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: cryptmd5cuda

Lukas,

I just took a look at john-1.7.7-cryptmd5cuda-2.diff (didn't try running
it yet, though).  Here are some comments:

When convenient, please start to base your patches on the latest main
JtR version, which would be 1.7.8 now (not 1.7.7 anymore).  This should
be trivial since your patches are not invasive.

For 32-bit systems, you recommend the linux-x86-mmx make target, but it
should be linux-x86-sse2 instead.  (Of course, this only affects other
hash types, not the one you provide GPU support for.)

+#define MIN(a,b) (a)<(b)?(a):(b)
+#define MAX(a,b) (a)>(b)?(a):(b)

I suggest that you add braces around the entire expressions as well, to
avoid any surprises later.

+static const char md5_salt_prefix[] = "$1$";

This reminds me - how about supporting "$apr1$" as well?  (You may get
the test vectors from MD5_fmt.c.)

+#define F(x, y, z) (x&y | ~x&z)
+#define G(x, y, z) (x&z | y&~z)

These two may be optimized to:

#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#define G(x, y, z) ((y) ^ ((z) & ((x) ^ (y))))

If the GPU doesn't have a single AND-NOT instruction, and I think it
does not, this reduces the instruction count from 4 to 3.

+#define PLAINTEXT_LENGTH       15+1

Why +1?  You don't actually support passwords of length 16, do you?

In md5_block(), you don't take advantage of x[15] always being 0 (for
password length up to 15 inclusive) - you can just substitute 0 there.
Also, rather than maintain an MD5 context, you can load initial
constants right into registers, and you can even do some precomputation.

...you also have all those "i % 7" and similar checks, which as we had
discussed you'd need to get rid of.  And you wouldn't need md5_digest(),
where you compute the length in bits - all of those different lengths
would be precomputed outside of the 1000 iteration loop.

Please take a closer look at JtR's MD5_std.c.  It has all of these
optimizations I mentioned.  It looks like you started with unoptimized
MD5 code and didn't look at JtR's more optimal code closely enough to
spot and borrow these.

Start by replacing F() and G(), and replacing x[15] with 0.  Trivial
changes, but should give some speedup - I'd expect 5% to 10%.

Thanks,

Alexander

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.