john-dev - Re: cryptmd5cuda

Message-ID: <20110802054809.GB27850@openwall.com>
Date: Tue, 2 Aug 2011 09:48:09 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: cryptmd5cuda

On Mon, Jul 25, 2011 at 09:50:31AM +0200, Łukasz Odzioba wrote:
> I found a bug in the cryptmd5cuda patch rev2.
> The problem was in this function:
> 
> static void *salt(char *ciphertext) - it was returning the prefix and the salt.
> Everything was OK during the test because the address returned by salt()
> was the same one that was passed to set_salt() as its parameter.
> During "normal cracking" the salt was shortened to SALT_SIZE and
> copied to another place in memory.
> 
> I have fixed that, and it works as it should.

Thank you for explaining this.  I am relieved to know it was this simple.
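
The failure mode described in the quoted text can be sketched roughly as follows. This is only an illustration, not the code from the patch: the value of SALT_SIZE, the PREFIX_LEN constant, and the helpers salt_buggy(), salt_fixed() and store_salt() are all hypothetical names chosen here, and the ciphertext is assumed to be a well-formed "$1$salt$hash" string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SALT_SIZE  8   /* assumed: salt bytes, not counting the prefix */
#define PREFIX_LEN 3   /* length of "$1$" */

/* Buggy variant (hypothetical): returns the prefix *and* the salt. */
static void *salt_buggy(const char *ciphertext)
{
    static char out[PREFIX_LEN + SALT_SIZE + 1];
    const char *end = strchr(ciphertext + PREFIX_LEN, '$');
    size_t len = end ? (size_t)(end - ciphertext) : strlen(ciphertext);

    if (len > PREFIX_LEN + SALT_SIZE)
        len = PREFIX_LEN + SALT_SIZE;
    memset(out, 0, sizeof(out));
    memcpy(out, ciphertext, len);
    return out;
}

/* Fixed variant (hypothetical): returns only the salt, so it fits in SALT_SIZE. */
static void *salt_fixed(const char *ciphertext)
{
    static char out[SALT_SIZE];
    const char *start = ciphertext + PREFIX_LEN;
    const char *end = strchr(start, '$');
    size_t len = end ? (size_t)(end - start) : strlen(start);

    if (len > SALT_SIZE)
        len = SALT_SIZE;
    memset(out, 0, sizeof(out));
    memcpy(out, start, len);
    return out;
}

/* What a cracking loop might do: keep its own copy of exactly SALT_SIZE bytes. */
static void store_salt(char *db, const void *salt)
{
    assert(db != NULL && salt != NULL);
    memcpy(db, salt, SALT_SIZE);
}
```

With a salt such as "abcdefgh", the copy made from salt_buggy() holds "$1$abcde", so the last three salt characters are lost, while the copy made from salt_fixed() holds the full salt. A self-test that hands the original pointer straight to set_salt() would never notice the difference.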

> I have also:
> -added $apr1$ support

Great!

> -tested your proposed F and G functions (they are around 1% slower; I
> have already tried them earlier, and I suppose that they may be faster
> on AMD)

This is puzzling.  As far as I'm aware, they're supposed to be faster on
NVIDIA as well.  How did this change affect code size?  Did the code
size decrease with the fewer-ops F and G functions?

Perhaps you have some other bottleneck that you're hitting.
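
For reference, here is a minimal sketch of what "fewer-ops" F and G usually means for MD5. It assumes the functions under discussion are the commonly used three-operation rewrites of the RFC 1321 definitions; the thread itself does not show them, and the names F_ref, G_ref, F_opt, G_opt and check_fg() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Definitions as given in RFC 1321: four bitwise operations each. */
static uint32_t F_ref(uint32_t x, uint32_t y, uint32_t z) { return (x & y) | (~x & z); }
static uint32_t G_ref(uint32_t x, uint32_t y, uint32_t z) { return (x & z) | (y & ~z); }

/* Equivalent rewrites with three bitwise operations each. */
static uint32_t F_opt(uint32_t x, uint32_t y, uint32_t z) { return z ^ (x & (y ^ z)); }
static uint32_t G_opt(uint32_t x, uint32_t y, uint32_t z) { return y ^ (z & (x ^ y)); }

/*
 * Both forms operate on each bit independently, so checking the 8
 * combinations of all-zero and all-one words covers every case.
 * Returns the number of combinations checked.
 */
static int check_fg(void)
{
    static const uint32_t v[2] = { 0u, 0xffffffffu };
    int i, j, k, n = 0;

    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
            for (k = 0; k < 2; k++) {
                assert(F_ref(v[i], v[j], v[k]) == F_opt(v[i], v[j], v[k]));
                assert(G_ref(v[i], v[j], v[k]) == G_opt(v[i], v[j], v[k]));
                n++;
            }
    return n;
}
```

Whether the shorter forms actually run faster depends on what the compiler emits for the target, which is why comparing the resulting code size is a useful first check.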

> -tested x[15]=0 situation +1.5%, thanks!
> 
> After some other changes I got +11% compared to rev2.

Sounds good.

> I still have this "i%7" problem, but I will dig more into MD5_std.c
> and try to figure out how to do it on the GPU (with memory limitations).
> 
> I've made the changes you proposed earlier (MIN macros, SSE2 build).
> The other patches seem to work properly.
> Cryptmd5-cuda patch rev 3 for john 1.7.8 is on the wiki.
> I've tested it on corelogic 2010 $1$ hashes and got:
> cpu (1 core of i3-2100):
> guesses: 583  time: 0:00:25:18 100%  c/s: 9011  trying: hallo

This was probably a 32-bit build.  A 64-bit build would be 50% faster
or so.  A build with the -jumbo patch and very recent gcc or with icc
would be faster yet (up to around 30000 c/s per core).

> gpu:
> guesses: 583  time: 0:00:02:10 100%  c/s: 113948  trying: 12345fgh - hallo

That's OK, but it's 3 times slower than the benchmark you have posted on
the wiki, right?  So is there a 3x performance hit for actual cracking
as compared to --test?

100k c/s is probably achievable on your i3-2100 CPU if you do a more
optimal build (see above) and use all CPU cores at once.  I am not
asking you to do that; I merely point out that the speedup with a GPU is
still questionable, unfortunately.
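
The comparison above can be worked through with a small helper. This is only a sketch: speedup() and to_seconds() are hypothetical names, and the 100000 c/s figure is the estimate from this message, not a measurement.

```c
#include <assert.h>

/* Ratio of two speeds in c/s (or of two run times, slow over fast). */
static double speedup(double fast, double slow)
{
    assert(slow > 0.0);
    return fast / slow;
}

/* Convert the HH:MM:SS part of a status line to seconds. */
static long to_seconds(long hours, long minutes, long seconds)
{
    return hours * 3600 + minutes * 60 + seconds;
}
```

Using the figures quoted above, speedup(113948, 9011) comes to roughly 12.6 against the single-core run, and the run times (1518 s versus 130 s) give about 11.7.  Against a hypothetical 100000 c/s CPU build, speedup(113948, 100000) is only about 1.14.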

Thanks,

Alexander
