Message-ID: <20110802054809.GB27850@openwall.com>
Date: Tue, 2 Aug 2011 09:48:09 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: cryptmd5cuda

On Mon, Jul 25, 2011 at 09:50:31AM +0200, Łukasz Odzioba wrote:
> I found a bug in cryptmd5cuda patch rev2.
> The problem was in this function:
>
> static void *salt(char *ciphertext) - it was returning the prefix and the salt.
> Everything was OK during testing because the address returned by salt()
> was the same as the set_salt() parameter.
> During "normal cracking" the salt was shortened to SALT_SIZE and
> copied to another place in memory.
>
> I have fixed that, and it works as it should.

Thank you for explaining this. I am relieved to know it was this simple.

> I have also:
> -added $apr1$ support

Great!

> -tested your proposed F and G functions (they are around 1% slower; I
> had already tried them earlier. I suppose they may be faster on
> AMD GPUs)

This is puzzling. As far as I'm aware, they're supposed to be faster on NVidia as well. How did this change affect code size? Did the code size reduce with the fewer-ops F and G functions? Perhaps you have some other bottleneck that you're hitting.

> -tested the x[15]=0 case: +1.5%, thanks!
>
> After some other changes I've got +11% compared to rev2.

Sounds good.

> I still have this "i%7" problem, but will dig more into MD5_std.c
> and try to figure out how to do it on the GPU (with memory limitations).
>
> I've made the changes you proposed earlier (MIN macros, SSE2 build).
> Other patches seem to work properly.
> Cryptmd5-cuda patch rev 3 for john 1.7.8 is on the wiki.
> I've tested it on corelogic 2010 $1$ hashes and got:
> cpu (1 core of i3-2100):
> guesses: 583 time: 0:00:25:18 100% c/s: 9011 trying: hallo

This was probably a 32-bit build. A 64-bit build would be 50% faster or so. A build with the -jumbo patch and very recent gcc or with icc would be faster yet (up to around 30000 c/s per core).
> gpu:
> guesses: 583 time: 0:00:02:10 100% c/s: 113948 trying: 12345fgh - hallo

That's OK, but it's 3 times slower than the benchmark you have posted on the wiki, right? So is there a 3x performance hit for actual cracking as compared to --test?

100k c/s is probably achievable on your i3-2100 CPU if you do a more optimal build (see above) and use all CPU cores at once. I am not asking you to do that; I merely point out that the speedup with a GPU is still questionable, unfortunately.

Thanks,

Alexander
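The salt() bug in the quoted report comes from the contract between salt() and set_salt(). salt() must return only the data set_salt() needs, packed into a fixed SALT_SIZE-byte buffer. During normal cracking the salt is truncated to SALT_SIZE and copied elsewhere, so set_salt() does not receive the pointer that salt() returned. The minimal C sketch below illustrates that contract. The struct md5crypt_salt, its field layout, and the apr1 flag used for $apr1$ support are hypothetical and are not taken from the rev 3 patch. The sketch also assumes valid() has already accepted the hash.

```c
#include <string.h>

/* Hypothetical binary salt layout (not the patch's actual struct).
 * Everything set_salt() needs must fit in SALT_SIZE bytes, because
 * during normal cracking only a SALT_SIZE-byte copy is kept. */
typedef struct {
	unsigned char len;   /* number of salt characters, 0..8 */
	unsigned char apr1;  /* 1 for "$apr1$", 0 for "$1$" */
	char chars[8];       /* salt characters only: no prefix, no '$' */
} md5crypt_salt;

#define SALT_SIZE sizeof(md5crypt_salt)

static md5crypt_salt cur_salt;

/* Parse "$1$salt$hash" or "$apr1$salt$hash" into the fixed-size struct.
 * Assumes valid() has already rejected anything else. */
static void *salt(char *ciphertext)
{
	static md5crypt_salt out;
	const char *p = ciphertext + 3;          /* skip "$1$" */

	memset(&out, 0, sizeof(out));
	if (strncmp(ciphertext, "$apr1$", 6) == 0) {
		out.apr1 = 1;
		p = ciphertext + 6;                  /* skip "$apr1$" */
	}
	/* copy at most 8 salt characters, stopping at the next '$' */
	while (out.len < 8 && p[out.len] != '\0' && p[out.len] != '$') {
		out.chars[out.len] = p[out.len];
		out.len++;
	}
	return &out;
}

/* Reads only the SALT_SIZE bytes it is given, so it works equally well
 * with the pointer from salt() and with a relocated copy of it. */
static void set_salt(void *s)
{
	memcpy(&cur_salt, s, SALT_SIZE);
}
```

A salt() that returns more than SALT_SIZE meaningful bytes, such as the prefix followed by the salt, can still pass --test, where the original pointer is reused. It breaks only once the salt is truncated and copied.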
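The F and G question can be made concrete with a C sketch that compares the RFC 1321 forms of the two MD5 round functions with the common three-operation bit-select rewrites. The sketch assumes these rewrites are the variants proposed in the thread. Both forms operate on each bit independently. Checking the eight combinations of all-zero and all-one words therefore shows that the forms agree on every input.

```c
#include <stdint.h>

/* RFC 1321 forms: AND, NOT, AND, OR. */
static uint32_t F_rfc(uint32_t x, uint32_t y, uint32_t z)
{
	return (x & y) | (~x & z);   /* if x then y else z */
}

static uint32_t G_rfc(uint32_t x, uint32_t y, uint32_t z)
{
	return (x & z) | (y & ~z);   /* if z then x else y */
}

/* Bit-select rewrites: XOR, AND, XOR. */
static uint32_t F_fast(uint32_t x, uint32_t y, uint32_t z)
{
	return z ^ (x & (y ^ z));
}

static uint32_t G_fast(uint32_t x, uint32_t y, uint32_t z)
{
	return y ^ (z & (x ^ y));
}

/* Both forms are bitwise, so the 8 combinations of all-zero and
 * all-one words cover every case at every bit position. */
static int fg_rewrites_match(void)
{
	for (unsigned v = 0; v < 8; v++) {
		uint32_t x = (v & 1) ? 0xffffffffu : 0;
		uint32_t y = (v & 2) ? 0xffffffffu : 0;
		uint32_t z = (v & 4) ? 0xffffffffu : 0;

		if (F_rfc(x, y, z) != F_fast(x, y, z) ||
		    G_rfc(x, y, z) != G_fast(x, y, z))
			return 0;
	}
	return 1;
}
```

In source form the rewrite saves one operation per step in rounds 1 and 2. Whether the compiled code actually shrinks depends on the compiler and the target's instruction set, which is why comparing code size, as the reply asks, is a useful check.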
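The x[15]=0 case follows from MD5 padding. MD5 stores the message length in bits as a 64-bit little-endian value in words 14 and 15 of the last block. When the whole message fits in one 64-byte block (at most 55 bytes), x[14] holds the bit length and x[15] is always zero. The four additions of x[15] (one per round) can then be dropped. The helper md5_single_block() below is a hypothetical illustration of that padding, assuming single-block messages, and is not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the single padded MD5 block for a message
 * of at most 55 bytes as 16 little-endian words. Returns 0 if the
 * message would need a second block. */
static int md5_single_block(const unsigned char *msg, size_t len, uint32_t x[16])
{
	unsigned char block[64];
	uint64_t bits = (uint64_t)len * 8;   /* message length in bits */

	if (len > 55)
		return 0;
	memset(block, 0, sizeof(block));
	memcpy(block, msg, len);
	block[len] = 0x80;                   /* the '1' bit that starts the padding */
	for (int i = 0; i < 8; i++)          /* bytes 56..63: 64-bit length, little-endian */
		block[56 + i] = (unsigned char)(bits >> (8 * i));
	for (int i = 0; i < 16; i++)         /* MD5 reads little-endian 32-bit words */
		x[i] = ((uint32_t)block[4 * i]) |
		       ((uint32_t)block[4 * i + 1] << 8) |
		       ((uint32_t)block[4 * i + 2] << 16) |
		       ((uint32_t)block[4 * i + 3] << 24);
	return 1;                            /* here x[15] == bits >> 32 == 0 */
}
```

A message of at most 55 bytes has a bit length of at most 440, so the high word x[15] can never be nonzero in this path.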
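The "i%7" problem comes from the md5crypt main loop as widely published (for example in FreeBSD's crypt-md5.c). Iteration i puts the password or the previous digest first depending on i & 1. It appends the salt when i % 3 != 0 and an extra copy of the password when i % 7 != 0. The layout of each iteration's input therefore depends only on i mod 42, since 42 = lcm(2, 3, 7). One possible approach, sketched below under that assumption, is to walk the 1000 iterations with a counter that wraps at 42 instead of computing remainders. The flag names and helpers are hypothetical, and this is not necessarily how MD5_std.c handles it.

```c
/* Hypothetical flags for what iteration i of the md5crypt loop hashes. */
enum {
	PW_FIRST = 1,   /* i & 1: password first, digest last (else the reverse) */
	ADD_SALT = 2,   /* i % 3 != 0: salt follows the first piece */
	ADD_PW   = 4    /* i % 7 != 0: an extra copy of the password */
};

static unsigned layout(unsigned i)
{
	return ((i & 1) ? PW_FIRST : 0) |
	       ((i % 3) ? ADD_SALT : 0) |
	       ((i % 7) ? ADD_PW : 0);
}

/* Input length for one iteration: one password and one 16-byte digest
 * always, plus the optional salt and extra password. */
static unsigned msg_len(unsigned flags, unsigned pwlen, unsigned saltlen)
{
	return pwlen + 16 +
	       ((flags & ADD_SALT) ? saltlen : 0) +
	       ((flags & ADD_PW) ? pwlen : 0);
}

/* Precompute 42 layouts, then walk all 1000 iterations with j standing
 * in for i % 42. layout(i) is recomputed here only to verify the table.
 * Returns 1 if every iteration matches. */
static int period_42_holds(void)
{
	unsigned table[42], j = 0;

	for (unsigned k = 0; k < 42; k++)
		table[k] = layout(k);
	for (unsigned i = 0; i < 1000; i++) {
		if (table[j] != layout(i))
			return 0;
		if (++j == 42)
			j = 0;
	}
	return 1;
}
```

For a fixed password length and salt length, the same 42-entry table could hold precomputed msg_len() values instead of flags.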
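The reported figures can be put side by side. The first two ratios use the c/s values and the wall-clock times from the two status lines: 0:00:25:18 is 1518 s and 0:00:02:10 is 130 s. The third ratio uses the reply's rough estimate of about 100k c/s for an optimized build on all CPU cores, so it is only indicative.

```latex
\frac{113948\ \text{c/s (GPU)}}{9011\ \text{c/s (1 CPU core)}} \approx 12.6,
\qquad
\frac{1518\ \text{s}}{130\ \text{s}} \approx 11.7,
\qquad
\frac{113948\ \text{c/s}}{\approx 100000\ \text{c/s (estimated, all cores)}} \approx 1.1
```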