Message-ID: <20120716093510.GA21271@openwall.com>
Date: Mon, 16 Jul 2012 13:35:10 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: xsha512-cuda & xsha512-opencl testing

myrice -

On Mon, Jul 16, 2012 at 04:10:01PM +0800, myrice wrote:
> Unfortunately, after lukas's work on bull, I cannot run my cuda format on it...

It's weird, mscash2-cuda worked, but xsha512-cuda did not.  I've just
rebooted bull, and xsha512-cuda works now.

BTW, xsha512-cuda produces a nasty sound at maybe 5 kHz or so.  Is this
the frequency of PCIe transfers, global memory accesses, or something
like that?

> This is result under xsha512-opencl with incremental mode.

Which incremental mode, exactly?  This matters.  If the incremental mode
is not locked to a specific password length (e.g., just length 8), then
there's some overhead early on to switch between lengths.  For quick
runs (like a few minutes), this overhead is significant.  So you should
be using -i=all8 (locked to length 8 only).  Is this what you used?

> Incremental mode on xsha512-opencl with 7970:
> HashNum_SaltNum
> 1_1
> guesses: 1  time: 0:00:00:06 DONE (Mon Jul 16 10:54:39 2012)  c/s: 12838K

6 seconds is too short a run, but otherwise this is reasonable.

> 100_100
> guesses: 6  time: 0:00:06:45 0.00%  c/s: 48827K
> 
> 10K_10K
> guesses: 89  time: 0:00:03:00 0.00%  c/s: 49944K

OK.

> 10K_100
> guesses: 279  time: 0:00:05:40 0.00%  c/s: 2871M

About 29M c/s raw hashing speed (2871M c/s divided by 100 hashes/salt).
The slowdown from 50M to 29M with 100 hashes/salt is not too bad.  I was
afraid it'd be worse.  Yet there should be lots of room for improvement
here.

> 10K_1
> guesses: 5351  time: 0:00:03:43 0.00%  c/s: 72953M

Too many got cracked (over 50%) for this to be a meaningful benchmark.

> 1M_1
> guesses: 47731  time: 0:00:03:56 0.00%  c/s: 2707G

I guess we'd achieve a similar speed on CPU (2.7M passwords/second).

> 1M_1K
> guesses: 4196  time: 0:00:04:41 0.00%  c/s: 21220M

21M (21220M c/s divided by 1K hashes/salt), quite reasonable and better
than the current CPU code.

I guess the bottleneck here is the transfer of hashes from GPU to CPU
for get_hash*()?

> 1M_1M
> guesses: 50  time: 0:00:04:02 0.00%  c/s: 51453K

OK.

Overall, the scaling with many hashes per salt is better than what I had
expected for your code (since it was not subjected to such
testing/tuning before), but it's not perfect.

Thanks,

Alexander
