Message-ID: <20120501002912.GA9336@openwall.com>
Date: Tue, 1 May 2012 04:29:12 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Myrice: Weekly Report #3

myrice -

On Mon, Apr 30, 2012 at 10:24:23PM +0800, myrice wrote:
> Priorities:
> 1. Support longer passwords on XSHA512 format without performance loss
> 2. Analyze performance of storing salts in GPU.
> 3. Keep discussing password generation
>     - I have seen Solar and Frank's posts. I will join the discussion
> ASAP. Thank you!

Can you please add the following tasks to the above list? -

4. Get xsha512-opencl to work on the HD 7970 in bull.  Currently, the
version in magnum-jumbo fails as follows:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=xsha512-opencl -pla=1
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Compilation log: LOOP UNROLL: pragma unroll (line 174)
    Unrolled as requested!
LOOP UNROLL: pragma unroll (line 179)
    Unrolled as requested!
LOOP UNROLL: pragma unroll (line 194)
    Unrolled as requested!

Local work size = 256
Global work size = 2097152
Benchmarking: Mac OS X 10.7+ salted SHA-512 [OpenCL]... FAILED (get_hash[0](0))

It works fine on the GTX 570 and on CPU. :-)  In fact, on CPU it is only
slightly slower than the OpenSSL/OpenMP code, which is impressive.

5. Figure out why xsha512-cuda is significantly slower on the GTX 570 in
bull (1600 MHz) than it is on your GTX 580, and/or correct this.  The
difference should be about 5% in favor of the GTX 580 (which magnum
achieved in his RAR code):

512*1544 = 790528   (GTX 580: 512 cores at 1544 MHz)
480*1600 = 768000   (GTX 570 in bull: 480 cores at 1600 MHz)
768000/790528 = 0.9715

Well, maybe even 3%, although there's also the memory bus width
difference (384 vs. 320 bits) and the PCIe speed difference (the slot in
bull runs at x8 because I also have the HD 7970 installed, and this
motherboard can only do x8 to each card in such a config).  The
bandwidth differences should not affect the "many salts" case
significantly, though, so there's probably something else.

Currently, xsha512-cuda from magnum-jumbo with PLAINTEXT_LENGTH changed
to 12 does only:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=xsha512-cuda
Benchmarking: Mac OS X 10.7+ salted SHA-512 [CUDA]... DONE
Many salts:     36854K c/s real, 37058K c/s virtual
Only one salt:  17625K c/s real, 17625K c/s virtual

whereas you reported getting 70M c/s, and I was previously getting over
50M c/s with my own hacks of your older code.
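
To put the estimate and the benchmark side by side, here is a minimal
sketch of the arithmetic in Python.  It assumes throughput scales with
cores times clock only (ignoring memory bus width and PCIe speed), and
it assumes the reported 70M c/s is comparable to the "many salts" figure
above.  The function and variable names are just for illustration.

```python
# Rough model: throughput is proportional to cores * shader clock (MHz).
# This is an assumption; it ignores memory bus width and PCIe link speed.
def relative_throughput(cores, clock_mhz):
    return cores * clock_mhz

gtx580 = relative_throughput(512, 1544)  # GTX 580
gtx570 = relative_throughput(480, 1600)  # GTX 570 in bull, at 1600 MHz

# Expected GTX 570 speed as a fraction of GTX 580 speed.
expected_ratio = gtx570 / gtx580

# Figures from this message; treating 70M as a "many salts" number
# is an assumption.
reported_580_cs = 70_000_000   # reported for the GTX 580
measured_570_cs = 36_854_000   # "Many salts" real c/s on bull

expected_570_cs = reported_580_cs * expected_ratio
shortfall = 1 - measured_570_cs / expected_570_cs

print(f"expected ratio (570/580): {expected_ratio:.4f}")
print(f"expected GTX 570 speed:   {expected_570_cs / 1e6:.1f}M c/s")
print(f"measured below expected:  {shortfall:.0%}")
```

Under these assumptions the measured speed falls short of the expected
one by far more than the roughly 3% that the hardware difference would
explain.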

Thanks,

Alexander
