Message-ID: <475c89b30916e3c60ec34c45f7ab96ab@smtp.hushmail.com>
Date: Wed, 11 Apr 2012 09:11:29 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: OpenCL runtime errors

On 04/11/2012 05:59 AM, Solar Designer wrote:
> magnum, Lukas, all -
> 
> On Wed, Apr 11, 2012 at 01:11:57AM +0200, magnum wrote:
>> While experimenting with RAR I found a detail that stops many runtime
>> errors similar to the ones mentioned - and makes adjustment to different
>> devices (including really weak ones) easier. Most current OpenCL formats
>> (I think all except mine) use this to determine the device's maximum
>> local worksize:
>> (...)
> 
> Great.  I see that you've already implemented it for RAR.  Who is to do
> it for the rest - the authors of those other pieces of code, or you?

Not sure. I was hoping Samuele would come up with a good shared
find_best_lws() function for common-opencl.c, but I realise it might be
tricky to get one that works for any format, slow or fast.

> Here's what I am getting for RAR.  Before the above change:

FWIW, the above change was already in the first release. However, in
yesterday's commit I made some AMD fixes that I really hoped would cure
the AMD problems, so the result below was discouraging:

> After the change:
> 
> user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar -pla=1
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Max allowed local work size 256, best multiple 64
> Local work size (LWS) 128, Keys per crypt (KPC) 2048
> Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) FAILED (cmp_all(1))

The current code runs fine on all CPUs and GPUs I have tested, so I was
hoping it was fine now. I'll look further into it.

> user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar
> OpenCL platform 0: NVIDIA CUDA, 1 device(s).
> Using device 0: GeForce GTX 570
> Max allowed local work size 576, best multiple 32
> Local work size (LWS) 512, Keys per crypt (KPC) 8192
> Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) DONE
> Raw:    2259 c/s real, 2211 c/s virtual

This really beats me. The GTX 580 is damn near identical (isn't it?) and
I get 4000+ c/s with it. What build output do you get if you add
"-cl-nv-verbose" to the build options in common-opencl.c's include_source()?

Thanks for this input (others' results for RAR would help me too).

magnum
