john-dev - Re: OpenCL runtime errors

Message-ID: <475c89b30916e3c60ec34c45f7ab96ab@smtp.hushmail.com>
Date: Wed, 11 Apr 2012 09:11:29 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: OpenCL runtime errors

On 04/11/2012 05:59 AM, Solar Designer wrote:
> magnum, Lukas, all -
> 
> On Wed, Apr 11, 2012 at 01:11:57AM +0200, magnum wrote:
>> While experimenting with RAR I found a detail that stops many runtime
>> errors similar to the ones mentioned - and makes adjustment to different
>> devices (including really weak ones) easier. Most current OpenCL formats
>> (I think all except mine) use this to determine the device's maximum
>> local worksize:
>> (...)
> 
> Great.  I see that you've already implemented it for RAR.  Who is to do
> it for the rest - the authors of those other pieces of code, or you?

Not sure. I was hoping Samuele would find a good shared find_best_lws()
function for common-opencl.c, but I realise it might be tricky to get one
to work for any slow or fast format.
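
For illustration only, here is a minimal sketch of what a shared helper of
this kind might look like. It is not the code in common-opencl.c. The names
find_best_lws_sketch(), lws_timer_fn and toy_timer() are hypothetical, and
the sketch assumes the caller supplies the device's maximum local work size,
its preferred multiple, and a callback that times one run at a given size
(returning a negative value if that size fails). A real helper would
presumably obtain those two limits from the OpenCL runtime.

```c
#include <stddef.h>

/* Hypothetical callback: returns how long one benchmark run takes at the
 * given local work size, or a negative value if that size fails. */
typedef double (*lws_timer_fn)(size_t lws, void *ctx);

/* Try every multiple of `multiple` up to `max_lws` and return the size
 * with the shortest run time. Returns 0 if no candidate succeeded. */
size_t find_best_lws_sketch(size_t max_lws, size_t multiple,
                            lws_timer_fn time_run, void *ctx)
{
    size_t best = 0;
    double best_time = 0.0;

    if (multiple == 0)
        multiple = 1; /* avoid an endless loop on a bogus multiple */

    for (size_t lws = multiple; lws <= max_lws; lws += multiple) {
        double t = time_run(lws, ctx);

        if (t < 0.0)
            continue; /* this size failed on the device; skip it */
        if (best == 0 || t < best_time) {
            best = lws;
            best_time = t;
        }
    }
    return best;
}

/* Toy stand-in for a real benchmark (an assumption, not measured data):
 * sizes above the limit pointed to by ctx fail, and otherwise a larger
 * size is treated as faster. */
double toy_timer(size_t lws, void *ctx)
{
    size_t limit = *(const size_t *)ctx;

    if (lws > limit)
        return -1.0;
    return 1.0 / (double)lws;
}
```

The tricky part mentioned above sits in the callback: a slow format needs
few keys per timed run and a fast format needs many, so a single timing
strategy is unlikely to suit both. Keeping the timing behind a callback is
one way to let each format decide that for itself.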

> Here's what I am getting for RAR.  Before the above change:

Fwiw the above change was in there in the first release already. However,
in yesterday's commit I made some AMD fixes that I really hoped would
cure the AMD problems. So the below was discouraging:

> After the change:
> 
> user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar -pla=1
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Max allowed local work size 256, best multiple 64
> Local work size (LWS) 128, Keys per crypt (KPC) 2048
> Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) FAILED (cmp_all(1))

The current code runs fine on all the CPUs and GPUs I have tested, so I
was hoping it was fine now. I'll look further into it.

> user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar
> OpenCL platform 0: NVIDIA CUDA, 1 device(s).
> Using device 0: GeForce GTX 570
> Max allowed local work size 576, best multiple 32
> Local work size (LWS) 512, Keys per crypt (KPC) 8192
> Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) DONE
> Raw:    2259 c/s real, 2211 c/s virtual

This really beats me. The GTX 580 is damn near identical (isn't it?) and
I get 4000+ c/s. What build output do you get if you add
"-cl-nv-verbose" to the build options in common-opencl.c's include_source()?

Thanks for this input (I'd be helped by others' results for RAR too).

magnum
