john-dev - Re: OpenCL runtime errors

Message-ID: <20120411035934.GA16163@openwall.com>
Date: Wed, 11 Apr 2012 07:59:34 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: OpenCL runtime errors

magnum, Lukas, all -

On Wed, Apr 11, 2012 at 01:11:57AM +0200, magnum wrote:
> While experimenting with RAR I found a detail that stops many runtime
> errors similar to the ones mentioned - and makes adjustment to different
> devices (including really weak ones) easier. Most current OpenCL formats
> (I think all except mine) use this to determine the device's maximum
> local worksize:
> 
> clGetDeviceInfo(devices[gpu_id], CL_DEVICE_MAX_WORK_GROUP_SIZE,
> sizeof(max_group_size), &max_group_size, NULL);
> 
> ...but this figure does not help you much. It just shows the maximum
> supported worksize for any (lean) kernel on this device. I use this instead:
> 
> clGetKernelWorkGroupInfo(crypt_kernel, devices[gpu_id],
> CL_KERNEL_WORK_GROUP_SIZE, sizeof(max_group_size), &max_group_size, NULL);
> 
> This one tells us the maximum local worksize for *this very kernel* on
> this device. That is, the OpenCL implementation uses the resource
> requirements of the kernel (register usage etc.) to determine the max
> usable local worksize. Works like a charm. I currently don't even have a
> find_best_lws(), it's just a couple of simple (and quick) tests in init().

Great.  I see that you've already implemented it for RAR.  Who will do
it for the other OpenCL formats - the authors of that code, or you?
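
For whoever picks this up, here is a minimal sketch of how the per-kernel
limit might be applied once it has been queried.  The helper name
clamp_lws() and its parameters are hypothetical and not taken from the
tree.  It assumes kernel_max is the value returned by
clGetKernelWorkGroupInfo() for CL_KERNEL_WORK_GROUP_SIZE, and that
multiple is the "best multiple" figure, which I assume corresponds to
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.  It is kept free of OpenCL
calls so it can be compiled and checked without a device:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of John the Ripper): fit a requested
 * local work size (LWS) to what one specific kernel can run with.
 *
 * kernel_max - assumed to come from clGetKernelWorkGroupInfo() with
 *              CL_KERNEL_WORK_GROUP_SIZE for this kernel and device
 * multiple   - assumed to be the preferred work-group size multiple;
 *              pass 0 or 1 to skip the rounding step
 */
size_t clamp_lws(size_t requested, size_t kernel_max, size_t multiple)
{
    size_t lws = requested;

    /* Treat a reported limit of 0 as 1 so we always return a usable size. */
    if (kernel_max == 0)
        kernel_max = 1;

    /* Never exceed what this very kernel supports on this device. */
    if (lws > kernel_max)
        lws = kernel_max;
    if (lws == 0)
        lws = 1;

    /* Round down to the preferred multiple when there is room for it. */
    if (multiple > 1 && lws >= multiple)
        lws -= lws % multiple;

    assert(lws >= 1 && lws <= kernel_max);
    return lws;
}
```

With the GTX 570 figures below (limit 576, multiple 32), a requested LWS
of 512 is left alone, while a request of 1024 would be cut to 576.  This
only enforces the limit; it does not try to find the fastest LWS.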

Here's what I am getting for RAR.  Before the above change:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -pla=1 -fo=rar
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Max local work size 256, best multiple 64
Local work size (LWS) 128, Keys per crypt (KPC) 2048
Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) FAILED (cmp_all(1))

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar
OpenCL platform 0: NVIDIA CUDA, 1 device(s).
Using device 0: GeForce GTX 570
Max local work size 576, best multiple 32
Local work size (LWS) 512, Keys per crypt (KPC) 8192
Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) DONE
Raw:    2266 c/s real, 2211 c/s virtual

After the change:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar -pla=1
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Max allowed local work size 256, best multiple 64
Local work size (LWS) 128, Keys per crypt (KPC) 2048
Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) FAILED (cmp_all(1))

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar
OpenCL platform 0: NVIDIA CUDA, 1 device(s).
Using device 0: GeForce GTX 570
Max allowed local work size 576, best multiple 32
Local work size (LWS) 512, Keys per crypt (KPC) 8192
Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) DONE
Raw:    2259 c/s real, 2211 c/s virtual

And while I am at it, here are some CPU benchmarks:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar -pla=1 -dev=1
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 1: AMD FX(tm)-8120 Eight-Core Processor
Note: OpenCL device is CPU. A non-OpenCL build may be faster.
Local work size (LWS) 8, Keys per crypt (KPC) 128
Benchmarking: RAR3 (4 characters) [OpenCL]... (8xOMP) DONE
Raw:    256 c/s real, 32.2 c/s virtual

linux-x86-64-xop build:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar
Benchmarking: RAR3 [32/64]... (8xOMP) DONE
Raw:    400 c/s real, 50.1 c/s virtual

Thanks,

Alexander