Message-ID: <20120411035934.GA16163@openwall.com>
Date: Wed, 11 Apr 2012 07:59:34 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: OpenCL runtime errors

magnum, Lukas, all -

On Wed, Apr 11, 2012 at 01:11:57AM +0200, magnum wrote:
> While experimenting with RAR I found a detail that stops many runtime
> errors similar to the ones mentioned - and makes adjustment to different
> devices (including really weak ones) easier. Most current OpenCL formats
> (I think all except mine) use this to determine the device's maximum
> local worksize:
>
> clGetDeviceInfo(devices[gpu_id], CL_DEVICE_MAX_WORK_GROUP_SIZE,
>     sizeof(max_group_size), &max_group_size, NULL);
>
> ...but this figure does not help you much. It just shows the maximum
> supported worksize for any (lean) kernel on this device. I use this instead:
>
> clGetKernelWorkGroupInfo(crypt_kernel, devices[gpu_id],
>     CL_KERNEL_WORK_GROUP_SIZE, sizeof(max_group_size), &max_group_size, NULL);
>
> This one tells us the maximum local worksize for *this very kernel* on
> this device. That is, the OpenCL implementation uses the resource
> requirements of the kernel (register usage etc.) to determine the max
> usable local worksize. Works like a charm. I currently don't even have a
> find_best_lws(), it's just a couple of simple (and quick) tests in init().

Great. I see that you've already implemented it for RAR. Who is to do
it for the rest - the authors of those other pieces of code, or you?

Here's what I am getting for RAR. Before the above change:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -pla=1 -fo=rar
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Max local work size 256, best multiple 64
Local work size (LWS) 128, Keys per crypt (KPC) 2048
Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) FAILED (cmp_all(1))

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar
OpenCL platform 0: NVIDIA CUDA, 1 device(s).
Using device 0: GeForce GTX 570
Max local work size 576, best multiple 32
Local work size (LWS) 512, Keys per crypt (KPC) 8192
Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) DONE
Raw:    2266 c/s real, 2211 c/s virtual

After the change:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar -pla=1
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Max allowed local work size 256, best multiple 64
Local work size (LWS) 128, Keys per crypt (KPC) 2048
Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) FAILED (cmp_all(1))

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar
OpenCL platform 0: NVIDIA CUDA, 1 device(s).
Using device 0: GeForce GTX 570
Max allowed local work size 576, best multiple 32
Local work size (LWS) 512, Keys per crypt (KPC) 8192
Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) DONE
Raw:    2259 c/s real, 2211 c/s virtual

And while I am at it, here are some CPU benchmarks:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar -pla=1 -dev=1
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 1: AMD FX(tm)-8120 Eight-Core Processor
Note: OpenCL device is CPU. A non-OpenCL build may be faster.
Local work size (LWS) 8, Keys per crypt (KPC) 128
Benchmarking: RAR3 (4 characters) [OpenCL]... (8xOMP) DONE
Raw:    256 c/s real, 32.2 c/s virtual

linux-x86-64-xop build:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar
Benchmarking: RAR3 [32/64]... (8xOMP) DONE
Raw:    400 c/s real, 50.1 c/s virtual

Thanks,

Alexander
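
For illustration only, here is a minimal sketch of how the per-kernel limit from the quoted clGetKernelWorkGroupInfo() call might be applied once it has been read. clamp_lws() and its parameter names are hypothetical and are not taken from john or from the RAR format. kernel_max stands for the value returned for CL_KERNEL_WORK_GROUP_SIZE, and multiple stands for the "best multiple" figure printed in the output above (how john obtains that figure is not shown in this message, so treat it as an assumed input).

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of john: pick a local work size (LWS).
 *
 *   wanted      the LWS the format would like to use
 *   kernel_max  per-kernel limit (value of CL_KERNEL_WORK_GROUP_SIZE)
 *   multiple    preferred work-group size multiple; 0 or 1 disables rounding
 *
 * The result never exceeds kernel_max (unless kernel_max is 0, in which
 * case 1 is returned as a last resort) and is rounded down to a whole
 * multiple when that is possible without dropping to zero.
 */
size_t clamp_lws(size_t wanted, size_t kernel_max, size_t multiple)
{
    /* Never ask for more than this kernel can run on this device. */
    size_t lws = wanted < kernel_max ? wanted : kernel_max;

    /* Round down to a whole multiple, but only if at least one fits. */
    if (multiple > 1 && lws >= multiple)
        lws -= lws % multiple;

    /* A local work size of zero is not usable. */
    return lws ? lws : 1;
}
```

With the GTX 570 figures above (limit 576, multiple 32), a request for 512 comes back unchanged and a request for 1024 would be cut to 576. This only keeps the request within the reported limit; it does not by itself explain or address the cmp_all(1) failure on Tahiti shown above.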