Message-ID: <475c89b30916e3c60ec34c45f7ab96ab@smtp.hushmail.com>
Date: Wed, 11 Apr 2012 09:11:29 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: OpenCL runtime errors

On 04/11/2012 05:59 AM, Solar Designer wrote:
> magnum, Lukas, all -
>
> On Wed, Apr 11, 2012 at 01:11:57AM +0200, magnum wrote:
>> While experimenting with RAR I found a detail that stops many runtime
>> errors similar to the ones mentioned - and makes adjustment to different
>> devices (including really weak ones) easier. Most current OpenCL formats
>> (I think all except mine) use this to determine the device's maximum
>> local worksize:
>> (...)
>
> Great. I see that you've already implemented it for RAR. Who is to do
> it for the rest - the authors of those other pieces of code, or you?

Not sure. I was hoping Samuele would find a good shared find_best_lws()
function for common-opencl.c, but I realise it might be tricky to get one
to work for any slow or fast format.

> Here's what I am getting for RAR. Before the above change:

Fwiw, the above change was in there in the first release already. However,
in yesterday's commit I made some AMD fixes that I really hoped would
cure AMD. So the below was discouraging:

> After the change:
>
> user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar -pla=1
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Max allowed local work size 256, best multiple 64
> Local work size (LWS) 128, Keys per crypt (KPC) 2048
> Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) FAILED (cmp_all(1))

The current code runs fine on all CPUs and GPUs I have tested, so I was
hoping it was fine now. I'll look further into it.

> user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=rar
> OpenCL platform 0: NVIDIA CUDA, 1 device(s).
> Using device 0: GeForce GTX 570
> Max allowed local work size 576, best multiple 32
> Local work size (LWS) 512, Keys per crypt (KPC) 8192
> Benchmarking: RAR3 (6 characters) [OpenCL]... (8xOMP) DONE
> Raw: 2259 c/s real, 2211 c/s virtual

This really beats me. The GTX580 is damn near identical (isn't it?) and I
get 4000+ c/s. What build output do you get if you add "-cl-nv-verbose"
to the build options in common-opencl.c's include_source()?

Thanks for this input (I'd be helped by others' results for RAR too)
magnum
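
For readers following the numbers in the benchmark output above, here is a
minimal sketch in C of how a requested local work size could be kept within
the "max allowed" value and aligned to the "best multiple", and how the keys
per crypt could be rounded so that it is a whole number of work-groups (in
OpenCL 1.x the global work size must be evenly divisible by the local one).
The helper names clamp_lws() and round_up_kpc() are hypothetical and are not
taken from John the Ripper. The sketch assumes the two limits were already
queried per kernel, for example with clGetKernelWorkGroupInfo(), and it does
not attempt the benchmarking a shared find_best_lws() would presumably need.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not John the Ripper code).
 * requested   - the local work size we would like to use (0 = "use the max")
 * max_allowed - per-kernel limit; assumed to come from a query such as
 *               CL_KERNEL_WORK_GROUP_SIZE
 * multiple    - preferred multiple; assumed to come from a query such as
 *               CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE (a hint only)
 * Returns a local work size that never exceeds max_allowed and, when it is
 * at least one multiple large, is rounded down to that multiple.
 */
static size_t clamp_lws(size_t requested, size_t max_allowed, size_t multiple)
{
    size_t lws = requested;

    assert(max_allowed > 0);

    /* Never exceed what the kernel can run on this device. */
    if (lws == 0 || lws > max_allowed)
        lws = max_allowed;

    /* Align down to the preferred multiple when that is possible. */
    if (multiple > 1 && lws >= multiple)
        lws -= lws % multiple;

    return lws;
}

/*
 * Hypothetical helper: round the keys per crypt (used here as the global
 * work size) up to a whole number of work-groups of size lws.
 */
static size_t round_up_kpc(size_t kpc, size_t lws)
{
    assert(lws > 0);
    return ((kpc + lws - 1) / lws) * lws;
}
```

With the values reported for Tahiti (max 256, multiple 64), a request of 128
is left unchanged and a request of 300 is clamped to 256. With the GTX 570
values (max 576, multiple 32), a request of 512 is left unchanged.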