john-users - Re: CL_OUT_OF_RESOURCES error with --format=dmg-opencl

Message-Id: <90C8D58A-B43E-4C4A-AA50-852DE33F6E9C@gmail.com>
Date: Wed, 10 May 2017 15:11:09 -0700
From: B B <dustythepath@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: CL_OUT_OF_RESOURCES error with --format=dmg-opencl


> On May 10, 2017, at 3:07 PM, B B <dustythepath@...il.com> wrote:
> 
> 
>> On May 10, 2017, at 1:32 PM, magnum <john.magnum@...hmail.com> wrote:
>> 
>> On 2017-05-10 22:03, B B wrote:
>>>> 
>>>> Does the CL_OUT_OF_RESOURCES happen almost immediately, or after a while?
>>>> 
>>>> Please post your output from this:
>>>> 
>>>> ./john -test --format=dmg-opencl -v=5
>>>> 
>>>> and in case the test fails, post output from this as well:
>>>> 
>>>> ./john -dev=0 -list=opencl-devices
>>>> 
>>>> 
>>>> magnum
>>>> 
>>> The test succeeds. The CL_OUT_OF_RESOURCES error happens after a while, always.
>>> Output for: ./john -test --format=dmg-opencl -v=5
>>> initUnicode(UNICODE, ASCII/ASCII)
>>> ASCII -> ASCII -> ASCII
>>> Will run 8 OpenMP threads
>>> Device 0: GeForce GTX 760
>>> Benchmarking: dmg-opencl, Apple DMG [PBKDF2-SHA1 OpenCL 3DES/AES]... (8xOMP) Loaded 7 hashes with 7 different salts to test db from test vectors
>>> Options used: -I /home/user/JtR/JohnTheRipper-bleeding-jumbo/run/kernels -cl-mad-enable -DSM_MAJOR=3 -DSM_MINOR=0
>>> -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=32786 -DSIZEOF_SIZE_T=8 -DDEV_VER_MAJOR=378 -DDEV_VER_MINOR=13 -D_OPENCL_COMPILER
>>> -DKEYLEN=64 -DSALTLEN=64 -DOUTLEN=32 $JOHN/kernels/pbkdf2_hmac_sha1_unsplit_kernel.cl
>>> Calculating best GWS for LWS=32; max. 200ms single kernel invocation.
>>> Raw speed figures including buffer transfers:
>>> (...)
>>> Calculating best GWS for LWS=1024; max. 200ms single kernel invocation.
>>> Raw speed figures including buffer transfers:
>>> xfer: 104.448us, crypt: 64.064ms, xfer: 51.168us
>>> gws:      9216   143505c/s      143505 rounds/s  64.220ms per crypt_all()!
>>> xfer: 201.376us, crypt: 96.084ms, xfer: 95.456us
>>> gws:     18432   191241c/s      191241 rounds/s  96.380ms per crypt_all()+
>>> xfer: 591.808us, crypt: 192.079ms, xfer: 205.504us
>>> gws:     36864   191127c/s      191127 rounds/s 192.876ms per crypt_all()
>>> xfer: 1.197ms, crypt: 384.339ms (exceeds 200ms)
>>> Local worksize (LWS) 1024, global worksize (GWS) 18432
>>> DONE
>>> Speed for cost 1 (iteration count) of 1000
>>> Raw:    53816 c/s real, 9068 c/s virtual
>>> It looks to me like it is at least partly working; hopefully you can determine the issue.
>> 
>> Part of the problem is likely that your hash uses a much higher iteration count than the one that is auto-tuned for, so you get bitten by the kernel duration watchdog. We should definitely auto-tune for the actually loaded hashes, not the test vectors, but that's not implemented yet for this format.
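A rough back-of-the-envelope estimate of why a high iteration count trips the watchdog. It assumes (as the kernel file name pbkdf2_hmac_sha1_unsplit_kernel.cl suggests) that each kernel call runs the whole PBKDF2 iteration loop, so its duration grows linearly with the iteration count N, and roughly linearly with GWS once the GPU is saturated (the last three crypt timings in the benchmark above, about 96, 192 and 384 ms, double as GWS doubles from 18432):

$$
t_{\text{kernel}} \approx t_{\text{bench}} \cdot \frac{N}{N_{\text{bench}}} \cdot \frac{\mathrm{GWS}}{\mathrm{GWS}_{\text{bench}}}
$$

With the benchmark values t_bench = 96.084 ms, N_bench = 1000 and GWS_bench = 18432, a hash with a hypothetical N = 10000 iterations (the real count isn't stated in this thread) would need about 96.084 ms × 10 ≈ 961 ms per kernel call at the auto-tuned GWS, far over the 200 ms target. Dropping GWS to 4096 scales that by 4096/18432 ≈ 0.22, to roughly 214 ms.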
>> 
>> Try running for a while with a really low GWS, just to see whether it avoids the error:
>> 
>> GWS=1024 ./john --inc=custom --format=dmg-opencl hashfile --mask=?w?d??d?dknownword
>> 
>> If 1024 doesn't work, halve it until it does.
>> If 1024 works fine, try doubling it until you reach the best speed with no error (or until doubling gives no further speedup). Note that it's fine to stop a session with one GWS and resume it with another, i.e.:
>> 
>> GWS=64 ./john --inc=custom --format=dmg-opencl hashfile --mask=?w?d??d?dknownword
>> ...
>> <job stopped>
>> ...
>> GWS=128 ./john -restore
>> 
>> 
>> magnum
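The trial-and-error search described above (halve GWS until the error goes away, then double it while the speed keeps improving) can be sketched as a small bash function. This is a hypothetical helper, not part of John the Ripper: TRIAL is a command you would supply yourself, for example a wrapper that runs john with GWS=N for a fixed time and reports the speed. The sketch does not parse john's output.

```bash
#!/usr/bin/env bash
# Sketch of the manual GWS search: halve until a run is error-free, then
# double while the doubled size still runs cleanly and is strictly faster.
#
# Usage: tune_gws TRIAL [START_GWS]
# TRIAL is a hypothetical, caller-supplied command: "TRIAL N" must print the
# speed (an integer, e.g. p/s) achieved with GWS=N and exit non-zero if that
# run failed (e.g. with CL_OUT_OF_RESOURCES).
tune_gws() {
    local trial=$1 gws=${2:-1024} speed next next_speed

    # Phase 1: halve GWS until a trial succeeds (give up below 1).
    until speed=$("$trial" "$gws"); do
        if (( gws <= 1 )); then
            echo "no working GWS found" >&2
            return 1
        fi
        gws=$(( gws / 2 ))
    done

    # Phase 2: keep doubling while the larger GWS works and is faster.
    while next=$(( gws * 2 )) &&
          next_speed=$("$trial" "$next") &&
          (( next_speed > speed )); do
        gws=$next
        speed=$next_speed
    done

    echo "$gws"
}
```

For instance, with a toy stand-in `mock_trial() { (( $1 >= 8192 )) && return 1; echo "$1"; }` (made-up speeds that simply equal GWS, failing at 8192 and above like the run reported in this thread), `tune_gws mock_trial 1024` prints 4096. Stopping at the first failed or non-faster doubling mirrors the "until no speedup from doubling" rule rather than probing values in between.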
> 
> Success!
> 
> I tried GWS=1024, then went to 4096. At 4096 my speed increased more than fivefold, from 128 p/s to 715 p/s.
> 
> That's very impressive.
> 
> For the heck of it, since we were setting only GWS rather than both LWS and GWS, I tried GWS=8192 and reproduced the error. I suppose I could push it up a little more or try half again, but I will probably leave it at 4096 and watch for your eventual updates on GitHub for this particular issue. My latest ETA has moved from late June to 9 days out.
> My desktop IS very slow now, so I may incrementally lower the maximum single kernel invocation duration as described in the README; however, the performance increase is quite nice, thank you.
> 
> 10 years of data might be worth a second video card ;)
> 
> Thanks, magnum
>