Message-ID: <4F4413BF.1030704@linuxasylum.net>
Date: Tue, 21 Feb 2012 22:59:27 +0100
From: Samuele Giovanni Tonon <samu@...uxasylum.net>
To: john-dev@...ts.openwall.com
Subject: Re: OpenCL KPC and LWS [was: Recent github patches]

On 02/21/12 20:49, magnum wrote:
> On 02/20/2012 11:39 PM, Samuele Giovanni Tonon wrote:
>> On 02/20/12 20:38, magnum wrote:
>>> Anyway, I tried find_best_kpc and it picks very small numbers (like
>>> 69632) and ends up a lot slower than just going with the default 2M. I
>>> also tried manually setting 4M, which worked fine and was faster than
>>> 2M. Maybe find_best could be enhanced somehow.
>>
>> This is quite strange; find_best_kpc should find the fastest KPC no
>> matter what. Could you give me some details: format used, GPU card,
>> LWS, and some output?
> 
> Here's some output. This is ssha on a GTX 280 but I saw similar issues
> with the 9600GT as well as with other formats.
> 
> First, a default run:
> 
> $ ../run/john -test -form:ssha-opencl
> OpenCL Platforms: 1
> OpenCL Platform: <<<NVIDIA CUDA>>> 1 device(s), using device: <<<GeForce
> GTX 280>>>
> Compilation log:
> 
> Max Group Work Size 512 Optimal local work size 32
> (to avoid this test on next run do export LWS=32)
> Local work size (LWS) 32, Keys per crypt (KPC) 2097152
> Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... DONE
> Many salts:     54973K c/s real, 27486K c/s virtual
> Only one salt:  32896K c/s real, 16448K c/s virtual
> 
> OK, so it picked LWS 32 and the default KPC is 2M. Then I ask for
> auto-tuning KPC:
> 
> $ KPC=0 ../run/john -test -form:ssha-opencl
> OpenCL Platforms: 1
> OpenCL Platform: <<<NVIDIA CUDA>>> 1 device(s), using device: <<<GeForce
> GTX 280>>>
> Compilation log:
> 
> Max Group Work Size 512 Optimal local work size 32
> (to avoid this test on next run do export LWS=32)
> Calculating best keys per crypt, this will take a while Optimal keys per
> crypt 98304
> (to avoid this test on next run do export KPC=98304)
> Local work size (LWS) 32, Keys per crypt (KPC) 98304
> Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... DONE
> Many salts:     45907K c/s real, 23542K c/s virtual
> Only one salt:  25657K c/s real, 12958K c/s virtual
> 
> It picks a very low number and performance drops. Now I try manually
> setting KPC to 1.5M instead:
> 
> $ KPC=$((3<<19)) ../run/john -test -form:ssha-opencl
> OpenCL Platforms: 1
> OpenCL Platform: <<<NVIDIA CUDA>>> 1 device(s), using device: <<<GeForce
> GTX 280>>>
> Compilation log:
> 
> Max Group Work Size 512 Optimal local work size 64
> (to avoid this test on next run do export LWS=64)
> Local work size (LWS) 64, Keys per crypt (KPC) 1572864
> Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... DONE
> Many salts:     59177K c/s real, 29155K c/s virtual
> Only one salt:  34603K c/s real, 17215K c/s virtual
> 
> This is ~10% faster than the 2M above BUT note that LWS happened to end
> up as 64 this time. Running the exact same command a couple of times, I
> sometimes get the following instead:
> 
> $ KPC=$((3<<19)) ../run/john -test -form:ssha-opencl
> OpenCL Platforms: 1
> OpenCL Platform: <<<NVIDIA CUDA>>> 1 device(s), using device: <<<GeForce
> GTX 280>>>
> Compilation log:
> 
> Max Group Work Size 512 Optimal local work size 32
> (to avoid this test on next run do export LWS=32)
> Local work size (LWS) 32, Keys per crypt (KPC) 1572864
> Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... DONE
> Many salts:     53970K c/s real, 26853K c/s virtual
> Only one salt:  32955K c/s real, 16399K c/s virtual
> 
> Here, LWS was 32 and performance was worse than with KPC=2M.
> 
> So the main issue is that auto KPC does not pick a good number. The LWS
> fluctuations might be due to normal variations between runs. I should
> have recorded the figures for KPC=2M and LWS=64 but I missed that.

This looks like a chicken-and-egg problem: when LWS is tested I use the
default KPC=2M, and once the best LWS is detected I use it to test KPC.
Lukas already reported this kind of problem, but I thought we were safe
since the optimal LWS is usually rather obvious.
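For illustration, here is a minimal C sketch of that two-pass ordering.
The time_kernel() model and all numbers are made up, standing in for a
real timed crypt_all() call; this is not the actual john code:

#include <stdio.h>

/* Made-up timing model standing in for one timed crypt_all() run;
   the numbers are purely illustrative. */
static double time_kernel(size_t lws, size_t kpc)
{
    double overhead = 0.002;               /* fixed launch/transfer cost */
    double per_key = 1e-8 * (64.0 / lws);  /* toy model: bigger LWS is faster */
    return overhead + per_key * kpc;
}

#define DEFAULT_KPC (2 * 1024 * 1024UL)

/* Pass 1: search LWS while KPC is pinned to the default. */
static size_t find_best_lws(size_t kpc)
{
    size_t lws, best = 32;
    double t, best_t = time_kernel(best, kpc);

    for (lws = 64; lws <= 512; lws *= 2) {
        t = time_kernel(lws, kpc);
        if (t < best_t) {
            best_t = t;
            best = lws;
        }
    }
    return best;
}

/* Pass 2: search KPC using the LWS found in pass 1, picking the
   candidate with the best keys-per-second rate. */
static size_t find_best_kpc(size_t lws)
{
    size_t kpc, best = 0;
    double rate, best_rate = 0.0;

    for (kpc = 64 * 1024; kpc <= 4 * 1024 * 1024UL; kpc *= 2) {
        rate = (double)kpc / time_kernel(lws, kpc);
        if (rate > best_rate) {
            best_rate = rate;
            best = kpc;
        }
    }
    return best;
}

int main(void)
{
    /* The coupling: LWS is tuned at the default KPC, then KPC is
       tuned at that LWS.  If the optimum is a joint property of the
       (LWS, KPC) pair, one pass of each search can settle on a slow
       combination, as in the benchmarks above. */
    size_t lws = find_best_lws(DEFAULT_KPC);
    size_t kpc = find_best_kpc(lws);

    printf("LWS=%zu KPC=%zu\n", lws, kpc);
    return 0;
}

One fix might be to re-run the LWS search after the best KPC is found,
or to iterate the two searches until the pair stabilizes.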

I will make some changes to print the per-LWS timings during a debug
session, so you can tell me the numbers behind those different LWS
picks.
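Something along these lines, using standard OpenCL event profiling (a
sketch only; it assumes the command queue was created with
CL_QUEUE_PROFILING_ENABLE, that the kernel arguments are already set,
and that global_size is a multiple of every tested LWS):

#include <stdio.h>
#include <CL/cl.h>

/* Print one timed launch per candidate LWS.  Error checking omitted. */
static void print_lws_times(cl_command_queue queue, cl_kernel kernel,
                            size_t global_size)
{
    size_t lws;

    for (lws = 32; lws <= 512; lws *= 2) {
        cl_event ev;
        cl_ulong start, end;

        clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                               &global_size, &lws, 0, NULL, &ev);
        clWaitForEvents(1, &ev);
        clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
                                sizeof(start), &start, NULL);
        clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
                                sizeof(end), &end, NULL);
        /* profiling counters are in nanoseconds */
        printf("LWS %3zu: %.3f ms\n", lws, (end - start) / 1e6);
        clReleaseEvent(ev);
    }
}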

Thanks for the report!

Cheers
Samuele
