Message-ID: <2d1b2bb4863dac064a0ee69ca8552b4e@smtp.hushmail.com>
Date: Thu, 27 Aug 2015 01:01:27 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: LWS and GWS auto-tuning

On 2015-08-27 00:41, Solar Designer wrote:
> On Wed, Aug 26, 2015 at 10:21:31PM +0200, magnum wrote:
>> On 2015-08-26 21:37, Solar Designer wrote:
>>> Unfortunately, LWS auto-tuning tries unreasonably high values (like
>>> 8192) and sometimes fails totally (results in an error from OpenCL and
>>> program abort) for some formats when tested with one or the other OpenCL
>>> SDK on "well".  Can you look into this, and perhaps commit a fix?
>>
>> That's odd, can you name a format?
>
> For example:
>
> $ ./john -test -form=phpass-opencl -dev=0 -v=4
> Device 0: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
> Benchmarking: phpass-opencl ($P$9 lengths 0 to 15) [MD5 OpenCL]... Options used: -I ./kernels -cl-mad-enable -D__CPU__ -DDEVICE_INFO=33 -DDEV_VER_MAJOR=1 -DDEV_VER_MINOR=2 -D_OPENCL_COMPILER
> Build log: Compilation started
> Compilation done
> Linking started
> Linking done
> Device build started
> Device build done
> Kernel <phpass> was not vectorized
> Done.
> Calculating best global worksize (GWS); max. 100ms single kernel invocation.
> gws:       256       24569 c/s       24569 rounds/s  10.419ms per crypt_all()!
> gws:       512       24150 c/s       24150 rounds/s  21.200ms per crypt_all()
> gws:      1024       26315 c/s       26315 rounds/s  38.912ms per crypt_all()+
> gws:      2048       26323 c/s       26323 rounds/s  77.800ms per crypt_all()
> Calculating best local worksize (LWS)
> Testing LWS=128 GWS=1024 ... 151.439ms+
> Testing LWS=256 GWS=1024 ... 302.382ms
> Testing LWS=512 GWS=1024 ... 604.730ms
> Testing LWS=1024 GWS=1024 ... 1.209s
> Testing LWS=2048 GWS=2048 ...Segmentation fault

The device actually reports a maximum work group size of 8192 when 
queried, and that's why it is tried. The same value shows up in our 
device list output:

Platform version: OpenCL 1.2
	Device #0 (0) name:	Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
	Device vendor:		Intel(R) Corporation
	Device type:		CPU (LE)
	Device version:		OpenCL 1.2 (Build 9756)
	Driver version:		1.2.0.9756
	Native vector widths:	char 32, short 16, int 8, long 4
	Preferred vector width:	char 1, short 1, int 1, long 1
	Global Memory:		31.0 GB
	Global Memory Cache:	256.2 KB
	Local Memory:		32.0 KB (Global)
	Max memory alloc. size:	7.0 GB
	Max clock (MHz):	3500
	Profiling timer res.:	1 ns
	Max Work Group Size:	8192  <---- here!
	Parallel compute cores:	8

I do not think the de facto limit of 1024 we've been used to is an 
actual maximum per any specification. Also, when I tried this it ran 
just fine through the tests up to 8192 but picked a lower number as 
best. If it wasn't actually supported, we should get a 
CL_INVALID_WORK_GROUP_SIZE error, and it would have been caught and 
handled properly.
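
To illustrate the behaviour described above, here is a minimal sketch of 
such a tuning loop. It is not the actual JtR auto-tune code: tune_lws(), 
run_fn and fake_run() are hypothetical names, and fake_run() is a 
stand-in for a timed kernel invocation that pretends the device rejects 
anything above 1024. Only the two status code values are taken from the 
OpenCL headers.

```c
#include <stddef.h>

/* Status codes with the same values as in CL/cl.h. */
#define CL_SUCCESS                    0
#define CL_INVALID_WORK_GROUP_SIZE  (-54)

/*
 * Hypothetical callback: runs the kernel once with the given sizes,
 * stores the elapsed time in *ms and returns an OpenCL-style status.
 */
typedef int (*run_fn)(size_t lws, size_t gws, double *ms);

/*
 * Try LWS = start, 2*start, ... up to max_lws (the value reported as
 * "Max Work Group Size").  Sizes the device rejects are skipped rather
 * than treated as fatal.  Returns the LWS with the lowest time per work
 * item, or 0 if no candidate could be run.
 */
static size_t tune_lws(run_fn run, size_t start, size_t max_lws, size_t gws)
{
    size_t best = 0;
    double best_per_item = 0.0;

    for (size_t lws = start; lws && lws <= max_lws; lws *= 2) {
        /* GWS must be a multiple of LWS, so round it up. */
        size_t g = (gws + lws - 1) / lws * lws;
        double ms = 0.0;
        int rc = run(lws, g, &ms);

        if (rc == CL_INVALID_WORK_GROUP_SIZE)
            continue;           /* not supported after all: skip it */
        if (rc != CL_SUCCESS)
            break;              /* any other error: stop tuning */

        /* GWS may differ between candidates, so compare per item. */
        double per_item = ms / (double)g;
        if (best == 0 || per_item < best_per_item) {
            best = lws;
            best_per_item = per_item;
        }
    }
    return best;
}

/*
 * Made-up device model for demonstration only: rejects LWS above 1024
 * and is fastest at LWS=256.  The timings are not real measurements.
 */
static int fake_run(size_t lws, size_t gws, double *ms)
{
    if (lws > 1024)
        return CL_INVALID_WORK_GROUP_SIZE;
    *ms = (double)gws * (lws == 256 ? 0.5 : 1.0);
    return CL_SUCCESS;
}
```

With this made-up model, tune_lws(fake_run, 128, 8192, 1024) walks all 
the way up to 8192, skips the rejected sizes and settles on 256. A 
segfault inside the run itself would of course bypass this kind of 
error handling entirely.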

I presume your segfault was unrelated to the work size.

magnum
