john-dev - Re: OpenCL KPC and LWS


Message-ID: <f91f1c14e344fedb90f047a0b5d620dc@smtp.hushmail.com>
Date: Tue, 06 Mar 2012 20:14:11 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: OpenCL KPC and LWS

On 02/21/2012 10:59 PM, Samuele Giovanni Tonon wrote:
>> So the main issue is that auto KPC does not pick a good number. The LWS
>> fluctuations might be due to normal variations between runs. I should
>> have recorded the figures for KPC=2M and LWS=64 but I missed that.
> 
> looks like a chicken-egg problem: when lws is tested i use the default
> kpc=2M, when LWS is up i use the best LWS i just detected; luksas
> already reported this kind of problem but i thought we were safe since
> LWS usually is rather obvious.

I realise I have a lot to catch up on from you guys, but here are a couple
of things that seem to give good and FAST results on my gear, both GPU
and CPU:

Have you tried querying clGetKernelWorkGroupInfo() for
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE? On the CPUs and GPUs I
have tried, it seems reliable (better than the current testing) and very
fast. It was introduced in OpenCL 1.1, so I added a fallback like this:


  	clEnqueueWriteBuffer(queue_prof, buffer_keys, CL_TRUE, 0,
  	    (PLAINTEXT_LENGTH) * SSHA_NUM_KEYS, saved_plain, 0, NULL, NULL);

 +	// This is OpenCL 1.1, we catch CL_INVALID_VALUE and use a fallback
 +	ret_code = clGetKernelWorkGroupInfo (crypt_kernel, devices[gpu_id],
 +	    CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
 +	    sizeof(best_multiple), &best_multiple, NULL);
 +
 +	if (ret_code == CL_INVALID_VALUE) {
 +	    //printf("Can't get preferred LWS multiple, using 1\n");
 +	    best_multiple = 1;
 +	} else {
 +	    HANDLE_CLERROR(ret_code, "Query preferred work group multiple");
 +	    //printf("preferred multiple: %zu\n", best_multiple);
 +	}
 +
  	// Find minimum time
 -	for (my_work_group = 1; (int) my_work_group <= (int) max_group_size;
 +	for (my_work_group = best_multiple; (int) my_work_group <= (int) max_group_size;
  	    my_work_group *= 2) {
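
As a minimal standalone sketch of the candidate enumeration in the patch above (not the actual john code), the helper below lists the local work sizes that such a loop would benchmark. The name lws_candidates and its parameters are hypothetical; in a real program preferred_multiple would come from clGetKernelWorkGroupInfo() and max_group_size from the device query, and 0 is treated here as the fallback value 1.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: fill out[] with the local work sizes to benchmark.
 * Starts at the preferred multiple and doubles until max_group_size would
 * be exceeded. Returns the number of candidates written (at most out_cap).
 * A preferred_multiple of 0 is treated as 1, mirroring the fallback.
 */
static size_t lws_candidates(size_t preferred_multiple, size_t max_group_size,
                             size_t *out, size_t out_cap)
{
	size_t n = 0;
	size_t lws = preferred_multiple ? preferred_multiple : 1;

	assert(out != NULL);

	while (lws <= max_group_size && n < out_cap) {
		out[n++] = lws;
		/* Stop before a doubling that would pass the limit or overflow. */
		if (lws > max_group_size / 2)
			break;
		lws *= 2;
	}
	return n;
}
```

With a preferred multiple of 64 and a maximum group size of 256, this yields three candidates (64, 128, 256) instead of the nine (1 through 256) tested when starting from 1.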


Also, I seem to get good and very fast results with this loop in KPC
enumeration:

    for( num=local_work_size; num <= SSHA_NUM_KEYS ; num<<=1)


Is testing every 16K really of any use? I just see fluctuating numbers
and a super slow test.
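
To put rough numbers on the difference, here is a small sketch that counts how many KPC values each scheme would benchmark. The function names are hypothetical, and the linear scheme (16K, 32K, 48K, ...) is only an assumption about what "testing every 16K" means; the figures 64 and 2M are the LWS and KPC values mentioned earlier in this thread, not the actual value of SSHA_NUM_KEYS.

```c
#include <assert.h>
#include <stddef.h>

/* Number of KPC values tried when doubling from start up to max_keys. */
static size_t count_doubling(size_t start, size_t max_keys)
{
	size_t count = 0, num;

	assert(start > 0);

	for (num = start; num <= max_keys; num <<= 1) {
		count++;
		/* Avoid shifting past max_keys (and any overflow). */
		if (num > max_keys / 2)
			break;
	}
	return count;
}

/* Number of KPC values tried when stepping linearly: step, 2*step, ... */
static size_t count_linear(size_t step, size_t max_keys)
{
	assert(step > 0);
	return max_keys / step;
}
```

Under these assumptions, count_doubling(64, 2097152) is 16 while count_linear(16384, 2097152) is 128, so the doubling loop runs about an eighth as many benchmark passes.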


magnum
