john-dev - Re: Response during OpenCL sessions

Message-ID: <388ed5565379c3932140ea691746af0b@smtp.hushmail.com>
Date: Wed, 19 Dec 2012 00:24:06 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Response during OpenCL sessions

On 3 Oct, 2012, at 2:37 , magnum <john.magnum@...hmail.com> wrote:
> Now that we start to get split kernels with much shorter durations, I wonder if/how we could react to some events between the kernel calls without a lot of work. I thought it could be as simple as this pseudo code "patch":
> 
> 
> void crypt_all(int count)
> {
>  	enqueue(Transfer);
>  	enqueue(RarInitKernel);
>  	for (i=0; i<HASH_LOOPS; i++)
>  	{
>  		enqueue(RarLoopKernel);
> + 		if (event_pending)
> + 			process_event();
>  	}
>  	enqueue(RarFinalKernel);
> 
> 
> I tried the above, using a process_event() similar to the one in cracker.c but somehow(...)

I revisited this issue and now I get it, and it's totally obvious in hindsight. That loop merely enqueues all the kernel calls, very quickly. Nearly all of the actual execution happens later, in the final blocking clEnqueueReadBuffer() call, and at that point I had no event checks (nor could I have: that single call blocks for 5-20 seconds while the queue finishes).

The code would need to look something like this, unless someone can think of a better solution:

  void crypt_all(int count)
  {
  	enqueue(Transfer);
  	enqueue(RarInitKernel);
  	for (i=0; i<HASH_LOOPS; i++)
  	{
  		enqueue(RarLoopKernel);
+  		clFinish(queue);
+  		if (event_pending)
+  			process_event();
  	}
  	enqueue(RarFinalKernel);
  }

This works like a champ, but has a slight performance impact. For wpapsk on Tahiti, speed drops from 133576 c/s to 132731 c/s. That is just 0.6%, but is it worth it? Perhaps not for wpapsk, but for Office 2013 I suppose it might be: without it, a status output can be delayed by nearly 20 seconds.

We could add yet another define in the Makefile for enabling or disabling this. Or, perhaps even better, a john.conf setting (global or per format?); the extra check would be dirt cheap in this context.

Anyway, I think we should put that process_event() function in common-opencl.c (and name it opencl_process_event()). OTOH, this might be used for CUDA too at some point, so maybe we should put it in signals.c or something? The local function I used when experimenting was an exact copy of crk_process_event() in cracker.c.

magnum
