john-dev - Re: OpenCL kernel max running time vs. "ASIC hang"

Message-ID: <20120625232714.GA10703@openwall.com>
Date: Tue, 26 Jun 2012 03:27:14 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Cc: Bit Weasil <bitweasil@...il.com>
Subject: Re: OpenCL kernel max running time vs. "ASIC hang"

On Tue, Jun 26, 2012 at 01:06:08AM +0200, magnum wrote:
> On 2012-06-26 00:27, Solar Designer wrote:
> >I discussed this matter with Bit Weasil on IRC a few days ago.
> >According to him, we shouldn't be trying to spend more than 200 ms per
> >OpenCL kernel invocation, or we'll face random "ASIC hang" issues on AMD
[...]

> That's not an easy goal with slow formats. For RAR, with 256K rounds of 
> SHA-1, I currently don't get much below 2000ms on 7790, and that's with 
> GWS that produces a 40% slower c/s than what we currently use. For best 
> c/s we exceed 9 seconds. Then again, my code is made by a newbie. Making 
> it 10x faster would be nice for sure. But even Milen said his RAR kernel 
> ran for 2-3 seconds a while ago.

I understand that reducing the amount of parallelism in a kernel
invocation slows things down, but why not reduce the amount of work per
kernel invocation by other means - specifically, in your example, why
not reduce the number of SHA-1 iterations per kernel invocation?  We may
invoke the kernel more than once from one crypt_all() call,
sequentially.  For example, the 256K rounds may be achieved by 256
invocations of a kernel doing 1K iterations each.  This would bring the
9 seconds down to about 35 ms (9 s / 256) per kernel invocation.
Perhaps the intermediate results can even stay in the GPU between those
invocations.

Have you considered that?
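
To make the idea concrete, here is a rough host-side sketch in plain C.
It is only an illustration under stated assumptions, not the RAR format's
actual code: kernel_step() is a hypothetical stand-in for one kernel
invocation (a real version would enqueue the OpenCL kernel instead), the
mixing step is a placeholder rather than SHA-1, and the state array stands
in for a buffer that stays on the GPU between invocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one kernel invocation: advance every
 * candidate's state by `iters` rounds, starting at round `first_iter`.
 * The mixing step is a placeholder, NOT SHA-1. */
static void kernel_step(uint32_t *state, size_t count,
                        uint32_t first_iter, uint32_t iters)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t s = state[i];
        for (uint32_t r = 0; r < iters; r++) {
            s = (s ^ (first_iter + r)) * 2654435761u;
            s = (s << 13) | (s >> 19);
        }
        state[i] = s;
    }
}

/* Host side: reach total_iters by invoking the kernel repeatedly with a
 * small iteration count.  `state` persists across invocations, the way a
 * device buffer would.  Returns the number of invocations made. */
static unsigned run_chunked(uint32_t *state, size_t count,
                            uint32_t total_iters, uint32_t chunk)
{
    unsigned calls = 0;

    /* This sketch only handles chunk sizes that divide the total. */
    assert(chunk > 0 && total_iters % chunk == 0);

    for (uint32_t done = 0; done < total_iters; done += chunk) {
        kernel_step(state, count, done, chunk);
        calls++;
    }
    return calls;
}
```

Calling run_chunked(state, count, 262144, 1024) makes 256 invocations and
should leave the same state as a single kernel_step(state, count, 0, 262144).
The chunk size would presumably need tuning per device so that each
invocation stays under the 200 ms mentioned above.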

Alexander
