Message-ID: <20120625232714.GA10703@openwall.com>
Date: Tue, 26 Jun 2012 03:27:14 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Cc: Bit Weasil <bitweasil@...il.com>
Subject: Re: OpenCL kernel max running time vs. "ASIC hang"

On Tue, Jun 26, 2012 at 01:06:08AM +0200, magnum wrote:
> On 2012-06-26 00:27, Solar Designer wrote:
> >I discussed this matter with Bit Weasil on IRC a few days ago.
> >According to him, we shouldn't be trying to spend more than 200 ms per
> >OpenCL kernel invocation, or we'll face random "ASIC hang" issues on AMD
[...]

> That's not an easy goal with slow formats. For RAR, with 256K rounds of 
> SHA-1, I currently don't get much below 2000ms on 7790, and that's with 
> GWS that produces a 40% slower c/s than what we currently use. For best 
> c/s we exceed 9 seconds. Then again, my code is made by a newbie. Making 
> it 10x faster would be nice for sure. But even Milen said his RAR kernel 
> ran for 2-3 seconds a while ago.

I understand that reducing the amount of parallelism in a kernel
invocation slows things down, but why not reduce the amount of work per
kernel invocation by other means - specifically, in your example, why
not reduce the number of SHA-1 iterations per kernel invocation?  We may
invoke the kernel more than once from one crypt_all() call,
sequentially.  For example, the 256k iterations may be achieved by 256
invocations of a kernel doing 1k iterations each.  This would bring the
9 seconds down to about 35 ms per kernel invocation (9 s / 256).  Perhaps
the intermediate results can even stay in GPU memory between those
invocations, so that nothing needs to be copied back to the host until
the last one.
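
The splitting described above can be sketched on the host side roughly as
follows.  This is only an illustration in Python, not John the Ripper or
OpenCL code: run_kernel(), crypt_all_chunked() and per_invocation_ms() are
hypothetical names, and plain iterated SHA-1 stands in for the real RAR
key derivation.  The point is that carrying the intermediate state from
one invocation to the next gives the same result as one long invocation.

```python
import hashlib


def run_kernel(state: bytes, iterations: int) -> bytes:
    """Hypothetical stand-in for one kernel invocation.

    Advances the state by `iterations` rounds of SHA-1. On a GPU the state
    would ideally stay in device memory between invocations.
    """
    for _ in range(iterations):
        state = hashlib.sha1(state).digest()
    return state


def crypt_all_chunked(seed: bytes, total_iterations: int,
                      per_invocation: int) -> bytes:
    """Reach `total_iterations` through several short sequential invocations."""
    state = seed
    remaining = total_iterations
    while remaining > 0:
        step = min(per_invocation, remaining)  # last chunk may be shorter
        state = run_kernel(state, step)        # state carries over
        remaining -= step
    return state


def per_invocation_ms(total_ms: float, invocations: int) -> float:
    """Rough time per invocation, ignoring launch overhead (an assumption)."""
    return total_ms / invocations


if __name__ == "__main__":
    one_shot = run_kernel(b"example", 256 * 1024)
    chunked = crypt_all_chunked(b"example", 256 * 1024, 1024)
    print(one_shot == chunked)
    print(per_invocation_ms(9000, 256))
```

The estimate assumes that per-invocation launch overhead is small compared
to the work done in each invocation; in practice this would have to be
measured.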

Have you considered that?

Alexander
