Message-ID: <20120625232714.GA10703@openwall.com>
Date: Tue, 26 Jun 2012 03:27:14 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Cc: Bit Weasil <bitweasil@...il.com>
Subject: Re: OpenCL kernel max running time vs. "ASIC hang"

On Tue, Jun 26, 2012 at 01:06:08AM +0200, magnum wrote:
> On 2012-06-26 00:27, Solar Designer wrote:
> >I discussed this matter with Bit Weasil on IRC a few days ago.
> >According to him, we shouldn't be trying to spend more than 200 ms per
> >OpenCL kernel invocation, or we'll face random "ASIC hang" issues on AMD
[...]
> That's not an easy goal with slow formats. For RAR, with 256K rounds of
> SHA-1, I currently don't get much below 2000ms on 7790, and that's with
> GWS that produces a 40% slower c/s than what we currently use. For best
> c/s we exceed 9 seconds. Then again, my code is made by a newbie. Making
> it 10x faster would be nice for sure. But even Milen said his RAR kernel
> ran for 2-3 seconds a while ago.

I understand that reducing the amount of parallelism in a kernel
invocation slows things down, but why not reduce the amount of work per
kernel invocation by other means - specifically, in your example, why
not reduce the number of SHA-1 iterations per kernel invocation?

We may invoke the kernel more than once from one crypt_all() call,
sequentially. For example, the 256k may be achieved by 256 invocations
of a kernel doing 1k iterations. This would bring the 9 seconds down to
35 ms per kernel invocation. Perhaps the intermediate results can even
stay in the GPU between those invocations.

Have you considered that?

Alexander
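
The split-invocation idea above can be sketched on the host side as follows. This is only an illustration under stated assumptions: kernel() is a hypothetical CPU stand-in for one OpenCL kernel invocation, crypt_all_split() and per_invocation_ms() are made-up names, and the plain SHA-1 chain is a simplification rather than the actual RAR key derivation. The point it shows is that carrying the intermediate state from one invocation to the next gives the same result as a single long invocation, while each call does only a fraction of the work.

```python
import hashlib


def kernel(state: bytes, iterations: int) -> bytes:
    """Hypothetical stand-in for one kernel invocation.

    Applies SHA-1 to the state `iterations` times and returns the new state.
    On a GPU the state would ideally stay in device memory between calls.
    """
    for _ in range(iterations):
        state = hashlib.sha1(state).digest()
    return state


def crypt_all_split(initial: bytes, total_iterations: int, per_call: int) -> bytes:
    """Reach `total_iterations` through several short, sequential invocations."""
    state = initial
    done = 0
    while done < total_iterations:
        # The last invocation may be shorter if per_call does not divide the total.
        step = min(per_call, total_iterations - done)
        state = kernel(state, step)
        done += step
    return state


def per_invocation_ms(total_ms: float, invocations: int) -> float:
    """Rough time per invocation, ignoring any per-launch overhead."""
    return total_ms / invocations


if __name__ == "__main__":
    # Small iteration counts keep the demonstration fast on a CPU.
    one_shot = kernel(b"password+salt", 4096)
    split = crypt_all_split(b"password+salt", 4096, 1024)
    print(one_shot == split)
    print(per_invocation_ms(9000, 256))
```

With the figures from this thread, 9 seconds spread over 256 invocations comes to roughly 35 ms each, well under the 200 ms limit quoted above. The estimate ignores kernel launch overhead, which would need to be measured on real hardware.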