Message-ID: <20120708085915.GD29336@openwall.com>
Date: Sun, 8 Jul 2012 12:59:15 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: optimized mscash2-opencl

On Sat, Jul 07, 2012 at 05:14:52PM +0530, Sayantan Datta wrote:
> Guess I didn't have much deeper insight into the codes which prevented me
> from moving the two SHA1 from the 10K loops.

BTW, after I was done with my optimizations yesterday, I took a look at
Lukas' CUDA code and found out that its 10k loop is very similar to what
I came up with. So you could have gotten this idea from there.

> BTW I was expecting much more
> performance, nearly double on 7970 because the two SHA1 represented almost
> half of the total computation.

As I wrote earlier, this didn't make a difference on its own because the
optimizer was presumably good enough to move those two SHA-1s out of the
loop anyway. However, when I tried changing S30() to use rotate() (a
one-line source change), performance dropped to almost exactly half
(below 50k c/s), which was a hint to me about these SHA-1s. Apparently,
the optimizer is this good only when it is dealing with pure bitwise ops,
ADDs, and shifts, but not with rotate(), which it probably treats as
opaque.

The actual speedup with my patch is mostly due to avoiding the endianness
conversions in the loop. This also explains why there's a greater speedup
on NVIDIA (no bit rotate instruction) than on AMD.

BTW, we still rely on the optimizer to substitute constants for some W[]
elements. SHA1_digest() could be optimized further, reducing our reliance
on the optimizer, which may be a good thing to do (such as for different
OpenCL SDKs).

> This could mean we are using a lot of global memory

Hardly. The speed is pretty good. At 99k c/s, we're at over 2 billion
SHA-1s per second, whereas hashcat does "2198.8M c/s" at raw SHA-1 on a
7970, and this almost certainly includes some step reversals.
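
The "over 2 billion SHA-1s per second" figure can be checked with a short
calculation. This is only a sketch under assumptions that are not stated
in the message: that MSCash2 is PBKDF2-HMAC-SHA1 with 10240 iterations,
and that each iteration costs two SHA-1 compressions once the HMAC inner
and outer pad states are precomputed. The function name sha1_per_second
is hypothetical, chosen for illustration.

```python
# Assumptions (not stated in the message): MSCash2 is PBKDF2-HMAC-SHA1 with
# 10240 iterations, and each iteration costs two SHA-1 compressions once
# the HMAC inner and outer pad states have been precomputed.
PBKDF2_ITERATIONS = 10240
SHA1_PER_ITERATION = 2


def sha1_per_second(candidates_per_second: int) -> int:
    """Convert a candidate-password rate into an approximate SHA-1 rate."""
    return candidates_per_second * PBKDF2_ITERATIONS * SHA1_PER_ITERATION


ours = sha1_per_second(99_000)       # 99k c/s, as quoted in the message
hashcat_raw_sha1 = 2_198_800_000     # "2198.8M c/s", as quoted in the message

print(ours)                               # 2027520000
print(round(ours / hashcat_raw_sha1, 3))  # 0.922
```

Under these assumptions the ratio comes out at roughly 92%. The two
numbers are not strictly comparable, since the raw SHA-1 rate likely
benefits from step reversals that are not available inside the loop.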

> With this patch we are at par with
> hashcat on 570 but still lagging behind on 7970.

Yes. Your correction to bitselect() brought us to over 90% of hashcat's
speed on 7970, though.

Alexander