Message-ID: <fad196307f5ccda636b128e2cd5d7adb@smtp.hushmail.com>
Date: Thu, 27 Sep 2012 23:58:08 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Benchmarking Milen's RAR kernel in JtR (was: RAR early reject)

On 17 Aug, 2012, at 20:02 , magnum <john.magnum@...hmail.com> wrote:
> On 2012-08-17 19:54, Milen Rangelov wrote:
>> On Fri, Aug 17, 2012 at 3:26 PM, magnum <john.magnum@...hmail.com> wrote:
>>> On 2012-08-17 09:19, Milen Rangelov wrote:
>>>> May I borrow it for my project?
>>>
>>> But of course! If you like you can send me a rar kernel I can get hints
>>> from, as a courtesy ;-) Doesn't need to be complete runnable code, just
>>> a kernel. I think my key stretching loop is the bottleneck.
>>
>> Yeah, here it is, but I warn you, it's scary :)
>> http://www.gat3way.eu/poc/amd_rar.cl
>
> Thanks! I'll have a look. It just can't be any more scary than mine :)

OK it was more scary, LOL. Took me a while to figure out. I could not
even understand the code until I got the idea to auto-indent it. Then
things got more clear. It's good code, only one single branch - and that
one's simply not avoidable.

Just as a benchmark, I tucked the 6-char version of it into our RAR
format and made some slight adjustments to the argument list and output
layout to make it work. To my utter surprise it did on first try, it
passes self-test on nvidia. I did not really expect everything to be
correct without changing some data layout in host code.

This won't be useful as-is, I can't use fixed-length except when testing
(using that length test vectors :) but it gives me a benchmark - and if
possible I will try to make my kernel better stealing ideas from it.

I used the version for GCN and hoped it would be fairly good for nvidia
too. But to my surprise it's 7-8% slower than my kernel on GTX 570, with
3952 c/s @16384 and a duration of 4.2 seconds (my kernel does 4250 c/s
in 3.8 seconds). But then again your kernel is optimised for AMD, I do
see details that should be changed for nvidia so it might end up faster
if tweaked.

On the 7970 though, it's more than three times faster than mine at over
13000 c/s (mine only does 4172 c/s). For some reason it currently fails
self-test but that's probably trivial and not important now so
disregarding that, the raw speed is 13162 c/s at a GWS of 16384, and a
kernel duration of "only" 1.2 seconds.

magnum
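
For anyone wanting to sanity-check the figures above, here is a small sketch of the arithmetic in Python. The numbers are copied from the message. The relation "c/s is roughly GWS divided by kernel duration" is an assumption (one candidate per work-item, one kernel call per batch, host overhead ignored), not something the post states, and all variable and function names are made up for illustration.

```python
# Benchmark figures quoted in the message (c/s = candidates per second).
GWS = 16384  # global work size used in all quoted runs

milen_gtx570_cps, milen_gtx570_secs = 3952, 4.2
magnum_gtx570_cps, magnum_gtx570_secs = 4250, 3.8
milen_7970_cps, milen_7970_secs = 13162, 1.2
magnum_7970_cps = 4172  # kernel duration not given in the message


def estimated_cps(gws, seconds):
    """Rough throughput estimate.

    ASSUMPTION: each work-item tests one candidate and a batch is one
    kernel call, so throughput is about gws / kernel duration.
    """
    return gws / seconds


# Relative slowdown of Milen's kernel on the GTX 570, in percent.
slowdown_pct = (1 - milen_gtx570_cps / magnum_gtx570_cps) * 100

# Speedup of Milen's kernel on the HD 7970.
speedup = milen_7970_cps / magnum_7970_cps

print(f"GTX 570 slowdown: {slowdown_pct:.1f}%")
print(f"HD 7970 speedup:  {speedup:.2f}x")

# Compare the reported c/s with the estimate from GWS and duration.
for label, cps, secs in [
    ("GTX 570, Milen", milen_gtx570_cps, milen_gtx570_secs),
    ("GTX 570, magnum", magnum_gtx570_cps, magnum_gtx570_secs),
    ("HD 7970, Milen", milen_7970_cps, milen_7970_secs),
]:
    print(f"{label}: reported {cps} c/s, estimated {estimated_cps(GWS, secs):.0f} c/s")
```

Under that assumption the estimates land within a few percent of the reported c/s, which is about as close as durations rounded to one decimal allow.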