Message-ID: <20120325132337.GA10318@openwall.com>
Date: Sun, 25 Mar 2012 17:23:37 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: CUDA & OpenCL status

On Sun, Mar 25, 2012 at 06:07:44AM +0400, Solar Designer wrote:
> On Thu, Mar 22, 2012 at 11:10:15AM +0200, Milen Rangelov wrote:
> > Problem is that NVidia does not have the rotate. AMD has BITALIGN_INT that
> > does it. With NVidia, rotate() would actually do (a<<s)|(a>>(32-s)). With
> > Fermi you can have the SHL+ADD thing (which I guess is just a 32-bit MAD in
> > fact), but rotate() still does not do the trick. What I do is something
> > like:
> >
> > #define ROTATE (a<<S)+(a>>(32-s))
>
> Oh, ADD instead of OR, and them having a 32-bit integer MAD.  This makes
> sense.  I did not realize they had it (I thought it was FP only).

I've just tried this (for phpass) and it didn't result in MADs being
generated.  Instead, I saw the ADDs instead of ORs (indeed), and the
only advantage was from the ADDs being sometimes re-ordered with other
ADDs that we genuinely have in MD5.  (BTW, this optimization may thus be
helpful on CPUs as well - giving the compiler more instruction
scheduling freedom.)

I was compiling with "-arch=sm_21", and there were a couple of other
(unrelated) 32-bit signed int MADs in the resulting code (if I read it
correctly).  I've even tried deliberately using a signed int there (as
opposed to unsigned) - this did not help.

Any idea why I wasn't getting MADs for this, or how I tell the compiler
to use a MAD more explicitly?

Lukas - meanwhile, I got phpass to 710k c/s on my GTX-570 1600 MHz (up
from 633k that I reported before) by moving from sm_10 to sm_20 or sm_21
for the generated code (the latter is apparently not valid for my card,
but happens to work - I guess the same code was generated as for sm_20)
and by increasing BLOCKS from 126*3 to 160*3.  I guess 126 was tuned for
GTX-560, right?

This new speed is slightly higher than the published speed for hashcat
on an equivalent graphics card.

The sm_10 to sm_20 change made cryptmd5-cuda slightly slower, though.

Alexander
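
For reference, the ADD-instead-of-OR rotate discussed above can be written out as a small plain C sketch. This is not the code used in phpass or in the CUDA kernels; the names rotl_or, rotl_add and rotl_forms_agree are made up for illustration. It assumes a 32-bit unsigned operand and a shift count s with 0 < s < 32. Under that assumption the two shifted halves occupy disjoint bit positions, so the addition never carries and gives the same result as the OR.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional rotate-left using OR. Assumes 0 < s < 32. */
static inline uint32_t rotl_or(uint32_t a, unsigned s)
{
    return (a << s) | (a >> (32 - s));
}

/*
 * Same rotate using ADD. (a << s) has its low s bits clear and
 * (a >> (32 - s)) has only its low s bits possibly set, so no bit
 * position is set in both terms and the sum cannot carry.
 * Assumes 0 < s < 32.
 */
static inline uint32_t rotl_add(uint32_t a, unsigned s)
{
    return (a << s) + (a >> (32 - s));
}

/* Hypothetical helper: returns 1 if both forms agree for a on s = 1..31. */
static int rotl_forms_agree(uint32_t a)
{
    for (unsigned s = 1; s < 32; s++)
        if (rotl_or(a, s) != rotl_add(a, s))
            return 0;
    return 1;
}
```

Because the rotate then ends in an addition, a compiler is free to reassociate it with the neighbouring additions of an MD5 step. Whether that yields a MAD or just better scheduling depends on the target and the compiler.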