Message-ID: <20120325020744.GH8909@openwall.com>
Date: Sun, 25 Mar 2012 06:07:44 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: CUDA & OpenCL status

On Thu, Mar 22, 2012 at 11:10:15AM +0200, Milen Rangelov wrote:
> Problem is that NVidia does not have the rotate. AMD has BITALIGN_INT that
> does it. With NVidia, rotate() would actually do (a<<s)|(a>>(32-s)). With
> Fermi you can have the SHL+ADD thing (which I guess is just a 32-bit MAD in
> fact), but rotate() still does not do the trick. What I do is something
> like:
>
> #define ROTATE(a, s) (((a) << (s)) + ((a) >> (32 - (s))))

Oh, ADD instead of OR, and their having a 32-bit integer MAD. This makes sense. I did not realize they had it (I thought it was FP only).

> Which makes the compiler emit the needed instructions. Of course that's not
> as good as having a single "rotate" instruction, but still doing 2 bitwise
> ops is better than doing 3. This works only on sm_2x architectures, so it
> really does not matter on say 9800GT.

But it does not hurt on those older cards, does it? Do ADDs have the same latency as ORs there (I guess so)?

Alexander