Message-ID: <20120325020744.GH8909@openwall.com>
Date: Sun, 25 Mar 2012 06:07:44 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: CUDA & OpenCL status

On Thu, Mar 22, 2012 at 11:10:15AM +0200, Milen Rangelov wrote:
> Problem is that NVidia does not have the rotate. AMD has BITALIGN_INT that
> does it. With NVidia, rotate() would actually do (a<<s)|(a>>(32-s)). With
> Fermi you can have the SHL+ADD thing (which I guess is just a 32-bit MAD in
> fact), but rotate() still does not do the trick. What I do is something
> like:
> 
> #define ROTATE(a,s) (((a)<<(s))+((a)>>(32-(s))))

Oh, ADD instead of OR, and them having a 32-bit integer MAD.  This makes
sense.  I did not realize they had it (I thought it was FP only).
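
For reference, a minimal sketch in plain C of why ADD can stand in for OR here.
For a 32-bit value and a shift count from 1 to 31, the two shifted halves have
no set bits in common, so there are no carries and adding them gives the same
result as OR-ing them.  The function names rotl_or and rotl_add are made up for
illustration, and this says nothing about which instructions a given GPU
compiler actually emits:

```c
#include <stdint.h>

/* Rotate left built with OR; valid for 1 <= s <= 31. */
static inline uint32_t rotl_or(uint32_t a, unsigned s)
{
    return (a << s) | (a >> (32 - s));
}

/*
 * Same rotate built with ADD.  The left-shifted part occupies bits s..31
 * and the right-shifted part occupies bits 0..s-1, so no carry can occur
 * and the sum equals the bitwise OR.  Valid for 1 <= s <= 31.
 */
static inline uint32_t rotl_add(uint32_t a, unsigned s)
{
    return (a << s) + (a >> (32 - s));
}
```

Both forms are undefined for s == 0 or s == 32 in C (shift by the full width),
so real code would need to special-case or mask the count if it can be zero.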

> Which makes the compiler emit the needed instructions. Of course that's not
> as good as having a single "rotate" instruction, but still doing 2 bitwise
> ops is better than doing 3. This works only on sm_2x architectures, so it
> really does not matter on say 9800GT.

But it does not hurt on those older cards, does it?  Do ADDs have the
same latency as ORs there (I guess so)?

Alexander
