Message-ID: <20130126231037.GA2363@openwall.com>
Date: Sun, 27 Jan 2013 03:10:37 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Proposed optimizations to pwsafe

On Sun, Jan 27, 2013 at 01:07:13AM +0200, Milen Rangelov wrote:
> #define rotate(a,b) ((a<<b)+(a>>(32-b)))
> 
> is faster than doing it the usual way:
> 
> #define rotate(a,b) ((a<<b)|(a>>(32-b)))
> 
> and generated PTX is the same except for the ADD/OR thing. My theory is
> that using addition somehow utilizes the hardware instruction (the integer
> fused multiply-add one) but at least at PTX level, this is not visible.

This could be, although some MADs are visible at PTX level.  Another
guess is that ADD might actually have lower latency than OR - although
it'd be weird.
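
For reference, a minimal C sketch of why the two forms are interchangeable in the first place. It is not from the pwsafe code, and the names rotl_add, rotl_or and rotates_agree are made up for illustration. It assumes a 32-bit operand and a shift count from 1 to 31: in that range the two shifted halves have no set bits in common, so the addition never carries and ADD gives the same value as OR. This only shows that the results match; it says nothing about which one is faster on a given GPU.

```c
#include <stdint.h>

/* Rotate left using ADD to combine the two halves (valid for 1 <= b <= 31). */
static inline uint32_t rotl_add(uint32_t a, unsigned b)
{
    return (a << b) + (a >> (32 - b));
}

/* Rotate left using OR to combine the two halves (valid for 1 <= b <= 31). */
static inline uint32_t rotl_or(uint32_t a, unsigned b)
{
    return (a << b) | (a >> (32 - b));
}

/* Returns 1 if both variants agree for every shift count from 1 to 31. */
static int rotates_agree(uint32_t a)
{
    unsigned b;

    for (b = 1; b < 32; b++) {
        uint32_t hi = a << b;        /* low b bits are zero */
        uint32_t lo = a >> (32 - b); /* only the low b bits can be set */

        if ((hi & lo) != 0)          /* would indicate overlapping bits */
            return 0;
        if (rotl_add(a, b) != rotl_or(a, b))
            return 0;
    }
    return 1;
}
```

A shift count of 0 or 32 is deliberately excluded, since shifting a 32-bit value by 32 is undefined in C.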

Alexander
