Message-ID: <20130126232544.GB2363@openwall.com>
Date: Sun, 27 Jan 2013 03:25:44 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Proposed optimizations to pwsafe

On Sat, Jan 26, 2013 at 11:58:29PM +0100, magnum wrote:
> On 26 Jan, 2013, at 23:51 , Milen Rangelov <gat3way@...il.com> wrote:
>
> > Hm, I guess the compiler got smarter and was able to generate the
> > bfi_int when not explicitly doing bitselect(). This was not the case
> > some ago and that's good news. Need to do some experiments and check
> > the ISA generated.
>
> I think we should do it anyway (for now, at least).

I agree.

> What is much more annoying is that bitselects on nvidia hurts
> performance (last time I checked). That is just weird. I mean, OK if
> they do not have a hardware instruction but they should definitely not
> end up slower than using the spelled out syntax. Or is it harder than
> I imagine?

What I saw happen when optimizing Sayantan's older mscash2-opencl code -
before we moved the redundant portion of the HMAC out of the loop at
source code level - was that switching from simple bitwise ops to
rotate() and bitselect() precluded the compiler from optimizing out and
moving out of the loop some of the redundant code. This was on HD 7970
with Catalyst 12.4 (not sure of this last detail, though).

So changing to rotate() and bitselect() slowed the code down - making it
up to 2x slower (when the compiler would no longer move the redundant
SHA-1 computations out of the loop). However, after the code was
properly optimized, changing to rotate() and bitselect() actually sped
it up some more.

I think something similar can be happening on NVIDIA as well, with more
complicated ops being too complicated for the compiler to "see through"
and to optimize nearby code. For example, the compiler is almost
certainly aware of the associative and commutative properties of ADD and
OR, which may allow for certain optimizations, but it might not be aware
of any/enough properties of rotate() and bitselect().

This means that we ought to optimize our code fully ourselves (of
course), and only then expect to see the full advantage of rotate() and
bitselect().

I don't oppose introducing rotate() and bitselect() early on. On the
contrary, I am for it. I merely said that it's normal not to see
much/any advantage from them when the rest of code is not yet fully
optimized.

Alexander
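
For readers unfamiliar with the two spellings compared in this message,
here is a minimal host-side sketch in plain C, not kernel code from the
thread. It models OpenCL's rotate() and bitselect() for 32-bit values
and checks that the "spelled out" bitwise forms compute the same results,
using the SHA-1 Ch function as the example. All function names
(rotl_spelled, rotate_model, bitselect_model, ch_*, count_mismatches)
are hypothetical and chosen for illustration only. The sketch shows
functional equivalence and nothing about which form a given GPU compiler
optimizes better.

```c
#include <assert.h>
#include <stdint.h>

/* Spelled-out 32-bit left rotate using shifts and OR.
 * Only valid for 1 <= n <= 31 (a shift by 32 is undefined in C). */
static uint32_t rotl_spelled(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return (x << n) | (x >> (32 - n));
}

/* Host-side model of OpenCL rotate(x, n) for a 32-bit scalar:
 * the rotate count is taken modulo the bit width. */
static uint32_t rotate_model(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Host-side model of OpenCL bitselect(a, b, c): each result bit comes
 * from b where the matching bit of c is 1, and from a where it is 0. */
static uint32_t bitselect_model(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* SHA-1 Ch(x, y, z), written with simple bitwise ops. */
static uint32_t ch_spelled(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);
}

/* A common shorter spelled-out form of the same function. */
static uint32_t ch_xor(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

/* The same function expressed through the bitselect() model. */
static uint32_t ch_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect_model(z, y, x);
}

/* Small xorshift32 generator, used only to produce test inputs. */
static uint32_t next_value(uint32_t *state)
{
    uint32_t s = *state;
    s ^= s << 13;
    s ^= s >> 17;
    s ^= s << 5;
    *state = s;
    return s;
}

/* Compare all spellings on pseudo-random inputs and return the
 * number of disagreements found. */
unsigned count_mismatches(void)
{
    uint32_t state = 0x12345678u;
    unsigned bad = 0;

    for (int i = 0; i < 1000; i++) {
        uint32_t x = next_value(&state);
        uint32_t y = next_value(&state);
        uint32_t z = next_value(&state);
        uint32_t ref = ch_spelled(x, y, z);

        if (ch_xor(x, y, z) != ref)
            bad++;
        if (ch_bitselect(x, y, z) != ref)
            bad++;
        for (unsigned n = 1; n < 32; n++)
            if (rotl_spelled(x, n) != rotate_model(x, n))
                bad++;
    }
    return bad;
}
```

Calling count_mismatches() from a small main() is enough to exercise the
sketch. Since the spellings are functionally interchangeable, any speed
difference between them in a real kernel would come from how well the
compiler can reason about each form, which is the point made above.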