Message-ID: <20120714104341.GB11924@cmpxchg8b.com>
Date: Sat, 14 Jul 2012 12:43:41 +0200
From: Tavis Ormandy <taviso@...xchg8b.com>
To: Solar Designer <solar@...nwall.com>
Cc: john-dev@...ts.openwall.com
Subject: Re: Rotate and bitselect investigation

On Fri, Jul 13, 2012 at 11:21:29AM +0400, Solar Designer wrote:
> magnum, Tavis -
> 
> On Mon, Jul 09, 2012 at 10:30:29AM +0400, Solar Designer wrote:
> > On Mon, Jul 09, 2012 at 10:15:54AM +0530, Sayantan Datta wrote:
> > > F(x,y,z) ((x & y) | (z & (x | y)))==F(x,y,z) (bitselect(x, y, z) ^
> > > bitselect(x, (uint)0, y))
> > 
> > Wow.  I wonder if this trick for SHA-1 was known at all.  Not to us, it
> > seems.  The second bitselect() is essentially an and-not, so the speed
> > might be better if it's written as such (if there's an and-not
> > instruction).  Also, I guess this change should hurt on NVIDIA (does
> > it?), so you'll need to wrap it in some #ifdef.
> > 
> > Anyway, I've just tried it on CPU (XOP).  Patch attached.  Here are the
> > speeds (best of several invocations in each case):
> > 
> > Before:
> > 
> > Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 XOP intrinsics 4x]... DONE
> > Raw:    28925K c/s real, 28925K c/s virtual
> > 
> > After:
> > 
> > Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 XOP intrinsics 4x]... DONE
> > Raw:    28435K c/s real, 28435K c/s virtual
> 
> On another build (same machine), the patched version is faster.  So I
> guess it depends on placement in caches and such.  The code becomes a
> bit smaller (9179 bytes reduces to 9115 bytes for rawSHA1_ng_fmt.o
> .text).  So I think we should apply the patch from my previous posting
> on this as-is.  There's clear speedup for sse-intrinsics.c's SHA-1.
> 
> magnum - please commit.  I think this can be in the fixes branch as well
> (trivial change in terms of possible breakage).
> 
> http://www.openwall.com/lists/john-dev/2012/07/09/13
> 
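
A minimal scalar C sketch for checking the majority-function identity quoted above outside of OpenCL. bitselect32() is a hypothetical stand-in for OpenCL's bitselect(a, b, c), assuming its usual per-bit semantics: take the bit from b where c is 1, else from a. The other function names are also illustrative and are not taken from John the Ripper's source. The sketch shows only that the forms are equivalent. It says nothing about which one runs faster on a given GPU or CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar emulation of OpenCL bitselect(a, b, c):
 * for each bit, take b where c is 1 and a where c is 0. */
static uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* SHA-1 majority function F(x,y,z) in its original form. */
static uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (z & (x | y));
}

/* Rewritten form: bitselect(x, y, z) ^ bitselect(x, 0, y). */
static uint32_t maj_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(x, y, z) ^ bitselect32(x, 0u, y);
}

/* Same form with the second bitselect() spelled as an and-not,
 * because bitselect(x, 0, y) == x & ~y. */
static uint32_t maj_andnot(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(x, y, z) ^ (x & ~y);
}

/* Every group of 8 bit positions in these three words holds all
 * 8 (x, y, z) bit combinations, so equality on them covers the
 * whole truth table of these purely bitwise functions. */
void check_majority_identity(void)
{
    const uint32_t x = 0xAAAAAAAAu, y = 0xCCCCCCCCu, z = 0xF0F0F0F0u;
    assert(maj_ref(x, y, z) == maj_bitselect(x, y, z));
    assert(maj_ref(x, y, z) == maj_andnot(x, y, z));
    assert(bitselect32(x, 0u, y) == (x & ~y));
}
```

Because each output bit depends only on the same bit of x, y and z, checking the eight one-bit cases is enough. The patterns 0xAA, 0xCC and 0xF0 pack all eight cases into every byte. With those inputs, each byte of the result is 0xE8, the truth table of the majority function.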

Agreed, it looks like it should be an improvement. Even if it's
negligible, reducing code size is still a win. FWIW, it seems to
get better results on Intel silicon than you were seeing on AMD.

Fun.

Tavis.

-- 
-------------------------------------
taviso@...xchg8b.com | pgp encrypted mail preferred
-------------------------------------------------------
