Message-ID: <20120713072129.GA23369@openwall.com>
Date: Fri, 13 Jul 2012 11:21:29 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Cc: Tavis Ormandy <taviso@...xchg8b.com>
Subject: Re: Rotate and bitselect investigation

magnum, Tavis -

On Mon, Jul 09, 2012 at 10:30:29AM +0400, Solar Designer wrote:
> On Mon, Jul 09, 2012 at 10:15:54AM +0530, Sayantan Datta wrote:
> > F(x,y,z) ((x & y) | (z & (x | y)))==F(x,y,z) (bitselect(x, y, z) ^
> > bitselect(x, (uint)0, y))
>
> Wow.  I wonder if this trick for SHA-1 was known at all.  Not to us, it
> seems.  The second bitselect() is essentially an and-not, so the speed
> might be better if it's written as such (if there's an and-not
> instruction).  Also, I guess this change should hurt on NVIDIA (does
> it?), so you'll need to wrap it in some #ifdef.
>
> Anyway, I've just tried it on CPU (XOP).  Patch attached.  Here are the
> speeds (best of several invocations in each case):
>
> Before:
>
> Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 XOP intrinsics 4x]... DONE
> Raw:    28925K c/s real, 28925K c/s virtual
>
> After:
>
> Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 XOP intrinsics 4x]... DONE
> Raw:    28435K c/s real, 28435K c/s virtual

On another build (same machine), the patched version is faster.  So I
guess it depends on placement in caches and such.  The code becomes a
bit smaller (9179 bytes reduces to 9115 bytes for rawSHA1_ng_fmt.o
.text).

So I think we should apply the patch from my previous posting on this
as-is.  There's clear speedup for sse-intrinsics.c's SHA-1.

magnum - please commit.  I think this can be in the fixes branch as well
(trivial change in terms of possible breakage).

http://www.openwall.com/lists/john-dev/2012/07/09/13

Thanks,

Alexander
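
The identity quoted above can be checked exhaustively on a CPU, since all three forms are purely bitwise. What follows is a minimal sketch in plain C, assuming OpenCL's documented bitselect(a, b, c) semantics (bits of b where c is set, bits of a elsewhere). The names bitselect32, maj_ref, maj_bitselect, maj_andnot and count_mismatches are illustrative only and are not taken from the JtR tree or from the attached patch.

```c
#include <stdint.h>

/* Emulates OpenCL bitselect(a, b, c): each result bit comes from b
 * where the corresponding bit of c is 1, and from a where it is 0. */
uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* SHA-1 majority function in the original form quoted in the thread. */
uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (z & (x | y));
}

/* Form proposed in the thread: two bitselects combined with XOR. */
uint32_t maj_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(x, y, z) ^ bitselect32(x, 0, y);
}

/* Same thing with the second bitselect written as an explicit and-not,
 * since bitselect(x, 0, y) reduces to x & ~y. */
uint32_t maj_andnot(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(x, y, z) ^ (x & ~y);
}

/* Returns the number of bit positions where either rewritten form
 * disagrees with maj_ref. The three constants are truth-table columns:
 * across their bit positions they enumerate all 8 combinations of
 * (x, y, z) bits, which is exhaustive for a bitwise function. */
int count_mismatches(void)
{
    const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;
    uint32_t ref = maj_ref(x, y, z);
    uint32_t diff = (ref ^ maj_bitselect(x, y, z)) |
                    (ref ^ maj_andnot(x, y, z));
    int n = 0;

    while (diff) {
        n += (int)(diff & 1u);
        diff >>= 1;
    }
    return n;
}
```

A zero return from count_mismatches() means the three forms agree on every input. This says nothing about speed: whether either rewritten form is faster depends on whether the target has a single-instruction bit select or and-not, as discussed in the thread.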