Message-ID: <20150903184003.GA14803@openwall.com>
Date: Thu, 3 Sep 2015 21:40:03 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

On Thu, Sep 03, 2015 at 11:52:47AM +0200, magnum wrote:
> On 2015-09-03 06:56, Solar Designer wrote:
> >On Wed, Sep 02, 2015 at 09:31:34PM +0200, magnum wrote:
> >>#define Ch(x, y, z) (z ^ (x & (y ^ z)))
> >>#define Ch(x, y, z) ((x & y) ^ ( (~x) & z))
> >>
> >>This is 3 vs. 4 ops, right?
> >
> >On archs without AND-NOT, yes.  So it's a good find, and I'm happy you
> >patched these.
> >
> >However, on archs with AND-NOT either is 3 ops, and the one with AND-NOT
> >has some parallelism.
>
> Maybe the and-not one is better on some GPU then? I need to test.

Yes, that's possible.

> Apparently GCN has ANDN and NAND.

I need to take a fresh look at the arch manual, but in the generated
code I only see scalar ANDN, and never vector ANDN (nor NAND).  They
defined scalar ANDN presumably because it's so useful for exec masks.

I see you've committed this:

+#if cpu(DEVICE_INFO) || amd_gcn(DEVICE_INFO)
+#define HAVE_ANDNOT 1
+#endif

but I think the check for amd_gcn(DEVICE_INFO) is wrong.

And why this change?

-#if !gpu_nvidia(DEVICE_INFO) || nvidia_sm_5x(DEVICE_INFO)
+#if !gpu_nvidia(DEVICE_INFO)
 #define USE_BITSELECT 1
 #elif gpu_nvidia(DEVICE_INFO)
 #define OLD_NVIDIA 1
 #endif

> >Maybe both forms of emulation need to be kept in pseudo_intrinsics.h
> >with a way for us to choose one or the other.  It might happen that the
> >optimal choice will vary by arch, CPU, compiler, format.
>
> But if it varies by format, we need to decide outside pseudo_intrinsics.h.

We could include several versions of the macro in pseudo_intrinsics.h
and decide in the format via setting another macro (WANT_XXX) before
including pseudo_intrinsics.h.

> BTW early tests indicate that 5916a57 made SHA-512 very slightly worse
> (but almost hidden by normal variations).

On what hardware?  The parallelism vs. register pressure tradeoff is in
fact non-obvious.  But on XOP there should be speedup from doing 1 op
fewer.

Alexander
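
For illustration only, here is a minimal C sketch of the two Ch() forms discussed above and of selecting between them with a macro set before the definition. WANT_CH_ANDNOT and the helper function names are hypothetical, made up for this sketch; they are not names from pseudo_intrinsics.h or the JtR tree. The check relies on Ch() being a bitwise select (y where x is 1, z where x is 0), so testing the 8 all-zeros/all-ones input combinations covers every per-bit case.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical selector (assumption, not an existing JtR macro): a format
 * would define WANT_CH_ANDNOT before this point to request the AND-NOT form.
 */
#ifdef WANT_CH_ANDNOT
/* 4 ops without AND-NOT, 3 with it; the two ANDs are independent */
#define Ch(x, y, z) (((x) & (y)) ^ (~(x) & (z)))
#else
/* 3 ops everywhere, but each op depends on the previous one */
#define Ch(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#endif

/* Both forms as plain functions, so they can be compared directly. */
static uint32_t ch_xor(uint32_t x, uint32_t y, uint32_t z)
{
	return z ^ (x & (y ^ z));
}

static uint32_t ch_andnot(uint32_t x, uint32_t y, uint32_t z)
{
	return (x & y) ^ (~x & z);
}

/* Returns 1 if the two forms agree for every per-bit input combination. */
static int ch_forms_agree(void)
{
	unsigned int i;

	for (i = 0; i < 8; i++) {
		uint32_t x = (i & 4) ? ~(uint32_t)0 : 0;
		uint32_t y = (i & 2) ? ~(uint32_t)0 : 0;
		uint32_t z = (i & 1) ? ~(uint32_t)0 : 0;

		if (ch_xor(x, y, z) != ch_andnot(x, y, z))
			return 0;
	}
	return 1;
}

/* Optional self-test a caller could run once at startup. */
static void ch_self_test(void)
{
	assert(ch_forms_agree());
}
```

Which form is faster is not something this sketch can tell; as noted above, that likely depends on the arch, compiler and format, and needs benchmarking.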