Message-ID: <20150903194048.GA15176@openwall.com>
Date: Thu, 3 Sep 2015 22:40:48 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

On Thu, Sep 03, 2015 at 09:29:37PM +0200, magnum wrote:
> On 2015-09-03 20:40, Solar Designer wrote:
> >On Thu, Sep 03, 2015 at 11:52:47AM +0200, magnum wrote:
> >>Apparently GCN has ANDN and NAND.
> >
> >I need to take a fresh look at the arch manual, but in the generated
> >code I only see scalar ANDN, and never vector ANDN (nor NAND). They
> >defined scalar ANDN presumably because it's so useful for exec masks.
> >
> >I see you've committed this:
> >
> >+#if cpu(DEVICE_INFO) || amd_gcn(DEVICE_INFO)
> >+#define HAVE_ANDNOT 1
> >+#endif
> >
> >but I think the check for amd_gcn(DEVICE_INFO) is wrong.
>
> We currently never run vectorized on GCN anyway, unless forced by user -
> if format supports it at all.

That's the SIMD vs. SIMT confusion again. When talking ISA level:

By scalar, I mean the tiny scalar unit that is normally used for control
only.

By vector, I mean the SIMD units.

Per the generated assembly code, there are no ANDN and NAND instructions
for the SIMD units at all. Trying to Google what their likely mnemonics
would be returns no hits. I think they just don't exist.

And it does not matter whether the kernel is vectorized or not. It uses
those same vector instructions either way. If vectorized, it gets
interleaved instructions, e.g.
phpass-opencl:

  v_add_i32     v43, vcc, v36, v43      // 00003D78: 4A565724
  v_add_i32     v44, vcc, v37, v44      // 00003D7C: 4A585925
  v_add_i32     v45, vcc, v38, v45      // 00003D80: 4A5A5B26
  v_add_i32     v46, vcc, v35, v46      // 00003D84: 4A5C5D23
- v_not_b32     v51, v28                // 00003D88: 7E666F1C
- v_not_b32     v52, v29                // 00003D8C: 7E686F1D
- v_not_b32     v53, v30                // 00003D90: 7E6A6F1E
- v_not_b32     v54, v27                // 00003D94: 7E6C6F1B
- v_or_b32      v51, v43, v51           // 00003D98: 3866672B
- v_or_b32      v52, v44, v52           // 00003D9C: 3868692C
- v_or_b32      v53, v45, v53           // 00003DA0: 386A6B2D
- v_or_b32      v54, v46, v54           // 00003DA4: 386C6D2E
+ v_bfi_b32     v51, v28, v43, -1       // 00003D88: D2940033 0306571C
+ v_bfi_b32     v52, v29, v44, -1       // 00003D90: D2940034 0306591D
+ v_bfi_b32     v53, v30, v45, -1       // 00003D98: D2940035 03065B1E
+ v_bfi_b32     v54, v27, v46, -1       // 00003DA0: D2940036 03065D1B
  v_xor_b32     v51, v36, v51           // 00003DA8: 3A666724
  v_xor_b32     v52, v37, v52           // 00003DAC: 3A686925
  v_xor_b32     v53, v38, v53           // 00003DB0: 3A6A6B26
  v_xor_b32     v54, v35, v54           // 00003DB4: 3A6C6D23

(This also shows the effect of my MD5_I optimization.)

> But perhaps it should be (amd_gcn(DEVICE_INFO) && (V_WIDTH < 2)) then?

No.

> >And why this change? -
> >
> >-#if !gpu_nvidia(DEVICE_INFO) || nvidia_sm_5x(DEVICE_INFO)
> >+#if !gpu_nvidia(DEVICE_INFO)
> > #define USE_BITSELECT 1
> > #elif gpu_nvidia(DEVICE_INFO)
> > #define OLD_NVIDIA 1
> > #endif
>
> I saw definite speedup for PBKDF2 and RAR iirc, and perhaps md5crypt.
> But later I saw contradicting figures for other formats so I'm not sure
> about this and things are in a state of flux. It might be that we should
> revert to initially setting it (for Maxwell) in opencl_misc.h, and later
> conditionally undefine it in certain formats.
>
> Is bitselect() expected to always generate a LOP3.LUT? Even if it is, I
> figure the optimizer just might be able to do better when given
> bitselect-free code.

Yes, we should review the generated code. It is unclear what source
code is more likely to result in optimal use of LOP3.LUT.

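For reference, the source forms being compared here can be written out in plain C. This is only a sketch: it assumes the not/or/xor sequence in the listing is MD5's I function, y ^ (x | ~z), and the helper names (bitselect32, bfi32, md5_i_plain, md5_i_bitselect, md5_i_bfi) are made up for illustration and are not taken from the JtR tree. bitselect32 follows the OpenCL definition of bitselect(), and bfi32 follows the documented semantics of v_bfi_b32.

```c
#include <stdint.h>

/* Models OpenCL bitselect(a, b, c): each result bit comes from b where
 * the corresponding bit of c is 1, and from a where it is 0. */
static uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Models GCN v_bfi_b32 dst, s0, s1, s2: (s0 & s1) | (~s0 & s2). */
static uint32_t bfi32(uint32_t s0, uint32_t s1, uint32_t s2)
{
    return (s0 & s1) | (~s0 & s2);
}

/* Bitselect-free form: NOT, OR, XOR (the "-" lines plus v_xor_b32). */
static uint32_t md5_i_plain(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | ~z);
}

/* bitselect form: selecting x where z is 1 and all-ones where z is 0
 * yields (x & z) | ~z, which equals x | ~z. */
static uint32_t md5_i_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ bitselect32(0xffffffffU, x, z);
}

/* Shape of the "+" lines: v_bfi_b32 tmp, z, x, -1 followed by v_xor_b32. */
static uint32_t md5_i_bfi(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ bfi32(z, x, 0xffffffffU);
}
```

All three functions compute the same value for every input, so the choice between them only affects which instructions the compiler is likely to emit, which is the open question for LOP3.LUT as well.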
> Besides all this, I see I introduced a bug: Now OLD_NVIDIA is defined
> for Maxwell and that was not the intention. I'll fix that right away.

Yes. Thanks.

> >>BTW early tests indicate that 5916a57 made SHA-512 very slightly worse
> >>(but almost hidden by normal variations).
> >
> >On what hardware?
>
> AVX and AVX2. My overall feeling is SHA256 got a slight boost while
> SHA512 did not and sometimes the latter got a very slight regression.
> But I haven't really gone systematic yet. All my tests are very
> inconclusive as of yet, the fluctuations are larger than the
> boosts/regressions.

That's not surprising. I only expect much difference on XOP.

Alexander
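
To make the OLD_NVIDIA bug mentioned above concrete, the guard logic can be modelled as ordinary C functions. This is a hedged sketch only: struct build_flags and both function names are hypothetical, the two booleans stand in for gpu_nvidia(DEVICE_INFO) and nvidia_sm_5x(DEVICE_INFO), and the second function is just one possible repair. The thread does not show the fix that was actually committed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two macros the guard may define. */
struct build_flags {
    bool use_bitselect; /* models USE_BITSELECT */
    bool old_nvidia;    /* models OLD_NVIDIA */
};

/* Models the guard after the quoted change:
 *   #if !gpu_nvidia(...)  -> USE_BITSELECT
 *   #elif gpu_nvidia(...) -> OLD_NVIDIA
 * Every NVIDIA device, Maxwell (sm_5x) included, lands in the #elif. */
static struct build_flags flags_as_committed(bool is_nvidia, bool is_sm_5x)
{
    struct build_flags f = { false, false };
    (void)is_sm_5x; /* no longer consulted after the change */
    if (!is_nvidia)
        f.use_bitselect = true;
    else
        f.old_nvidia = true;
    return f;
}

/* One possible repair (an assumption, not the actual commit):
 * keep the new USE_BITSELECT condition, but exclude sm_5x from OLD_NVIDIA. */
static struct build_flags flags_one_possible_fix(bool is_nvidia, bool is_sm_5x)
{
    struct build_flags f = { false, false };
    if (!is_nvidia)
        f.use_bitselect = true;
    else if (!is_sm_5x)
        f.old_nvidia = true;
    return f;
}
```

With the committed guard, a Maxwell device (is_nvidia and is_sm_5x both true) ends up with old_nvidia set; with the sketched repair it gets neither flag, which leaves room to set USE_BITSELECT per format later, as suggested in the quoted text.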