Message-Id: <C8611EE7-F9D9-42AE-97CB-7358F7B3996A@gmail.com>
Date: Mon, 6 Apr 2015 21:38:03 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: New SIMD generations, code layout

> On Apr 6, 2015, at 8:33 PM, magnum <john.magnum@...hmail.com> wrote:
>
> It looks good. What failure do you get? Your version fails even with
> AVX2, with "FAILED (cmp_all(5))". That message means keys 0..4 were set,
> crypt_all(5) was called and then cmp_all(5) which did not indicate
> anything was cracked. So everything worked correctly up to 4, but 5th
> failed.

I find the problem to be with this code snippet in sha1_fmt_cmp_all:

------------------------------------------------------------
    for (i = 0; i < count; i += 64) {
        int32_t R = 0;

#if __MIC__ || __AVX512__
        R |= vtesteq_epi32(B, vload(&MD[i + 0]));
        R |= vtesteq_epi32(B, vload(&MD[i + 16]));
        R |= vtesteq_epi32(B, vload(&MD[i + 32]));
        R |= vtesteq_epi32(B, vload(&MD[i + 48]));
#elif __AVX2__
        R |= vtesteq_epi32(B, vload(&MD[i + 0]));
        R |= vtesteq_epi32(B, vload(&MD[i + 8]));
        R |= vtesteq_epi32(B, vload(&MD[i + 16]));
        R |= vtesteq_epi32(B, vload(&MD[i + 24]));
        R |= vtesteq_epi32(B, vload(&MD[i + 32]));
        R |= vtesteq_epi32(B, vload(&MD[i + 40]));
        R |= vtesteq_epi32(B, vload(&MD[i + 48]));
        R |= vtesteq_epi32(B, vload(&MD[i + 56]));
#else
        R |= vtesteq_epi32(B, vload(&MD[i + 0]));
        R |= vtesteq_epi32(B, vload(&MD[i + 4]));
        R |= vtesteq_epi32(B, vload(&MD[i + 8]));
        R |= vtesteq_epi32(B, vload(&MD[i + 12]));
        R |= vtesteq_epi32(B, vload(&MD[i + 16]));
        R |= vtesteq_epi32(B, vload(&MD[i + 20]));
        R |= vtesteq_epi32(B, vload(&MD[i + 24]));
        R |= vtesteq_epi32(B, vload(&MD[i + 28]));
        R |= vtesteq_epi32(B, vload(&MD[i + 32]));
        R |= vtesteq_epi32(B, vload(&MD[i + 36]));
        R |= vtesteq_epi32(B, vload(&MD[i + 40]));
        R |= vtesteq_epi32(B, vload(&MD[i + 44]));
        R |= vtesteq_epi32(B, vload(&MD[i + 48]));
        R |= vtesteq_epi32(B, vload(&MD[i + 52]));
        R |= vtesteq_epi32(B, vload(&MD[i + 56]));
        R |= vtesteq_epi32(B, vload(&MD[i + 60]));
#endif

        M |= R;
    }
------------------------------------------------------------

In the original code, the stride (between two vtesteq_epi32s) is fixed to 4. I thought I should adjust the stride according to the SIMD width, so I modified the code to what you see above. And as you mentioned, the new code fails even on AVX2.

I just tried reverting to the fixed stride of 4, and then it passed the self-test on AVX2, which is strange. I don't know why the stride isn't adjustable. And I can't try the fixed stride on MIC, because it won't guarantee the 64-byte alignment required by MIC.

Any thoughts?

Lei
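
For reference, the intent described above (a stride equal to the number of 32-bit lanes per vector) can be written once instead of being unrolled per architecture. This is only a minimal sketch under stated assumptions: it treats MD as a flat array of 32-bit words, assumes the test means "some lane of the target equals the same lane of the loaded vector", and emulates that test in scalar C so it runs anywhere. The names VEC_LANES, BLOCK, any_lane_equal and cmp_all_sketch are hypothetical and do not come from the John the Ripper source. It does not explain the AVX2 failure; it only makes the stride arithmetic explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SIMD width: number of 32-bit lanes per
 * vector register (4 for SSE, 8 for AVX2, 16 for MIC/AVX-512). */
#ifndef VEC_LANES
#define VEC_LANES 8
#endif

/* Elements covered per outer iteration, matching "i += 64" in the loop. */
#define BLOCK 64

/* Scalar emulation of an "any lane equal" test (assumed semantics):
 * returns 1 if target[k] == md[k] for at least one lane k, else 0. */
static int any_lane_equal(const uint32_t *target, const uint32_t *md)
{
    int hit = 0;
    for (size_t k = 0; k < VEC_LANES; k++)
        hit |= (target[k] == md[k]);
    return hit;
}

/* target holds VEC_LANES words; md must be padded to a multiple of BLOCK
 * elements, because every outer iteration reads a whole block. */
static int cmp_all_sketch(const uint32_t *target, const uint32_t *md,
                          size_t count)
{
    int m = 0;

    /* The stride only tiles the block if it divides it evenly. */
    assert(BLOCK % VEC_LANES == 0);

    for (size_t i = 0; i < count; i += BLOCK)
        for (size_t off = 0; off < BLOCK; off += VEC_LANES)
            m |= any_lane_equal(target, &md[i + off]);

    return m;
}
```

With VEC_LANES set to 4, 8 or 16 the inner loop visits the same offsets as the three unrolled branches in the snippet, so the per-architecture differences reduce to a single constant.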