Message-Id: <00AF348B-FA31-45EB-8517-88BA1F1034F5@gmail.com>
Date: Thu, 2 Apr 2015 23:47:14 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: New SIMD generations, code layout

Hi magnum,

> On Apr 1, 2015, at 4:13 PM, magnum <john.magnum@...hmail.com> wrote:
>
> What you need to do:
> 1. Fix it so rawSHA512_ng builds at all (eg. change the top "#if
> __SSE2__" to something like "#if __SSE2__ || __MIC__" for a starter).
> 2. Fix whatever more is needed to make it build at all. For example,
> while the SWAP_ENDIAN macro is blindly added for AVX512, it's untested.
> And the GATHER macro doesn't even have a section for AVX512 yet, but it
> needs one. By the way, we should probably move those two macros to the
> pseudo-intrinsics.h file instead. Perhaps as vswap() and vgather().
> 3. Fix whatever more is needed to make it run correctly.
> 4. See if there are things that can be implemented better (faster).

I fixed the MIC intrinsics used in rawSHA256_ng and rawSHA512_ng. Both
formats now build and pass the self-tests.

rawSHA1_ng is more troublesome because it uses a hardcoded lookup table.
The table for AVX2 already looks cumbersome enough, and I can't imagine
what the table for MIC would look like if defined in the same way. I
tried building the table from bit shifts instead, so that it looks like
this:

    #define X ((((uint128_t)0xFFFFFFFFFFFFFFFF) << 64) + 0xFFFFFFFFFFFFFFFF)

    static const __aligned_simd uint128_t kUsedBytesTable[][4] = {
        {X<< 0, X<< 0, X<< 0, X<< 0},
        {X<< 8, X<< 0, X<< 0, X<< 0},
        {X<< 16, X<< 0, X<< 0, X<< 0},
        ...
    };

This is more compact but still cumbersome. I don't know if there's a
better way.

BTW, I have a question about how the lookup table is constructed. From
what I can see in kUsedBytesTable, each subarray corresponds to one SIMD
vector, and each vector is the previous one shifted left by one byte.
But in the lower middle of the table I find a "jump" that breaks this
pattern:

    // for SSE
    static const __aligned_simd uint32_t kUsedBytesTable[][4] = {
        ...
        { 0x00000000, 0x00000000, 0xFF000000, 0xFFFFFFFF },
        { 0x00000000, 0x00000000, 0x00000000, 0xFFFFFF00 },
        ...
    };

The lower subarray should be the upper one shifted left by one byte, but
it is actually shifted left by two bytes. I don't know whether this is a
typo or something done intentionally. Could you explain?

Thanks,
Lei
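
P.S. To make the question concrete, here is a minimal sketch that generates the rows instead of typing them. It only encodes my observation above (row k is an all-ones 128-bit vector shifted left by k bytes, stored as four 32-bit lanes with the lowest lane first), which is an assumption and not a statement about how the table was actually designed. The names used_bytes_row and print_rows are hypothetical helpers and are not in the JtR tree. Under this model, the two rows I quoted are k = 11 and k = 13, and the row between them would be { 0, 0, 0, 0xFFFFFFFF }.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of John the Ripper.
 * Fills out[0..3] with a 128-bit all-ones value shifted left by k bytes,
 * stored as four 32-bit lanes with out[0] holding the lowest four bytes.
 * This encodes the assumed "one more byte per row" pattern only. */
static void used_bytes_row(unsigned k, uint32_t out[4])
{
    assert(k <= 16);
    for (unsigned lane = 0; lane < 4; lane++) {
        unsigned lo = lane * 4;          /* index of this lane's lowest byte */
        if (k <= lo)
            out[lane] = 0xFFFFFFFFu;     /* lane lies entirely above the shift */
        else if (k >= lo + 4)
            out[lane] = 0;               /* lane is entirely shifted out */
        else                             /* partial lane: shift by 8, 16 or 24 bits */
            out[lane] = (uint32_t)(0xFFFFFFFFu << (8 * (k - lo)));
    }
}

/* Prints rows first..last in the same layout as the SSE table. */
static void print_rows(unsigned first, unsigned last)
{
    for (unsigned k = first; k <= last; k++) {
        uint32_t row[4];
        used_bytes_row(k, row);
        printf("{ 0x%08X, 0x%08X, 0x%08X, 0x%08X },  /* k = %u */\n",
               (unsigned)row[0], (unsigned)row[1],
               (unsigned)row[2], (unsigned)row[3], k);
    }
}
```

Calling print_rows(11, 13) from a main() lists the three rows around the jump. Whether the real table leaves out the middle row on purpose is exactly what I am asking, and this sketch cannot settle it.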