Message-Id: <FF22BA41-F731-4CE1-899A-554015A1F75D@gmail.com>
Date: Wed, 10 Jun 2015 09:36:51 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

> On Jun 9, 2015, at 8:46 PM, Lei Zhang <zhanglei.april@...il.com> wrote:
>
> I tried to see the 'size' of sse-intrinsics.o under different interleaving factors and compiled by clang and icc respectively.
>
> lei-mac:src lei$ size clang/*
> __TEXT  __DATA  __OBJC  others  dec     hex
> 122863  0       0       26572   149435  247bb   clang/x1.o
> 127951  0       0       28699   156650  263ea   clang/x2.o
> 128479  0       0       28614   157093  265a5   clang/x3.o
> 127679  0       0       28527   156206  2622e   clang/x4.o
>
> lei-mac:src lei$ size icc/*
> __TEXT  __DATA  __OBJC  others  dec     hex
> 102084  7545    0       50442   160071  27147   icc/x1.o
> 113012  9799    0       49375   172186  2a09a   icc/x2.o
> 113348  9799    0       51275   174422  2a956   icc/x3.o
> 114740  9799    0       53235   177774  2b66e   icc/x4.o

I forgot to mention that the interleaving factor I experimented with is SIMD_PARA_SHA256. The corresponding performance of pbkdf2-hmac-sha256 is:

[clang]
x1      Raw:    289 c/s real, 289 c/s virtual
x2      Raw:    271 c/s real, 271 c/s virtual
x3      Raw:    273 c/s real, 273 c/s virtual
x4      Raw:    269 c/s real, 269 c/s virtual

[icc]
x1      Raw:    300 c/s real, 300 c/s virtual
x2      Raw:    235 c/s real, 235 c/s virtual
x3      Raw:    242 c/s real, 242 c/s virtual
x4      Raw:    226 c/s real, 226 c/s virtual

There's more noticeable degradation for icc when interleaving is increased from x1 to x2. Considering the size change, it looks like icc is indeed more aggressive when unrolling. OTOH, when interleaving is increased from x2 to x4, the size of the text segment doesn't change as significantly as from x1 to x2. I don't know why this happened.

> interleaving    loops unrolled
> --------------------------------------
> x1              215
> x2              225
> x3              225
> x4              225

I think 225 - 215 = 10 corresponds to the number of unrolled SHA256_PARA_DOs, which is a bit less than the actual number of SHA256_PARA_DOs used in the source code.
I manually compared the report given by icc with the source code, and confirmed that a few SHA256_PARA_DOs are indeed not unrolled. This again implies that manual unrolling may be needed. Or maybe tweaking the compiler flags can help.

Lei