Message-Id: <4B49D192-6410-40C3-85CF-992E438D6D68@gmail.com>
Date: Tue, 14 Jul 2015 09:56:33 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

> On Jun 23, 2015, at 2:03 AM, Solar Designer <solar@...nwall.com> wrote:
>
> One thing that is clear is that non-fully-unrolled *_PARA_DO are not
> acceptable. If there are not enough registers for fully unrolling
> these without incurring spilling, then the interleaving factor should be
> smaller. On MIC, there should be enough registers for the interleaving
> factors considered above (up to 5x).

The only mechanism I can find to control the unrolling of a specific loop is '#pragma unroll (n)', which supposedly tells the compiler to unroll the loop by a factor of exactly n. I just tried it on MD5_PARA_DO with icc, clang and gcc. Below are the sizes of the text segment before and after adding this directive, listed by interleaving factor. All compilers were invoked with -O2.

[icc]
factor  before  after
----------------------
x1      118220  118220
x2      132564  132268
x3      138276  146220
x4      152420  164732

[clang]
factor  before  after
----------------------
x1      117562  117562
x2      124658  125602
x3      131042  133954
x4      136882  143170

[gcc]
factor  before  after
----------------------
x1      124897  124897
x2      124471  124471
x3      131537  131537
x4      138291  138291

It seems icc and clang honored the directive (their sizes change for factors x2 and above), but gcc apparently just ignored it: its sizes are identical before and after.

Lei
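
A note on the gcc numbers above: as far as I know, gcc releases of that time did not implement a per-loop unroll pragma and silently skip unknown pragmas unless warnings are enabled, which would be consistent with the unchanged sizes. A loop-specific '#pragma GCC unroll n' was added later, in GCC 8. Below is a minimal sketch of how the request could be wrapped per compiler. UNROLL, PRAGMA_, PARA and para_step are hypothetical names made up for illustration, and the loop is a toy stand-in, not the actual MD5_PARA_DO from John the Ripper.

```c
#include <stdint.h>

/* Hypothetical helper macros, not taken from the John the Ripper tree. */
#define PRAGMA_(x) _Pragma(#x)

#if defined(__INTEL_COMPILER) || defined(__clang__)
/* icc and clang accept "#pragma unroll(n)" directly before a loop. */
#define UNROLL(n) PRAGMA_(unroll(n))
#elif defined(__GNUC__) && __GNUC__ >= 8
/* Assumption: GCC 8 or newer, which provides "#pragma GCC unroll n". */
#define UNROLL(n) PRAGMA_(GCC unroll n)
#else
/* Older gcc and unknown compilers: no per-loop control, expand to nothing. */
#define UNROLL(n)
#endif

#define PARA 4 /* illustrative interleaving factor */

/* Toy stand-in for a *_PARA_DO loop: rotate left by 7, then add, per lane. */
static void para_step(uint32_t a[PARA], const uint32_t b[PARA])
{
    unsigned int i;

    UNROLL(PARA)
    for (i = 0; i < PARA; i++)
        a[i] = ((a[i] << 7) | (a[i] >> 25)) + b[i];
}
```

Whether the loop really ends up fully unrolled still has to be checked in the generated code (or via the text segment size, as in the tables above), since the pragma is a request and the result may differ between compiler versions.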