Message-ID: <20150905043429.GA24746@openwall.com>
Date: Sat, 5 Sep 2015 07:34:29 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: MD5 on XOP, NEON, AltiVec

On Sat, Sep 05, 2015 at 07:17:49AM +0300, Solar Designer wrote:
> I sort of found it: somehow the code handling SSEi_FLAT_OUT, when
> compiled in, changes the stack frame layout in such a way that
> performance drops.  I wasn't yet able to tell why it drops.  The
> offsets look properly aligned to me either way.

BTW, the code size with SSEi_FLAT_OUT is:

$ nm -S simd-intrinsics.o | fgrep -w T
[...]
0000000000000000 0000000000002daf T SIMDmd5body
0000000000002dc0 0000000000003f9f T md5cryptsse

without SSEi_FLAT_OUT it becomes:

0000000000000000 0000000000002a38 T SSEmd5body
0000000000002a40 0000000000003f9f T md5cryptsse

That's 27982 vs. 27095 bytes.  In Bulldozer, we have 64 KiB 2-way L1i
shared for two "cores" in a module.  So in terms of sheer size, this
should fit.  It is possible that the extra 900 bytes result in overlap
with something else that we use (2-way is very low), but I would expect
md5crypt's 1000 iterations to take long enough for this effect to be
insignificant.  Thus, this doesn't appear to be it.

However, we should keep in mind that our code here grew this large, and
maybe support builds with less function inlining for CPUs with 16 KiB
instruction caches (what CPUs are they these days? some ARM chips
perhaps?)

Alexander
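
The totals quoted in the message (27982 vs. 27095 bytes) appear to be the
sums of the size column (the second hex field) of the two nm -S listings.
A minimal sketch that reproduces the arithmetic follows. The helper name
text_size is hypothetical and not part of John the Ripper, and since the
first listing is truncated with "[...]", the sums cover only the two
functions shown.

```python
# Sum the sizes (second column) that `nm -S` reports for text symbols.
# The listings are copied from the message; `text_size` is a
# hypothetical helper used only for illustration.

WITH_FLAT_OUT = """\
0000000000000000 0000000000002daf T SIMDmd5body
0000000000002dc0 0000000000003f9f T md5cryptsse
"""

WITHOUT_FLAT_OUT = """\
0000000000000000 0000000000002a38 T SSEmd5body
0000000000002a40 0000000000003f9f T md5cryptsse
"""


def text_size(nm_output):
    """Return the total size in bytes of all 'T' symbols in `nm -S` output."""
    total = 0
    for line in nm_output.splitlines():
        fields = line.split()
        # Expected fields: address, size, type, name (hex without 0x prefix).
        if len(fields) == 4 and fields[2] == "T":
            total += int(fields[1], 16)
    return total


size_with = text_size(WITH_FLAT_OUT)
size_without = text_size(WITHOUT_FLAT_OUT)
print(size_with, size_without, size_with - size_without)
# prints: 27982 27095 887
```

The difference of 887 bytes matches the "extra 900 bytes" mentioned above,
and either total is well under half of a 64 KiB L1i.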